Initialize project; model provided by the ModelHub XC community
Model: jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6 Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
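
To get a rough idea of which files these rules route through Git LFS, the basename patterns can be approximated with Python's `fnmatch`. This is only a sketch: `fnmatch` is not Git's own matcher, the pattern list below is an abridged copy, path rules such as `saved_model/**/*` are left out, and the file names are hypothetical examples.

```python
from fnmatch import fnmatch

# Abridged copy of the basename patterns listed in .gitattributes.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.npy", "*tfevents*", "tokenizer.json"]

def is_lfs_tracked(filename: str) -> bool:
    """Return True if the bare file name matches any listed LFS pattern.

    Approximation only: fnmatch is not Git's matcher, and path-based
    rules are deliberately not handled here.
    """
    return any(fnmatch(filename, pattern) for pattern in LFS_PATTERNS)

# Hypothetical file names, chosen for illustration.
for name in ["model-00001-of-00004.safetensors", "tokenizer.json", "config.json", "step_0000001.npy"]:
    print(name, is_lfs_tracked(name))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.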
78
README.md
Normal file
@@ -0,0 +1,78 @@
---
library_name: transformers
base_model: W-61/llama-3-8b-base-sft-hh-harmless-4xh200
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6

This model is a fine-tuned version of [W-61/llama-3-8b-base-sft-hh-harmless-4xh200](https://huggingface.co/W-61/llama-3-8b-base-sft-hh-harmless-4xh200) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5318
- Fcm Dpo/beta: 0.0183
- Margin Dpo/margin Mean: 34.3262
- Margin Dpo/margin Std: 53.3636
- Logps/chosen: -139.1072
- Logps/rejected: -178.1229
- Logps/ref Chosen: -74.8595
- Logps/ref Rejected: -79.5490
- Logits/chosen: 0.6984
- Logits/rejected: 0.6510

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
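
The batch-size entries in this list are related by simple arithmetic, which the sketch below reproduces. The `train_samples` value of 42336 is taken from `all_results.json` in the same commit; treating each optimizer step as consuming exactly one global batch is an assumption.

```python
# Values copied from the hyperparameter list.
train_batch_size = 8              # per device
eval_batch_size = 8               # per device
num_devices = 4
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# train_samples is taken from all_results.json in this commit.
train_samples = 42336

def epoch_at_step(step: int) -> float:
    """Fraction of the dataset consumed after `step` optimizer steps.

    Assumes every optimizer step consumes one full global batch.
    """
    return step * total_train_batch_size / train_samples

print(total_train_batch_size, total_eval_batch_size)
for step in (200, 400, 600):
    print(step, round(epoch_at_step(step), 4))
```

Under that assumption, steps 200, 400 and 600 land at epochs 0.3023, 0.6047 and 0.9070, which agrees with the Epoch column of the training results table.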

### Training results

| Training Loss | Epoch | Step | Validation Loss | Fcm Dpo/beta | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 0.9546 | 0.3023 | 200 | 0.5605 | 0.3456 | 1.6518 | 2.9692 | -78.8826 | -85.2239 | -74.8595 | -79.5490 | 0.2179 | 0.1797 |
| 1.1411 | 0.6047 | 400 | 0.5378 | 0.0260 | 22.3130 | 35.3835 | -110.4847 | -137.4872 | -74.8595 | -79.5490 | 0.6215 | 0.5730 |
| 1.1307 | 0.9070 | 600 | 0.5318 | 0.0183 | 34.3262 | 53.3636 | -139.1072 | -178.1229 | -74.8595 | -79.5490 | 0.6984 | 0.6510 |
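
The margin column appears consistent with the usual DPO implicit-reward margin without the beta factor: the policy-versus-reference log-probability gain on the chosen response minus the same gain on the rejected response. That reading is inferred from the numbers in the table, not stated by the model card. A small check, with the helper name `dpo_margin` introduced here for illustration:

```python
def dpo_margin(logp_chosen, logp_rejected, ref_chosen, ref_rejected):
    """Unscaled DPO margin: (chosen log-ratio) minus (rejected log-ratio)."""
    return (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)

# Reference log-probabilities are identical in every row of the table.
REF_CHOSEN, REF_REJECTED = -74.8595, -79.5490

# step -> (Logps/chosen, Logps/rejected, reported Margin Dpo/margin Mean)
rows = {
    200: (-78.8826, -85.2239, 1.6518),
    400: (-110.4847, -137.4872, 22.3130),
    600: (-139.1072, -178.1229, 34.3262),
}

for step, (chosen, rejected, reported) in rows.items():
    recomputed = dpo_margin(chosen, rejected, REF_CHOSEN, REF_REJECTED)
    print(step, round(recomputed, 4), reported)
```

Both policy log-probabilities fall well below their reference values as training proceeds, so the growing margin comes from the rejected responses losing likelihood faster than the chosen ones.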


### Framework versions

- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
23
all_results.json
Normal file
@@ -0,0 +1,23 @@
{
    "epoch": 0.999244142101285,
    "eval_fcm_dpo/beta": 0.017207348719239235,
    "eval_logits/chosen": 0.6921316981315613,
    "eval_logits/rejected": 0.6450904607772827,
    "eval_logps/chosen": -139.775146484375,
    "eval_logps/ref_chosen": -74.85946655273438,
    "eval_logps/ref_rejected": -79.54898834228516,
    "eval_logps/rejected": -178.9794158935547,
    "eval_loss": 0.5342118144035339,
    "eval_margin_dpo/margin_mean": 34.5147590637207,
    "eval_margin_dpo/margin_std": 53.65407180786133,
    "eval_runtime": 37.9398,
    "eval_samples": 2303,
    "eval_samples_per_second": 60.701,
    "eval_steps_per_second": 1.898,
    "total_flos": 0.0,
    "train_loss": 1.0886367675756723,
    "train_runtime": 1755.0349,
    "train_samples": 42336,
    "train_samples_per_second": 24.123,
    "train_steps_per_second": 0.377
}
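
The throughput fields in this file can be cross-checked from the sample counts and runtimes. A minimal sketch with values copied from the JSON; that throughput is simply samples divided by wall-clock seconds is an assumption about how the Trainer reports it, and `samples_per_second` is a helper introduced here.

```python
# Values copied from all_results.json.
results = {
    "train_samples": 42336,
    "train_runtime": 1755.0349,
    "train_samples_per_second": 24.123,
    "eval_samples": 2303,
    "eval_runtime": 37.9398,
    "eval_samples_per_second": 60.701,
}

def samples_per_second(samples: int, runtime_s: float) -> float:
    """Throughput rounded to three decimals, like the reported fields."""
    return round(samples / runtime_s, 3)

train_rate = samples_per_second(results["train_samples"], results["train_runtime"])
eval_rate = samples_per_second(results["eval_samples"], results["eval_runtime"])

print("train:", train_rate, "reported:", results["train_samples_per_second"])
print("eval: ", eval_rate, "reported:", results["eval_samples_per_second"])
```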
29
config.json
Normal file
@@ -0,0 +1,29 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.51.0",
  "use_cache": true,
  "vocab_size": 128256
}
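
The shape fields in this config are enough to estimate the model's parameter count. The sketch below assumes the standard bias-free Llama decoder layout (q/k/v/o attention projections, gate/up/down MLP projections, two RMSNorm weights per layer, one final norm), which matches `attention_bias: false` and `mlp_bias: false`; `count_parameters` is a helper introduced here, not part of any library.

```python
# Values copied from config.json.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 128256,
    "tie_word_embeddings": False,
}

def count_parameters(c: dict) -> int:
    """Estimate the parameter count of a bias-free Llama-style decoder."""
    h, d = c["hidden_size"], c["head_dim"]
    q_out = c["num_attention_heads"] * d       # query/output width
    kv_out = c["num_key_value_heads"] * d      # narrower k/v width (grouped-query attention)
    attention = h * q_out + 2 * h * kv_out + q_out * h   # q, k, v, o projections
    mlp = 3 * h * c["intermediate_size"]                 # gate, up, down projections
    norms = 2 * h                                        # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = c["vocab_size"] * h
    lm_head = 0 if c["tie_word_embeddings"] else c["vocab_size"] * h
    # The trailing "+ h" is the final RMSNorm before the output head.
    return c["num_hidden_layers"] * per_layer + embeddings + lm_head + h

total = count_parameters(cfg)
print(f"{total:,} parameters")
print(f"about {total * 4 / 2**30:.1f} GiB of weights at 4 bytes per float32 value")
```

With 32 query heads sharing 8 key/value heads, each key/value head serves 4 query heads, which is why the k and v projections are a quarter of the width of q and o.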
17
eval_results.json
Normal file
@@ -0,0 +1,17 @@
{
    "epoch": 0.999244142101285,
    "eval_fcm_dpo/beta": 0.017207348719239235,
    "eval_logits/chosen": 0.6921316981315613,
    "eval_logits/rejected": 0.6450904607772827,
    "eval_logps/chosen": -139.775146484375,
    "eval_logps/ref_chosen": -74.85946655273438,
    "eval_logps/ref_rejected": -79.54898834228516,
    "eval_logps/rejected": -178.9794158935547,
    "eval_loss": 0.5342118144035339,
    "eval_margin_dpo/margin_mean": 34.5147590637207,
    "eval_margin_dpo/margin_std": 53.65407180786133,
    "eval_runtime": 37.9398,
    "eval_samples": 2303,
    "eval_samples_per_second": 60.701,
    "eval_steps_per_second": 1.898
}
9
generation_config.json
Normal file
@@ -0,0 +1,9 @@
{
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": 128001,
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "4.51.0"
}
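
With `do_sample` enabled, `temperature` and `top_p` shape the distribution each next token is drawn from. The following plain-Python sketch illustrates the idea of temperature scaling followed by nucleus (top-p) filtering. It is a simplified illustration rather than the transformers implementation, and the function names and logits are hypothetical.

```python
import math
import random

def top_p_distribution(logits, temperature=0.6, top_p=0.9):
    """Temperature-scale the logits, then keep the smallest set of most likely
    tokens whose cumulative probability reaches top_p, renormalized to sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Walk tokens from most to least likely until the nucleus holds top_p mass.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

def sample_token(logits, rng=random):
    """Draw one token index from the filtered distribution."""
    dist = top_p_distribution(logits)
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]

# Hypothetical logits over a four-token vocabulary.
print(top_p_distribution([3.0, 1.0, 0.0, -3.0]))
print(top_p_distribution([1.0, 1.0, 1.0, 1.0]))
```

A temperature below 1 sharpens the distribution before filtering, so a single dominant token can fill the nucleus on its own, while a flat distribution keeps every token.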
661
margin_logs/margins.jsonl
Normal file
@@ -0,0 +1,661 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": -0.0013527870178222656, "std": 0.2564818859100342, "min": -0.736083984375, "p10": -0.3432229995727539, "median": 0.038166046142578125, "p90": 0.29227676391601565, "max": 0.645111083984375, "pos_frac": 0.578125, "sample": [0.1120758056640625, 0.12518310546875, 0.31621551513671875, 0.13765716552734375, -0.12592506408691406, 0.23141098022460938, -0.21887779235839844, 0.21950721740722656, 0.04480743408203125, 0.020877838134765625, 0.0570220947265625, 0.058269500732421875, -0.4338226318359375, -0.030628204345703125, 0.645111083984375, -0.395477294921875, 0.09050941467285156, 0.0007190704345703125, -0.34615325927734375, 0.016077041625976562, -0.33638572692871094, 0.293853759765625, 0.17610931396484375, 0.22386932373046875, 0.21470260620117188, -0.08536529541015625, 0.0907745361328125, -0.03816986083984375, 0.39190101623535156, 0.16336441040039062, 0.08024787902832031, -0.031158447265625, 0.08477020263671875, 0.002460479736328125, -0.242034912109375, 0.07232666015625, -0.60186767578125, 0.20531463623046875, 0.155731201171875, -0.14299774169921875, -0.25698089599609375, 0.12331962585449219, -0.26497650146484375, 0.15140533447265625, -0.0920257568359375, -0.18599319458007812, 0.19028091430664062, 0.2496490478515625, 0.42162322998046875, 0.17873382568359375, -0.1525421142578125, -0.4972076416015625, 0.32010650634765625, -0.10365867614746094, -0.233795166015625, -0.19828224182128906, -0.4018898010253906, -0.13407135009765625, -0.09596633911132812, 0.031524658203125, 0.28859710693359375, -0.192962646484375, -0.736083984375, 0.3026123046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000001.npy"}
{"epoch": 0.0015117157974300832, "step": 2, "batch_size": 64, "mean": 0.03744968771934509, "std": 0.2875921130180359, "min": -0.7604827880859375, "p10": -0.2812448501586914, "median": 0.03963661193847656, "p90": 0.3654294967651367, "max": 0.8134727478027344, "pos_frac": 0.5625, "sample": [0.30594635009765625, -0.24289894104003906, -0.11509323120117188, -0.13417816162109375, 0.06942558288574219, 0.36568641662597656, -0.14640045166015625, 0.1497650146484375, 0.30261993408203125, 0.10124588012695312, 0.13028717041015625, -0.0031890869140625, 0.0361480712890625, 0.5662612915039062, 0.09694290161132812, -0.01091766357421875, 0.1128997802734375, 0.0411834716796875, -0.21860504150390625, -0.1236419677734375, -0.08812713623046875, 0.10360527038574219, 0.1790008544921875, -0.5114288330078125, 0.3056755065917969, -0.14553451538085938, 0.28168487548828125, 0.26990509033203125, 0.1686878204345703, 0.038089752197265625, 0.19541168212890625, -0.10783576965332031, -0.2644004821777344, -0.19707489013671875, -0.140472412109375, 0.1349811553955078, 0.19672012329101562, -0.0714111328125, 0.53369140625, 0.1271820068359375, 0.8134727478027344, 0.2990264892578125, -0.7604827880859375, -0.08274078369140625, 0.05890846252441406, 0.029361724853515625, 0.4510040283203125, -0.1599273681640625, -0.29346656799316406, 0.10005569458007812, -0.27509117126464844, -0.1937713623046875, 0.19167327880859375, 0.28173065185546875, -0.09406471252441406, -0.3380699157714844, -0.29186248779296875, 0.36483001708984375, 0.009979248046875, 0.44391632080078125, -0.126708984375, -0.6550216674804688, 0.6160736083984375, -0.28388214111328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000002.npy"}
{"epoch": 0.0030234315948601664, "step": 3, "batch_size": 64, "mean": -0.01829671859741211, "std": 0.3084477484226227, "min": -0.8451728820800781, "p10": -0.45811538696289056, "median": -0.0009670257568359375, "p90": 0.33925323486328135, "max": 0.6998214721679688, "pos_frac": 0.5, "sample": [0.06826019287109375, 0.06251335144042969, 0.1527252197265625, -0.7477264404296875, 0.349822998046875, 0.01082611083984375, -0.22621726989746094, 0.169769287109375, 0.17657470703125, -0.40106201171875, 0.0079345703125, -0.233856201171875, 0.3145904541015625, 0.1657276153564453, -0.28617095947265625, 0.6367874145507812, -0.23911285400390625, 0.02264404296875, -0.058319091796875, -0.27312469482421875, -0.6590118408203125, -0.06774711608886719, -0.22212982177734375, -0.044963836669921875, -0.2561912536621094, 0.5389461517333984, 0.39606475830078125, -0.10797119140625, -0.09299468994140625, -0.08887481689453125, 0.19480133056640625, -0.5246353149414062, -0.126373291015625, -0.01955413818359375, -0.1251983642578125, -0.03257942199707031, -0.13482666015625, -0.0891265869140625, 0.18185997009277344, 0.30591583251953125, -0.04106903076171875, 0.09046173095703125, 0.2045745849609375, 0.506500244140625, 0.17519378662109375, 0.1265869140625, -0.10504913330078125, 0.10868644714355469, 0.10106277465820312, -0.64544677734375, -0.8451728820800781, -0.48256683349609375, 0.37139892578125, 0.20044708251953125, -0.008869171142578125, 0.07358551025390625, 0.11188125610351562, 0.6998214721679688, -0.17657089233398438, -0.0560302734375, 0.12055206298828125, 0.00693511962890625, -0.5354843139648438, 0.12958526611328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000003.npy"}
{"epoch": 0.0045351473922902496, "step": 4, "batch_size": 64, "mean": 0.005418956279754639, "std": 0.37015920877456665, "min": -0.917144775390625, "p10": -0.5449897766113281, "median": 0.00412750244140625, "p90": 0.44190483093261734, "max": 0.974609375, "pos_frac": 0.5, "sample": [-0.367919921875, -0.49576568603515625, 0.1840362548828125, -0.6299591064453125, -0.18016815185546875, -0.06969070434570312, 0.1511058807373047, 0.6607856750488281, -0.04657745361328125, 0.681243896484375, -0.24734115600585938, 0.4042243957519531, -0.4658355712890625, 0.078948974609375, -0.31443023681640625, -0.0086517333984375, 0.35399627685546875, -0.0293426513671875, 0.26995849609375, -0.588043212890625, 0.5868377685546875, 0.01690673828125, 0.13739776611328125, -0.65679931640625, 0.18723678588867188, -0.5660858154296875, 0.379425048828125, 0.3002128601074219, 0.6460227966308594, 0.3180084228515625, 0.974609375, 0.25612640380859375, 0.3545646667480469, 0.07745361328125, -0.6039810180664062, 0.4580535888671875, -0.372711181640625, -0.056850433349609375, -0.917144775390625, -0.1428680419921875, -0.09647369384765625, 0.04061126708984375, 0.462554931640625, -0.1565685272216797, 0.2568817138671875, 0.066741943359375, -0.16787338256835938, 0.028772354125976562, -0.25026702880859375, 0.02463531494140625, 0.03960418701171875, -0.2798004150390625, -0.03544807434082031, -0.060314178466796875, 0.29925537109375, 0.200592041015625, 0.1185455322265625, 0.3511924743652344, -0.08373641967773438, -0.6258010864257812, -0.3126068115234375, -0.106719970703125, -0.0235595703125, -0.060394287109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000004.npy"}
{"epoch": 0.006046863189720333, "step": 5, "batch_size": 64, "mean": -0.019109666347503662, "std": 0.33594104647636414, "min": -0.853485107421875, "p10": -0.45509033203124993, "median": -0.019370079040527344, "p90": 0.35085525512695315, "max": 0.6899337768554688, "pos_frac": 0.46875, "sample": [-0.3973808288574219, -0.40169525146484375, -0.036380767822265625, -0.63677978515625, 0.353851318359375, 0.2301788330078125, -0.0012798309326171875, 0.283416748046875, 0.225616455078125, -0.09307098388671875, 0.1951446533203125, -0.566192626953125, 0.15291976928710938, 0.04853630065917969, -0.47797393798828125, -0.37860107421875, -0.17564773559570312, -0.2129688262939453, -0.2896575927734375, 0.24175643920898438, -0.04656791687011719, 0.11891555786132812, 0.20682525634765625, 0.34386444091796875, -0.853485107421875, 0.61993408203125, -0.1216583251953125, -0.34440040588378906, 0.2496490478515625, 0.11761093139648438, 0.41817474365234375, -0.557281494140625, -0.503753662109375, 0.05978584289550781, -0.1270294189453125, -0.2506752014160156, 0.03632926940917969, -0.1540679931640625, 0.3193645477294922, -0.08731269836425781, 0.11246109008789062, -0.1139984130859375, -0.13596343994140625, 0.1036376953125, 0.6899337768554688, -0.09399795532226562, 0.122222900390625, -0.04270172119140625, 0.125244140625, -0.3216705322265625, -0.02619171142578125, 0.2124481201171875, 0.2414703369140625, 0.5971603393554688, 0.63507080078125, -0.25130462646484375, -0.8405838012695312, -0.23019027709960938, -0.012548446655273438, 0.05650138854980469, -0.2167224884033203, -0.072174072265625, 0.15785598754882812, 0.5730094909667969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000005.npy"}
{"epoch": 0.007558578987150416, "step": 6, "batch_size": 64, "mean": -0.035778701305389404, "std": 0.397733598947525, "min": -0.855316162109375, "p10": -0.4810527801513672, "median": 0.010846138000488281, "p90": 0.3136009216308594, "max": 1.331756591796875, "pos_frac": 0.515625, "sample": [0.2731208801269531, -0.14302825927734375, -0.48343658447265625, 0.088958740234375, -0.31719970703125, 0.13886642456054688, 0.10568046569824219, 0.23084259033203125, 0.04549407958984375, -0.3778095245361328, 0.0571136474609375, -0.768707275390625, -0.4754905700683594, -0.1196136474609375, -0.75189208984375, -0.0638580322265625, 0.5451278686523438, 0.6534385681152344, 0.316253662109375, -0.2897472381591797, 0.163330078125, 0.30741119384765625, 0.0041790008544921875, 0.2265491485595703, 0.21175384521484375, -0.089202880859375, -0.40545654296875, 1.331756591796875, -0.45151329040527344, 0.1540699005126953, 0.17408370971679688, 0.10140037536621094, 0.354522705078125, 0.060993194580078125, -0.8238677978515625, -0.344818115234375, -0.4262275695800781, -0.057788848876953125, -0.12702178955078125, -0.12994003295898438, -0.6267166137695312, 0.017513275146484375, -0.09649276733398438, -0.032855987548828125, -0.41709136962890625, -0.214813232421875, 0.28839111328125, 0.17238998413085938, -0.3331718444824219, -0.034259796142578125, 0.30499267578125, 1.0051116943359375, -0.44769287109375, -0.26334381103515625, -0.23876953125, 0.17908668518066406, 0.054538726806640625, 0.15026092529296875, -0.855316162109375, -0.600311279296875, 0.291717529296875, 0.35585975646972656, 0.0792694091796875, 0.07353973388671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000006.npy"}
{"epoch": 0.009070294784580499, "step": 7, "batch_size": 64, "mean": 0.007407456636428833, "std": 0.2482774257659912, "min": -0.6787643432617188, "p10": -0.23647212982177734, "median": 0.0021715164184570312, "p90": 0.2798984527587891, "max": 0.61041259765625, "pos_frac": 0.5, "sample": [-0.048839569091796875, -0.1798095703125, -0.063201904296875, -0.2390155792236328, 0.4372119903564453, 0.19971084594726562, 0.43604469299316406, 0.2374725341796875, 0.61041259765625, -0.078582763671875, -0.465850830078125, 0.04475593566894531, 0.09862518310546875, 0.22287368774414062, 0.023479461669921875, -0.10654449462890625, -0.239410400390625, 0.0106658935546875, 0.13978958129882812, -0.4622039794921875, -0.1766071319580078, -0.01898956298828125, -0.14611053466796875, 0.043792724609375, 0.2976531982421875, -0.028720855712890625, 0.10320281982421875, 0.18354034423828125, 0.5660171508789062, 0.006473541259765625, -0.12164306640625, 0.0680694580078125, -0.1625537872314453, 0.213775634765625, -0.20189666748046875, -0.22797393798828125, -0.23053741455078125, 0.5363616943359375, 0.09276580810546875, -0.15543746948242188, -0.0676422119140625, 0.047397613525390625, 0.2423248291015625, 0.1482086181640625, -0.6787643432617188, 0.0421142578125, -0.3233184814453125, -0.0021305084228515625, 0.01885223388671875, -0.019866943359375, -0.07284355163574219, -0.6273345947265625, 0.25542259216308594, -0.127105712890625, -0.1047210693359375, 0.28582763671875, -0.04183197021484375, -0.04628753662109375, 0.0498199462890625, 0.1208648681640625, -0.11470794677734375, 0.09033203125, 0.2660636901855469, -0.08536148071289062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000007.npy"}
{"epoch": 0.010582010582010581, "step": 8, "batch_size": 64, "mean": 0.012611836194992065, "std": 0.32846611738204956, "min": -0.8866424560546875, "p10": -0.4071956634521484, "median": -0.016847610473632812, "p90": 0.37947120666503914, "max": 1.0269546508789062, "pos_frac": 0.5, "sample": [0.1604022979736328, 0.61346435546875, 0.287872314453125, 0.02803802490234375, -0.2390594482421875, 0.05451202392578125, 0.20592498779296875, 0.3850250244140625, -0.26836204528808594, -0.245147705078125, -0.8866424560546875, 0.04807281494140625, -0.1128387451171875, -0.15668106079101562, 0.2505950927734375, 1.0269546508789062, 0.27603912353515625, 0.23170089721679688, -0.06249237060546875, -0.1757965087890625, -0.06474113464355469, -0.09711837768554688, -0.5432662963867188, 0.0007171630859375, 0.07992172241210938, -0.32735633850097656, -0.3666419982910156, -0.09624862670898438, -0.13498687744140625, -0.13417434692382812, 0.5986518859863281, -0.07272720336914062, -0.2881336212158203, -0.12163925170898438, -0.45755767822265625, -0.0413665771484375, 0.20841217041015625, 0.2840690612792969, 0.563079833984375, 0.35739898681640625, 0.3665122985839844, -0.0786285400390625, 0.3094520568847656, -0.567840576171875, -0.034412384033203125, -0.2887077331542969, 0.19037818908691406, -0.43798065185546875, -0.4316120147705078, -0.23267364501953125, 0.13109970092773438, 0.5684776306152344, 0.036678314208984375, -0.14395523071289062, 0.427459716796875, 0.06335830688476562, -0.19626426696777344, -0.4245758056640625, 0.3150062561035156, 0.21197128295898438, -0.10877418518066406, 0.22107696533203125, 0.08016204833984375, 0.06307601928710938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000008.npy"}
{"epoch": 0.012093726379440665, "step": 9, "batch_size": 64, "mean": 0.016303330659866333, "std": 0.359781414270401, "min": -0.8868408203125, "p10": -0.4028469085693359, "median": 0.0770711898803711, "p90": 0.42090988159179693, "max": 1.02154541015625, "pos_frac": 0.59375, "sample": [-0.19022369384765625, 0.36022186279296875, -0.17043685913085938, 0.27569580078125, 0.07537269592285156, -0.19940948486328125, -0.43772125244140625, -0.24607086181640625, -0.3206024169921875, -0.833282470703125, -0.14306640625, 0.27848243713378906, 0.4138336181640625, 0.24909591674804688, 0.13037109375, -0.4190826416015625, 0.16600799560546875, 0.435455322265625, 0.15651321411132812, 0.020620346069335938, -0.3165740966796875, 0.1114349365234375, 0.411041259765625, 0.3307304382324219, -0.00662994384765625, -0.6903533935546875, 0.24727821350097656, 0.34340667724609375, -0.15703392028808594, -0.4074668884277344, 0.065673828125, -0.39206695556640625, 0.0570831298828125, 0.6516876220703125, 0.11552238464355469, -0.3327217102050781, 0.47081756591796875, -0.3580436706542969, -0.79681396484375, 0.42394256591796875, 0.44211578369140625, -0.31588172912597656, 0.50787353515625, 0.0828399658203125, 0.0028705596923828125, 0.01847076416015625, 0.07876968383789062, 0.10284614562988281, -0.09016227722167969, -0.06853866577148438, 0.1004791259765625, -0.10984039306640625, 0.08084678649902344, 0.3743438720703125, 0.3463459014892578, -0.0960845947265625, -0.24521255493164062, 0.1925201416015625, 0.11365509033203125, 0.2078075408935547, 1.02154541015625, -0.8868408203125, -0.3850364685058594, 0.1949920654296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000009.npy"}
{"epoch": 0.013605442176870748, "step": 10, "batch_size": 64, "mean": -0.010755836963653564, "std": 0.4338391125202179, "min": -1.302886962890625, "p10": -0.38310642242431636, "median": -0.008391380310058594, "p90": 0.3398380279541016, "max": 1.6252899169921875, "pos_frac": 0.5, "sample": [-0.09844970703125, -0.40155601501464844, 0.19847488403320312, 0.09116554260253906, 0.1602935791015625, -1.249969482421875, -0.17928314208984375, 0.0260467529296875, -0.100311279296875, 0.34810638427734375, -0.1593017578125, 0.4628639221191406, 0.02904510498046875, -0.43359375, -0.203338623046875, -0.09760284423828125, 0.018541336059570312, -0.15655517578125, 1.0753021240234375, 0.0240020751953125, 0.261810302734375, 0.09217071533203125, 0.552032470703125, 0.24018096923828125, -0.20627212524414062, -0.297271728515625, 0.032741546630859375, -0.340057373046875, -0.19552230834960938, -0.19127655029296875, 0.28387451171875, 0.17675018310546875, -0.22261810302734375, 0.25492095947265625, 0.08837890625, 0.1453399658203125, -0.051815032958984375, 0.2968006134033203, -0.09018325805664062, 0.2171192169189453, -0.19426727294921875, -0.3022747039794922, -0.67156982421875, -0.6836776733398438, -0.14980316162109375, -0.2737579345703125, -0.25971221923828125, -0.33649444580078125, 1.6252899169921875, -0.301055908203125, -0.5491943359375, 0.20436859130859375, -0.0353240966796875, 0.1474170684814453, 0.0865631103515625, -0.18897247314453125, 0.713348388671875, 0.12316703796386719, 0.286834716796875, 0.30609130859375, 0.3205451965332031, -0.1665802001953125, 0.5125885009765625, -1.302886962890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000010.npy"}
{"epoch": 0.015117157974300832, "step": 11, "batch_size": 64, "mean": -0.036903709173202515, "std": 0.29257139563560486, "min": -1.2016220092773438, "p10": -0.3394252777099609, "median": -0.014438629150390625, "p90": 0.2839859008789063, "max": 0.6437911987304688, "pos_frac": 0.46875, "sample": [-0.134918212890625, 0.2007884979248047, -0.14037704467773438, -0.08025741577148438, -0.35956573486328125, 0.16856765747070312, -0.17639732360839844, 0.03949928283691406, -0.0892181396484375, -0.2525634765625, -0.4546051025390625, 0.15803146362304688, 0.01050567626953125, 0.1491107940673828, -0.0027484893798828125, -0.4613037109375, 0.20262527465820312, 0.019805908203125, -0.0106964111328125, 0.41961669921875, 0.23485183715820312, 0.3412628173828125, 0.6437911987304688, 0.0107269287109375, -0.068267822265625, 0.09973907470703125, -0.18561172485351562, 0.23007965087890625, 0.04912376403808594, 0.157073974609375, 0.10167694091796875, 0.16231727600097656, -0.17358779907226562, 0.36590576171875, -0.054821014404296875, 0.31867218017578125, -0.10259246826171875, 0.322540283203125, 0.10184478759765625, -0.13080596923828125, -0.8992919921875, -0.1735076904296875, -0.221221923828125, 0.18651199340820312, 0.121490478515625, -1.2016220092773438, -0.1042938232421875, 0.1410083770751953, -0.15695953369140625, 0.2864990234375, -0.15642547607421875, -0.2069854736328125, -0.11324119567871094, -0.18899917602539062, -0.01818084716796875, -0.25928497314453125, -0.20238494873046875, 0.2781219482421875, -0.2924308776855469, 0.136138916015625, -0.050296783447265625, -0.5271377563476562, -0.4066314697265625, 0.03746795654296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000011.npy"}
{"epoch": 0.016628873771730914, "step": 12, "batch_size": 64, "mean": -0.06204575300216675, "std": 0.28772544860839844, "min": -0.7448883056640625, "p10": -0.43457183837890623, "median": -0.04758262634277344, "p90": 0.26573829650878905, "max": 0.75775146484375, "pos_frac": 0.4375, "sample": [-0.24803543090820312, -0.3771514892578125, 0.03402519226074219, -0.031106948852539062, -0.13212966918945312, 0.26727294921875, -0.06405830383300781, 0.008953094482421875, 0.0346527099609375, -0.148773193359375, -0.49240875244140625, -0.25310516357421875, -0.02712249755859375, 0.37213134765625, 0.12297439575195312, -0.5128326416015625, -0.072418212890625, -0.4468536376953125, -0.02791595458984375, 0.21350669860839844, 0.15169525146484375, -0.094970703125, -0.7448883056640625, -0.0230255126953125, 0.28318214416503906, -0.0868682861328125, -0.2673473358154297, -0.44312286376953125, -0.0930023193359375, 0.12453842163085938, -0.31447601318359375, -0.3188934326171875, -0.165985107421875, -0.294281005859375, -0.3090782165527344, -0.12088203430175781, -0.3227119445800781, 0.4690093994140625, 0.0765838623046875, -0.41461944580078125, -0.2477874755859375, -0.12486648559570312, -0.3469085693359375, 0.04392814636230469, 0.21421051025390625, 0.5547275543212891, 0.041248321533203125, 0.111328125, 0.26801300048828125, 0.004741668701171875, 0.2490081787109375, 0.2621574401855469, 0.22600555419921875, -0.5523681640625, 0.75775146484375, 0.19435882568359375, -0.40451622009277344, 0.2003326416015625, -0.5055904388427734, 0.1988677978515625, -0.14218902587890625, -0.3095970153808594, 0.024332046508789062, 0.001422882080078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000012.npy"}
{"epoch": 0.018140589569160998, "step": 13, "batch_size": 64, "mean": -0.06563141942024231, "std": 0.2729092538356781, "min": -0.7316131591796875, "p10": -0.4041259765625, "median": -0.0500946044921875, "p90": 0.2991672515869141, "max": 0.6837158203125, "pos_frac": 0.390625, "sample": [-0.353790283203125, 0.12084197998046875, -0.6183319091796875, 0.05657196044921875, -0.276458740234375, -0.7316131591796875, 0.19520187377929688, -0.07212638854980469, 0.3719940185546875, -0.485137939453125, 0.333221435546875, -0.05544471740722656, 0.02071380615234375, -0.31960296630859375, 0.003505706787109375, -0.05914497375488281, -0.317596435546875, -0.0277862548828125, -0.1674041748046875, 0.08440399169921875, -0.3076324462890625, -0.15985870361328125, -0.0314178466796875, 0.1188201904296875, -0.21107864379882812, -0.08129119873046875, 0.096649169921875, -0.04474449157714844, -0.22817230224609375, -0.5830917358398438, 0.09640312194824219, -0.25492095947265625, 0.4348869323730469, -0.48857879638671875, -0.38736724853515625, 0.6837158203125, 0.3309288024902344, 0.018768310546875, 0.30304908752441406, -0.1819610595703125, -0.1263561248779297, 0.15277099609375, -0.5283126831054688, -0.23292160034179688, -0.34992218017578125, 0.23864364624023438, -0.012786865234375, -0.00812530517578125, -0.307952880859375, -0.41130828857421875, 0.19153213500976562, -0.082550048828125, -0.10942649841308594, 0.06355857849121094, 0.0339813232421875, -0.22472381591796875, -0.060699462890625, 0.29010963439941406, 0.25238037109375, 0.3064594268798828, -0.02416229248046875, 0.0559234619140625, -0.03650856018066406, -0.09513664245605469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000013.npy"}
{"epoch": 0.019652305366591082, "step": 14, "batch_size": 64, "mean": 0.09396436810493469, "std": 0.269051194190979, "min": -0.578338623046875, "p10": -0.24916858673095701, "median": 0.09041500091552734, "p90": 0.4600051879882813, "max": 0.6832427978515625, "pos_frac": 0.5625, "sample": [-0.027492523193359375, 0.4978790283203125, -0.578338623046875, 0.19385910034179688, 0.06246185302734375, 0.4028797149658203, 0.0712127685546875, 0.26531982421875, -0.1297473907470703, -0.17270278930664062, -0.40685272216796875, 0.3671913146972656, 0.3533821105957031, -0.3598480224609375, -0.06949996948242188, -0.25403785705566406, -0.25197410583496094, 0.4349365234375, -0.295654296875, 0.3107757568359375, -0.020204544067382812, 0.4707489013671875, 0.09568595886230469, -0.063995361328125, 0.14466094970703125, -0.0815582275390625, 0.08514404296875, 0.5813827514648438, 0.2755928039550781, 0.5628128051757812, -0.177215576171875, 0.6832427978515625, 0.2155609130859375, 0.37530517578125, 0.2755756378173828, 0.08104705810546875, 0.505126953125, 0.14922332763671875, 0.3194313049316406, 0.2620697021484375, 0.1490478515625, 0.492706298828125, 0.2937889099121094, 0.42523193359375, -0.039379119873046875, 0.1704120635986328, -0.06177520751953125, -0.11333465576171875, 0.2673778533935547, 0.12057876586914062, -0.08788299560546875, -0.24262237548828125, -0.0029544830322265625, -0.08455657958984375, -0.03872871398925781, -0.02527618408203125, -0.02944183349609375, -0.23944664001464844, -0.10325431823730469, -0.21891403198242188, 0.10955810546875, 0.1446380615234375, -0.2575111389160156, 0.2320709228515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000014.npy"}
{"epoch": 0.021164021164021163, "step": 15, "batch_size": 64, "mean": -0.01250794529914856, "std": 0.32781127095222473, "min": -0.945892333984375, "p10": -0.43392410278320315, "median": -0.0047130584716796875, "p90": 0.35930118560791025, "max": 0.5981521606445312, "pos_frac": 0.46875, "sample": [0.309539794921875, -0.712799072265625, 0.34422874450683594, -0.006618499755859375, -0.17173385620117188, 0.4864959716796875, -0.445831298828125, -0.17404937744140625, -0.43609619140625, -0.4512939453125, 0.2594490051269531, -5.340576171875e-05, -0.603485107421875, -0.013193130493164062, 0.5572662353515625, 0.1236419677734375, 0.5424575805664062, 0.00885772705078125, 0.1381855010986328, -0.42885589599609375, -0.01233673095703125, -0.16259002685546875, -0.036899566650390625, -0.2589378356933594, -0.023601531982421875, -0.2449493408203125, 0.1206817626953125, -0.4001922607421875, 0.2332916259765625, -0.2158966064453125, 0.20025253295898438, -0.14511871337890625, -0.37604522705078125, 0.18192291259765625, 0.22717666625976562, 0.00479888916015625, 0.3345794677734375, -0.1077117919921875, -0.15317344665527344, 0.11103057861328125, -0.27056884765625, -0.07825469970703125, 0.023956298828125, 0.5981521606445312, 0.3730754852294922, -0.0028076171875, -0.945892333984375, -0.28509521484375, 0.31827354431152344, -0.7450103759765625, -0.194549560546875, 0.12556838989257812, -0.2600059509277344, 0.45682525634765625, -0.140716552734375, 0.27972412109375, 0.2876434326171875, -0.16490554809570312, 0.30234336853027344, -0.08911514282226562, 0.3408966064453125, 0.36576080322265625, 0.174957275390625, 0.12684249877929688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000015.npy"}
{"epoch": 0.022675736961451247, "step": 16, "batch_size": 64, "mean": -0.00016827881336212158, "std": 0.2907392382621765, "min": -0.601470947265625, "p10": -0.27522926330566405, "median": -0.04905223846435547, "p90": 0.36400527954101564, "max": 0.983062744140625, "pos_frac": 0.4375, "sample": [-0.16376304626464844, 0.10449600219726562, -0.419677734375, 0.07260513305664062, 0.32167816162109375, -0.14662742614746094, 0.06367111206054688, -0.0974578857421875, 0.6937332153320312, -0.2793846130371094, -0.19117355346679688, 0.359161376953125, 0.1334228515625, -0.07196044921875, -0.05329132080078125, 0.5660266876220703, 0.4002418518066406, 0.05060005187988281, -0.12514495849609375, 0.1048583984375, -0.1393890380859375, -0.16625213623046875, 0.5978240966796875, 0.3058624267578125, 0.07783126831054688, -0.3551788330078125, 0.36608123779296875, -0.04248046875, 0.028425216674804688, -0.23606109619140625, 0.41124725341796875, -0.5622329711914062, -0.260467529296875, 0.20275497436523438, 0.26617431640625, -0.1107635498046875, 0.983062744140625, -0.242462158203125, 0.10404586791992188, -0.08433341979980469, -0.265533447265625, -0.08343696594238281, 0.03063201904296875, 0.03612518310546875, 0.23416900634765625, 0.0043201446533203125, 0.3280372619628906, -0.17488861083984375, -0.04481315612792969, -0.02931976318359375, -0.15924644470214844, -0.0777435302734375, -0.06012439727783203, -0.23421859741210938, -0.3447113037109375, -0.09350395202636719, -0.49810028076171875, -0.2214813232421875, -0.601470947265625, -0.013082504272460938, 0.030933380126953125, -0.19643783569335938, 0.14617919921875, -0.18875503540039062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000016.npy"}
{"epoch": 0.02418745275888133, "step": 17, "batch_size": 64, "mean": -0.013455450534820557, "std": 0.3074217140674591, "min": -0.88116455078125, "p10": -0.3128028869628906, "median": -0.048485755920410156, "p90": 0.3389343261718751, "max": 0.832244873046875, "pos_frac": 0.453125, "sample": [0.034709930419921875, -0.198089599609375, -0.08382225036621094, -0.06234550476074219, 0.468017578125, 0.20128631591796875, -0.094207763671875, -0.1356220245361328, 0.07621002197265625, -0.13933563232421875, 0.2250823974609375, -0.14377975463867188, -0.09925079345703125, 0.06521224975585938, -0.043304443359375, 0.31781768798828125, -0.5045852661132812, -0.09214019775390625, 0.004428863525390625, -0.234893798828125, -0.009613037109375, 0.300811767578125, -0.88116455078125, 0.34798431396484375, -0.23030853271484375, 0.05344581604003906, -0.29526519775390625, -0.09247207641601562, 0.13496017456054688, -0.2647857666015625, -0.031581878662109375, 0.449676513671875, -0.09272003173828125, 0.28530120849609375, 0.6501846313476562, 0.5704193115234375, -0.2943382263183594, 0.28783416748046875, 0.0914306640625, -0.418487548828125, 0.2242584228515625, 0.15615463256835938, -0.05366706848144531, 0.0331878662109375, 0.1067352294921875, 0.244171142578125, -0.2379913330078125, 0.832244873046875, 0.0904388427734375, 0.154052734375, -0.5851898193359375, -0.8159103393554688, -0.3140296936035156, -0.14267921447753906, 0.406768798828125, -0.3099403381347656, -0.2372150421142578, 0.15807723999023438, -0.06076812744140625, -0.06597137451171875, -0.07926559448242188, 0.02097320556640625, -0.36139488220214844, -0.14688873291015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000017.npy"}
{"epoch": 0.025699168556311415, "step": 18, "batch_size": 64, "mean": 0.09332945942878723, "std": 0.28488782048225403, "min": -0.9840621948242188, "p10": -0.23635520935058593, "median": 0.13289356231689453, "p90": 0.3780199050903321, "max": 1.0432357788085938, "pos_frac": 0.6875, "sample": [0.060192108154296875, 0.44704437255859375, 0.13430404663085938, 0.00247955322265625, 0.40195274353027344, 0.13336181640625, 0.07223129272460938, 1.0432357788085938, 0.31157684326171875, 0.150634765625, 0.27184295654296875, -0.334869384765625, 0.1458454132080078, 0.3575000762939453, 0.5619583129882812, 0.0760650634765625, 0.1415386199951172, -0.21918487548828125, -0.13123703002929688, 0.057525634765625, 0.3868141174316406, 0.332794189453125, 0.07279205322265625, 0.4007835388183594, 0.15238189697265625, -0.14445114135742188, 0.34039306640625, 0.09191131591796875, 0.24871444702148438, 0.27666282653808594, 0.13242530822753906, -0.4556732177734375, -0.08032608032226562, 0.3311920166015625, -0.011812210083007812, 0.19713592529296875, 0.2824134826660156, 0.17885398864746094, -0.0428619384765625, 0.0860443115234375, 0.17690277099609375, 0.0458831787109375, -0.12908935546875, -0.241668701171875, 0.15245819091796875, 0.31854248046875, -0.25478172302246094, 0.0007686614990234375, -0.3213920593261719, -0.9840621948242188, -0.22395706176757812, 0.18849563598632812, -0.3948688507080078, -0.06801414489746094, 0.27597808837890625, -0.06140708923339844, 0.12140655517578125, -0.08463668823242188, 0.2739295959472656, -0.17222213745117188, 0.2834663391113281, -0.0872802734375, 0.3907051086425781, 0.3077430725097656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000018.npy"}
{"epoch": 0.027210884353741496, "step": 19, "batch_size": 64, "mean": 0.04047694802284241, "std": 0.33457908034324646, "min": -1.034759521484375, "p10": -0.344998550415039, "median": 0.020093917846679688, "p90": 0.4305473327636719, "max": 0.9757232666015625, "pos_frac": 0.5625, "sample": [0.5679931640625, 0.1912689208984375, 0.010889053344726562, -0.167236328125, 0.11045074462890625, 0.3298168182373047, 0.16898345947265625, -0.2953071594238281, -0.6612396240234375, -0.15325164794921875, 0.2420806884765625, 0.15290069580078125, 0.5646591186523438, 0.2773609161376953, 0.11766815185546875, -0.13214111328125, -0.04698753356933594, -0.00527191162109375, 0.12632369995117188, 0.27276039123535156, 0.16562652587890625, -0.26966094970703125, -0.15230178833007812, -0.72412109375, 0.422637939453125, 0.9757232666015625, -0.2489471435546875, -0.13930511474609375, -0.414764404296875, 0.09302139282226562, 0.30261993408203125, 0.0838775634765625, 0.402557373046875, 0.6053009033203125, -1.034759521484375, -0.419647216796875, 0.1442108154296875, -0.16121292114257812, 0.4530372619628906, 0.2442455291748047, 0.009500503540039062, -0.08322334289550781, 0.366302490234375, -0.07193756103515625, 0.14937591552734375, 0.2541942596435547, -0.0664520263671875, 0.0182342529296875, -0.0811920166015625, -0.10985565185546875, 0.5333099365234375, 0.021953582763671875, 0.23548126220703125, -0.08852577209472656, -0.36629486083984375, -0.5341758728027344, -0.09088897705078125, -0.22738265991210938, -0.041973114013671875, 0.18146514892578125, 0.2078723907470703, -0.07700347900390625, 0.43393707275390625, 0.0179443359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000019.npy"}
{"epoch": 0.02872260015117158, "step": 20, "batch_size": 64, "mean": 0.029502198100090027, "std": 0.2428610771894455, "min": -0.553253173828125, "p10": -0.295465087890625, "median": 0.057880401611328125, "p90": 0.3306911468505859, "max": 0.5531158447265625, "pos_frac": 0.5625, "sample": [0.10797882080078125, 0.04250335693359375, -0.3243141174316406, 0.2577934265136719, 0.14738845825195312, -0.04746246337890625, 0.24037933349609375, 0.48797607421875, 0.20259857177734375, -0.35828399658203125, 0.4893035888671875, 0.0831146240234375, -0.224945068359375, -0.023441314697265625, -0.428863525390625, -0.14038848876953125, -0.3473968505859375, -0.28533935546875, 0.15486526489257812, -0.2998046875, 0.1058349609375, 0.1435718536376953, -0.16202354431152344, -0.20914459228515625, 0.06481170654296875, 0.24918365478515625, 0.32743072509765625, -0.23021697998046875, 0.3516387939453125, 0.014941215515136719, -0.10614013671875, -0.09503173828125, -0.0645294189453125, 0.1569671630859375, 0.0509490966796875, 0.4515190124511719, 0.1841449737548828, 0.12436676025390625, -0.1361541748046875, 0.23223114013671875, -0.0046405792236328125, -0.22216796875, 0.0939788818359375, 0.1532440185546875, -0.00304412841796875, -0.2435302734375, 0.0019683837890625, -0.41887664794921875, 0.3974037170410156, 0.5531158447265625, -0.553253173828125, 0.1244049072265625, 0.3254547119140625, 0.23503875732421875, -0.16971588134765625, 0.1182708740234375, 0.12540817260742188, 0.14814376831054688, -0.171844482421875, 0.3320884704589844, -0.02179718017578125, 0.06856155395507812, -0.14236068725585938, -0.025722503662109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000020.npy"}
{"epoch": 0.030234315948601664, "step": 21, "batch_size": 64, "mean": -0.018961191177368164, "std": 0.3068406581878662, "min": -0.8856163024902344, "p10": -0.37230987548828126, "median": -0.018652915954589844, "p90": 0.29034042358398443, "max": 0.8795166015625, "pos_frac": 0.390625, "sample": [0.008388519287109375, -0.07485198974609375, -0.06402015686035156, -0.2844085693359375, 0.44852447509765625, -0.3482513427734375, -0.0347137451171875, -0.15468406677246094, 0.22829437255859375, -0.12771034240722656, 0.10750961303710938, -0.378997802734375, -0.00775146484375, -0.01921844482421875, 0.274505615234375, 0.29712677001953125, -0.299163818359375, 0.6092681884765625, -0.256561279296875, 0.03132820129394531, -0.09449386596679688, 0.12526321411132812, -0.08670997619628906, -0.07422637939453125, -0.0138397216796875, -0.6693878173828125, 0.04572868347167969, 0.4621429443359375, 0.7792816162109375, 0.22849655151367188, 0.07012939453125, -0.035076141357421875, -0.02227783203125, -0.21228408813476562, -0.4950408935546875, 0.09812736511230469, 0.8795166015625, 0.15972518920898438, -0.061126708984375, -0.20990371704101562, 0.18115234375, -0.24869537353515625, -0.097808837890625, -0.0055255889892578125, -0.2939910888671875, -0.10425376892089844, 0.17330169677734375, -0.3866310119628906, -0.8856163024902344, -0.018087387084960938, -0.5456008911132812, 0.470245361328125, 0.14414215087890625, 0.245513916015625, -0.0180206298828125, -0.48150634765625, 0.19956398010253906, -0.008691787719726562, -0.038188934326171875, 0.0640106201171875, 0.057338714599609375, -0.00362396240234375, -0.08449554443359375, -0.3567047119140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000021.npy"}
{"epoch": 0.031746031746031744, "step": 22, "batch_size": 64, "mean": 0.030408114194869995, "std": 0.3106282949447632, "min": -0.720794677734375, "p10": -0.36468658447265623, "median": 0.0019388198852539062, "p90": 0.4545608520507813, "max": 0.704803466796875, "pos_frac": 0.5, "sample": [-0.720794677734375, -0.16814422607421875, -0.14737701416015625, 0.6277656555175781, -0.003509521484375, 0.23836517333984375, 0.21283912658691406, -0.0276947021484375, -0.04174995422363281, -0.17108535766601562, 0.47027587890625, -0.25159454345703125, -0.15616798400878906, 0.13065719604492188, 0.10860061645507812, -0.0696563720703125, -0.3742218017578125, -0.12740707397460938, -0.0363922119140625, -0.7056045532226562, 0.5131607055664062, 0.4447021484375, 0.219207763671875, 0.11094856262207031, 0.046566009521484375, 0.22509765625, -0.11598396301269531, 0.3405113220214844, 0.0063343048095703125, 0.048496246337890625, -0.5485992431640625, -0.5161285400390625, -0.03142547607421875, 0.023059844970703125, 0.12368011474609375, -0.1250762939453125, -0.046657562255859375, -0.4487457275390625, -0.16802215576171875, -0.375885009765625, 0.6534442901611328, -0.0024566650390625, 0.262939453125, 0.4587860107421875, 0.029293060302734375, 0.30371856689453125, -0.03851318359375, 0.3059234619140625, 0.704803466796875, -0.342437744140625, -0.09796142578125, 0.6438140869140625, -0.007080078125, 0.4121742248535156, 0.054534912109375, 0.21213531494140625, -0.2336292266845703, 0.27884483337402344, 0.1583251953125, -0.1634674072265625, -0.13745880126953125, 0.08594512939453125, 0.16162872314453125, -0.26953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000022.npy"}
{"epoch": 0.03325774754346183, "step": 23, "batch_size": 64, "mean": 0.024062126874923706, "std": 0.30636462569236755, "min": -0.7731475830078125, "p10": -0.3208877563476562, "median": 0.00151824951171875, "p90": 0.39432010650634763, "max": 1.115814208984375, "pos_frac": 0.515625, "sample": [-0.49857330322265625, 0.037384033203125, 0.0026092529296875, 1.115814208984375, 0.011793136596679688, 0.12169647216796875, -0.167755126953125, -0.014190673828125, -0.6130752563476562, 0.3940277099609375, 0.318695068359375, -0.4161872863769531, 0.035984039306640625, -0.2189655303955078, 0.16187095642089844, 0.056182861328125, -0.7731475830078125, 0.1458282470703125, -0.15423202514648438, 0.0779876708984375, 0.36511993408203125, 0.12520217895507812, -0.08813667297363281, -0.2360687255859375, 0.4240531921386719, 0.2773895263671875, 0.16602325439453125, -0.04434394836425781, 0.7807235717773438, -0.009765625, -0.0599365234375, 0.13684844970703125, 0.14672088623046875, 0.39444541931152344, 0.7172470092773438, -0.0745697021484375, -0.1460723876953125, 0.4190826416015625, 0.031299591064453125, -0.3849811553955078, -0.348175048828125, -0.236602783203125, -0.2572174072265625, 0.00042724609375, -0.02520751953125, 0.444854736328125, 0.1731109619140625, -0.003734588623046875, 0.06362152099609375, -0.3853759765625, -0.1436614990234375, -0.027788162231445312, 0.10415458679199219, -0.07366561889648438, -0.16427230834960938, -0.16827964782714844, 0.01905059814453125, 0.10388946533203125, -0.12938690185546875, -0.01583099365234375, -0.15185546875, -0.037593841552734375, 0.10092735290527344, 0.13455963134765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000023.npy"}
{"epoch": 0.03476946334089191, "step": 24, "batch_size": 64, "mean": -0.054926902055740356, "std": 0.25707152485847473, "min": -0.7554168701171875, "p10": -0.3885902404785156, "median": -0.05146980285644531, "p90": 0.23385696411132817, "max": 0.505767822265625, "pos_frac": 0.4375, "sample": [-0.22115325927734375, 0.00948333740234375, -0.124298095703125, 0.1563243865966797, 0.21025848388671875, -0.20533370971679688, 0.039203643798828125, -0.046295166015625, -0.449615478515625, -0.1143646240234375, -0.2093353271484375, 0.18668365478515625, -0.1196136474609375, -0.1572113037109375, 0.22267913818359375, 0.18170166015625, -0.28722381591796875, -0.5706291198730469, -0.25261688232421875, -0.029535293579101562, -0.28945159912109375, 0.36754608154296875, -0.0952911376953125, -0.167816162109375, 0.37652587890625, -0.06856918334960938, 0.27463531494140625, -0.6474113464355469, -0.7554168701171875, -0.34405517578125, 0.08096885681152344, -0.1843852996826172, -0.40767669677734375, -0.21511459350585938, 0.21028900146484375, 0.0725860595703125, 0.1622161865234375, 0.17340469360351562, -0.2952117919921875, 0.30794525146484375, -0.109222412109375, 0.19647979736328125, 0.04682350158691406, 0.2431659698486328, 0.2386474609375, 0.0259246826171875, -0.025135040283203125, -0.0978851318359375, 0.093017578125, -0.13494873046875, -0.6091785430908203, -0.056644439697265625, -0.1015167236328125, 0.11687850952148438, -0.025848388671875, 0.06916046142578125, -0.06848335266113281, -0.46453857421875, -0.14356231689453125, 0.06488800048828125, 0.160614013671875, -0.291717529296875, 0.505767822265625, 0.07716560363769531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000024.npy"}
{"epoch": 0.036281179138321996, "step": 25, "batch_size": 64, "mean": 0.011602312326431274, "std": 0.3313344419002533, "min": -1.081390380859375, "p10": -0.2961811065673828, "median": -0.0021257400512695312, "p90": 0.34333992004394537, "max": 1.1206817626953125, "pos_frac": 0.5, "sample": [0.100830078125, 0.47998809814453125, 0.12224578857421875, -0.0082550048828125, -0.13828277587890625, 0.0040035247802734375, 0.6876678466796875, -0.15213394165039062, 0.139129638671875, 0.056827545166015625, -0.16817474365234375, -0.03249359130859375, 0.00543212890625, -0.3358612060546875, 0.348175048828125, 0.2932472229003906, -0.30780029296875, 0.22177505493164062, 0.028987884521484375, -0.1285858154296875, -0.027988433837890625, -0.1389751434326172, -0.47035980224609375, 1.1206817626953125, -0.23862457275390625, 0.028293609619140625, 0.007389068603515625, 0.097991943359375, -0.041717529296875, 0.0056915283203125, -0.04115104675292969, 0.1491832733154297, 0.1225738525390625, 0.3320579528808594, 0.114990234375, -0.029531478881835938, 0.2237548828125, -0.08286094665527344, -0.25519371032714844, -0.8390655517578125, -1.081390380859375, -0.1690673828125, -0.3986244201660156, 0.1264667510986328, 0.12560653686523438, -0.12261581420898438, -0.0266265869140625, 0.1923980712890625, 0.2515144348144531, -0.10187530517578125, -0.058032989501953125, -0.16574478149414062, 0.2939605712890625, -0.2518577575683594, -0.5071563720703125, -0.1799945831298828, 0.7251510620117188, 0.2547340393066406, 0.47718048095703125, -0.2690696716308594, 0.20494461059570312, -0.0375823974609375, -0.23882293701171875, 0.4451904296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000025.npy"}
{"epoch": 0.03779289493575208, "step": 26, "batch_size": 64, "mean": -0.020512670278549194, "std": 0.4112747609615326, "min": -1.160552978515625, "p10": -0.5195022583007811, "median": -0.037628173828125, "p90": 0.4388633728027344, "max": 1.5242919921875, "pos_frac": 0.46875, "sample": [0.17515945434570312, -0.6056365966796875, -0.55133056640625, 0.01700592041015625, 0.14473342895507812, 0.31225013732910156, -0.5633697509765625, 0.127532958984375, -0.2708015441894531, 0.07442855834960938, -0.08080673217773438, 1.5242919921875, -0.4452362060546875, -0.29663848876953125, -1.160552978515625, 0.24417877197265625, -0.027133941650390625, 0.06638526916503906, 0.2698631286621094, 0.12641143798828125, -0.201446533203125, -0.07793426513671875, -0.0851287841796875, -0.33814430236816406, -0.0662841796875, -0.6514358520507812, 0.5396804809570312, 0.1685943603515625, 0.045078277587890625, -0.16764068603515625, 0.24605369567871094, 0.49156761169433594, -0.40763092041015625, 0.2940330505371094, -0.06604766845703125, 0.437255859375, -0.0444183349609375, 0.08320999145507812, 0.2084197998046875, 0.7167510986328125, -0.2100963592529297, -0.30237579345703125, -0.18214797973632812, -0.04859161376953125, 0.35926055908203125, -0.43538665771484375, -0.9654693603515625, 0.24391937255859375, -0.0308380126953125, -0.2737140655517578, 0.43955230712890625, -0.05023193359375, 0.2621955871582031, 0.5525970458984375, -0.11316490173339844, -0.37996673583984375, -0.8116912841796875, -0.1917266845703125, 0.45291900634765625, 0.05858421325683594, -0.08185577392578125, 0.18320655822753906, -0.07065200805664062, 0.07759666442871094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000026.npy"}
{"epoch": 0.039304610733182165, "step": 27, "batch_size": 64, "mean": -0.018405765295028687, "std": 0.3046008050441742, "min": -0.5516128540039062, "p10": -0.43494262695312497, "median": 0.002544403076171875, "p90": 0.3710849761962891, "max": 0.8097686767578125, "pos_frac": 0.5, "sample": [-0.18832015991210938, 0.08342933654785156, -0.025325775146484375, -0.0023956298828125, 0.08866691589355469, -0.05031585693359375, -0.2684173583984375, -0.5460662841796875, -0.360595703125, -0.29412841796875, 0.0166778564453125, -0.44580078125, 0.14650726318359375, 0.4602699279785156, -0.044727325439453125, 0.12701416015625, -0.20220947265625, -0.5270767211914062, -0.4929389953613281, -0.296661376953125, 0.09484100341796875, 0.8097686767578125, -0.289581298828125, -0.2298736572265625, -0.17359161376953125, -0.056243896484375, 0.00748443603515625, 0.12533187866210938, 0.4592742919921875, -0.18041419982910156, -0.513275146484375, 0.035976409912109375, 0.2095966339111328, -0.19795989990234375, 0.3749427795410156, -0.5516128540039062, 0.3379993438720703, 0.6807022094726562, 0.1817626953125, 0.1872100830078125, -0.28072357177734375, -0.40960693359375, 0.13216590881347656, -0.11351776123046875, -0.0532379150390625, 0.30810546875, 0.20679473876953125, 0.060970306396484375, 0.07249069213867188, 0.1425323486328125, 0.3868827819824219, -0.3585052490234375, -0.0482330322265625, 0.0239410400390625, 0.1335906982421875, -0.16884231567382812, 0.4723663330078125, -0.2825145721435547, 0.1804962158203125, -0.4544944763183594, 0.23608016967773438, 0.36208343505859375, 0.17476272583007812, -0.3914794921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000027.npy"}
{"epoch": 0.04081632653061224, "step": 28, "batch_size": 64, "mean": 0.021840453147888184, "std": 0.2942545413970947, "min": -0.7530517578125, "p10": -0.3639698028564453, "median": 0.00567626953125, "p90": 0.3365180969238281, "max": 0.7894821166992188, "pos_frac": 0.5, "sample": [0.33260345458984375, -0.2081298828125, -0.3167724609375, -0.130645751953125, -0.4125823974609375, -0.05165863037109375, 0.6260986328125, 0.32392120361328125, 0.44513893127441406, 0.33819580078125, -0.07347488403320312, 0.18225860595703125, 0.0590362548828125, 0.3213348388671875, -0.3137969970703125, -0.104339599609375, -0.13745880126953125, -0.3718833923339844, -0.028621673583984375, 0.27599334716796875, 0.23618316650390625, 0.23363876342773438, 0.24788665771484375, -0.00909423828125, 0.20354461669921875, 0.13326072692871094, -0.1774005889892578, 0.19824981689453125, -0.5448265075683594, 0.1462249755859375, -0.02744293212890625, -0.22103500366210938, 0.20048904418945312, -0.1105804443359375, 0.03226470947265625, 0.7193107604980469, -0.0834503173828125, -0.39994049072265625, -0.1085052490234375, -0.7530517578125, 0.10591316223144531, -0.3455047607421875, -0.3210906982421875, 0.3584098815917969, -0.02089691162109375, 0.05760383605957031, 0.10844039916992188, -0.508026123046875, -0.0054683685302734375, 0.048618316650390625, 0.26633453369140625, 0.7894821166992188, 0.17413330078125, -0.0949249267578125, 0.2723388671875, 0.1706390380859375, -0.13997650146484375, -0.05988311767578125, -0.09909820556640625, 0.3394622802734375, -0.4229278564453125, 0.1597309112548828, 0.016820907592773438, -0.12328338623046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000028.npy"}
{"epoch": 0.042328042328042326, "step": 29, "batch_size": 64, "mean": -0.03698325157165527, "std": 0.3302476406097412, "min": -0.808135986328125, "p10": -0.48498611450195306, "median": -0.0009794235229492188, "p90": 0.3286579132080078, "max": 0.94378662109375, "pos_frac": 0.5, "sample": [0.015613555908203125, 0.24237823486328125, 0.5030345916748047, -0.5778350830078125, 0.94378662109375, 0.15046119689941406, 0.017181396484375, -0.11366844177246094, -0.3714599609375, -0.808135986328125, 0.3265190124511719, 0.02750396728515625, -0.30254364013671875, -0.08765792846679688, 0.032489776611328125, 0.34918212890625, -0.60260009765625, -0.26605796813964844, -0.2527008056640625, 0.105499267578125, 0.000827789306640625, 0.10369873046875, -0.08353424072265625, -0.23401641845703125, 0.1378326416015625, 0.16455841064453125, 0.12566375732421875, -0.6370697021484375, -0.05401611328125, 0.24970054626464844, 0.3072013854980469, -0.049560546875, -0.2242279052734375, 0.11034393310546875, -0.390472412109375, 0.040096282958984375, 0.09840774536132812, 0.25867462158203125, -0.19772720336914062, 0.583984375, -0.3333091735839844, 0.15065383911132812, -0.16573715209960938, -0.22436904907226562, 0.09292221069335938, -0.06109619140625, 0.3295745849609375, -0.597015380859375, 0.2407073974609375, -0.06795120239257812, -0.114501953125, 0.014081954956054688, 0.6301307678222656, -0.0027866363525390625, -0.3659820556640625, 0.011920928955078125, -0.3033294677734375, -0.44593048095703125, 0.29798126220703125, -0.5017242431640625, -0.618927001953125, -0.25753021240234375, -0.04887580871582031, 0.3328094482421875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000029.npy"}
{"epoch": 0.04383975812547241, "step": 30, "batch_size": 64, "mean": 0.04662585258483887, "std": 0.2984139323234558, "min": -0.5746555328369141, "p10": -0.2976516723632812, "median": 0.010196685791015625, "p90": 0.5418968200683594, "max": 0.6692962646484375, "pos_frac": 0.5, "sample": [-0.5746555328369141, 0.10359954833984375, -0.0972137451171875, -0.3039703369140625, -0.17358970642089844, 0.22605133056640625, 0.0217132568359375, 0.5438079833984375, 0.5486831665039062, -0.2568359375, -0.34520721435546875, -0.09441375732421875, -0.2475433349609375, 0.5792160034179688, -0.39748382568359375, 0.3036346435546875, -0.22478675842285156, 0.577545166015625, -0.15784454345703125, -0.14484405517578125, -0.2388458251953125, -0.29175567626953125, -0.21487998962402344, 0.15076446533203125, -0.13702392578125, -0.037445068359375, 0.20410537719726562, -0.142242431640625, 0.02608489990234375, -0.00131988525390625, 0.6692962646484375, 0.08970451354980469, -0.0502777099609375, 0.4737396240234375, 0.627105712890625, 0.089569091796875, -0.3994903564453125, -0.06341361999511719, 0.5374374389648438, -0.13814353942871094, 0.10695457458496094, 0.4693145751953125, 0.630645751953125, 0.12453460693359375, -0.17994308471679688, 0.4306793212890625, -0.0514068603515625, 0.16368484497070312, 0.07435798645019531, -0.010955810546875, -0.30017852783203125, -0.04782867431640625, 0.1791820526123047, 0.0311126708984375, -0.4566650390625, -0.16324234008789062, 0.08530426025390625, 0.32224273681640625, 0.25909423828125, 0.05769920349121094, 0.12969589233398438, -0.13507080078125, -0.1058502197265625, 0.3318614959716797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000030.npy"}
{"epoch": 0.045351473922902494, "step": 31, "batch_size": 64, "mean": 0.03089618682861328, "std": 0.31332331895828247, "min": -0.7262420654296875, "p10": -0.31262893676757814, "median": 0.020050048828125, "p90": 0.4732223510742188, "max": 0.7592849731445312, "pos_frac": 0.5, "sample": [0.5170211791992188, -0.1971588134765625, -0.45037841796875, 0.35371971130371094, -0.2548351287841797, -0.7262420654296875, 0.042568206787109375, 0.22730255126953125, 0.09954071044921875, -0.172943115234375, 0.0776824951171875, 0.11254119873046875, 0.6676197052001953, -0.0044097900390625, 0.4216766357421875, -0.137908935546875, 0.18552589416503906, -0.31575775146484375, 0.7592849731445312, -0.3607177734375, 0.662689208984375, 0.055816650390625, 0.0870819091796875, -0.2792205810546875, 0.240478515625, -0.19000244140625, 0.5920028686523438, -0.057586669921875, -0.21198272705078125, -0.0894622802734375, -0.110931396484375, -0.305328369140625, -0.31707763671875, -0.45416831970214844, -0.12335586547851562, 0.1542510986328125, -0.23827362060546875, -0.13439559936523438, -0.28552818298339844, -0.07597732543945312, 0.5564918518066406, -0.5339584350585938, -0.2575950622558594, 0.4739227294921875, 0.099090576171875, 0.130401611328125, 0.12383270263671875, -0.020433425903320312, -0.002468109130859375, 0.35974884033203125, -0.2404022216796875, -0.18156051635742188, 0.36822509765625, 0.12247085571289062, 0.318115234375, 0.0757293701171875, 0.21405029296875, 0.471588134765625, -0.22090911865234375, 0.08200836181640625, 0.290740966796875, 0.20218467712402344, -0.08075332641601562, -0.13632583618164062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000031.npy"}
{"epoch": 0.04686318972033258, "step": 32, "batch_size": 64, "mean": 0.04653611779212952, "std": 0.3308785855770111, "min": -0.8258819580078125, "p10": -0.33290100097656244, "median": 0.01068115234375, "p90": 0.39847049713134763, "max": 1.012786865234375, "pos_frac": 0.515625, "sample": [-0.236846923828125, 0.24523162841796875, -0.284942626953125, -0.7468414306640625, -0.1850738525390625, -0.06587600708007812, 0.246612548828125, -0.13770294189453125, 1.012786865234375, -0.04791259765625, 0.16220855712890625, 0.40140533447265625, 0.13337326049804688, -0.1009979248046875, 0.29880523681640625, 0.226348876953125, -0.35345458984375, 0.3362236022949219, -0.4183006286621094, 0.24407196044921875, -0.13711929321289062, -0.39600181579589844, 0.04571533203125, 0.5908126831054688, 0.31121063232421875, 0.036891937255859375, 0.3226165771484375, 0.03163909912109375, 0.2815093994140625, 0.061466217041015625, 0.010150909423828125, 0.38225555419921875, -0.09426498413085938, 0.397857666015625, 0.06401824951171875, 0.6149520874023438, -0.005462646484375, -0.13141632080078125, 0.655609130859375, 0.817962646484375, -0.10229110717773438, -0.029144287109375, -0.1374664306640625, -0.08258819580078125, -0.11272239685058594, -0.09015655517578125, 0.028066635131835938, 0.011211395263671875, -0.8258819580078125, -0.0955657958984375, 0.2246856689453125, 0.077667236328125, -0.3694477081298828, -0.03472900390625, -0.078948974609375, -0.11826515197753906, -0.2113189697265625, -0.029947280883789062, 0.030445098876953125, 0.39873313903808594, -0.0437774658203125, 0.32045745849609375, 0.30841064453125, -0.6486358642578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000032.npy"}
{"epoch": 0.04837490551776266, "step": 33, "batch_size": 64, "mean": 0.023888081312179565, "std": 0.3603229820728302, "min": -1.0, "p10": -0.3257427215576172, "median": -0.003291606903076172, "p90": 0.3835426330566407, "max": 1.0749282836914062, "pos_frac": 0.484375, "sample": [-0.58038330078125, -0.06439208984375, 0.33933258056640625, 0.0626678466796875, 0.12575912475585938, 0.07987213134765625, 0.06618499755859375, 0.3956298828125, 0.6842727661132812, -0.30709075927734375, -0.1732940673828125, 0.36782264709472656, -0.09939098358154297, 0.05445671081542969, 0.6773719787597656, -0.3337364196777344, -0.7553253173828125, -0.068939208984375, 0.015445709228515625, 0.89141845703125, 0.04573822021484375, 1.0749282836914062, 0.3249549865722656, -0.17978858947753906, -0.7717514038085938, -0.2175140380859375, -0.36981201171875, -0.056743621826171875, 0.0137176513671875, 0.143798828125, 0.39027976989746094, -0.009481430053710938, 0.1888275146484375, 0.04351234436035156, -0.18003082275390625, -0.045948028564453125, -0.00766754150390625, 0.22971343994140625, -0.013263702392578125, -0.5432052612304688, 0.33587646484375, -0.04557037353515625, -0.14014244079589844, -0.09130096435546875, 0.18011474609375, -0.24979400634765625, 0.33560943603515625, 0.2865772247314453, -0.00030422210693359375, -0.16816329956054688, -0.17549896240234375, -0.16916656494140625, -0.18739891052246094, -0.2861175537109375, -0.1079559326171875, -0.260223388671875, -0.00627899169921875, 0.34821319580078125, -1.0, 0.080810546875, 0.3130836486816406, 0.4200325012207031, 0.35979270935058594, 0.318695068359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000033.npy"}
|
||||
{"epoch": 0.049886621315192746, "step": 34, "batch_size": 64, "mean": 0.03501778841018677, "std": 0.2752145826816559, "min": -0.5201797485351562, "p10": -0.34313964843749994, "median": 0.056031227111816406, "p90": 0.369648551940918, "max": 0.7132797241210938, "pos_frac": 0.53125, "sample": [-0.06631660461425781, 0.5691299438476562, 0.374053955078125, 0.16967391967773438, -0.35851287841796875, -0.0549774169921875, 0.06431388854980469, 0.4452095031738281, 0.2478351593017578, 0.15847396850585938, 0.7132797241210938, -0.3933258056640625, -0.0218048095703125, -0.3077850341796875, -0.3858489990234375, -0.1636962890625, 0.33272552490234375, 0.07204437255859375, 0.5150699615478516, 0.12139320373535156, -0.06526947021484375, 0.08738899230957031, -0.007974624633789062, -0.2294158935546875, -0.16487884521484375, -0.04840087890625, -0.12992095947265625, 0.3904533386230469, 0.08237457275390625, -0.25208091735839844, 0.24199676513671875, 0.35936927795410156, 0.22536277770996094, -0.033321380615234375, 0.035037994384765625, -0.3582916259765625, 0.11867523193359375, 0.3011741638183594, 0.16394805908203125, -0.258575439453125, 0.1427001953125, 0.0845184326171875, 0.2827491760253906, 0.2927093505859375, 0.1708221435546875, -0.2033233642578125, 0.13785552978515625, -0.13532257080078125, 0.06650543212890625, 0.281646728515625, 0.047748565673828125, -0.030517578125, 0.6185760498046875, -0.1182403564453125, -0.060489654541015625, -0.5201797485351562, -0.30040740966796875, -0.21097946166992188, -0.12633705139160156, 0.1721019744873047, 0.24420166015625, -0.1218109130859375, -0.49646759033203125, -0.46550750732421875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000034.npy"}
|
||||
{"epoch": 0.05139833711262283, "step": 35, "batch_size": 64, "mean": 0.009096503257751465, "std": 0.32824620604515076, "min": -0.939361572265625, "p10": -0.42255268096923826, "median": 0.012386322021484375, "p90": 0.36980648040771485, "max": 0.7415199279785156, "pos_frac": 0.515625, "sample": [0.01183319091796875, -0.3803901672363281, 0.3900928497314453, 0.12050628662109375, -0.44062232971191406, 0.012939453125, -0.279693603515625, -0.06332969665527344, 0.32605743408203125, -0.5301933288574219, -0.114471435546875, -0.04312705993652344, -0.0059356689453125, 0.14373016357421875, -0.18825531005859375, -0.07614326477050781, 0.657806396484375, -0.1653289794921875, 0.7042236328125, -0.28710174560546875, 0.29744911193847656, 0.18994903564453125, 0.36983680725097656, -0.29209136962890625, -0.14072036743164062, -0.191009521484375, -0.939361572265625, -0.08382415771484375, -0.307037353515625, 0.352264404296875, -0.0723724365234375, 0.15325164794921875, -0.0586090087890625, -0.471435546875, 0.09799575805664062, -0.5583114624023438, -0.2828407287597656, -0.10479736328125, 0.2701377868652344, -0.5499229431152344, 0.10182571411132812, 0.1992950439453125, 0.7415199279785156, -0.24407196044921875, 0.44194793701171875, 0.01346588134765625, 0.19374465942382812, -0.24829864501953125, 0.023609161376953125, 0.2858715057373047, -0.13740158081054688, -0.1749420166015625, 0.12397003173828125, 0.12793731689453125, -0.5912933349609375, 0.3697357177734375, 0.32916259765625, 0.27930450439453125, 0.2845420837402344, 0.3815765380859375, 0.20277976989746094, 0.3408203125, -0.1682586669921875, 0.23418617248535156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000035.npy"}
|
||||
{"epoch": 0.05291005291005291, "step": 36, "batch_size": 64, "mean": 0.002379119396209717, "std": 0.31688427925109863, "min": -0.9554290771484375, "p10": -0.39314956665039064, "median": -0.048580169677734375, "p90": 0.41073074340820315, "max": 0.8103485107421875, "pos_frac": 0.4375, "sample": [-0.0707244873046875, 0.4881591796875, -0.11426544189453125, -0.23177719116210938, 0.03589057922363281, -0.5432853698730469, 0.35462188720703125, 0.3064155578613281, 0.4065704345703125, -0.3785858154296875, 0.31122589111328125, 0.12578201293945312, 0.41251373291015625, -0.09930419921875, -0.09473991394042969, 0.0613861083984375, -0.08962249755859375, -0.24878692626953125, -0.03504371643066406, 0.5572776794433594, 0.06019783020019531, 0.28253936767578125, -0.04811859130859375, -0.13972854614257812, -0.13780593872070312, -0.9554290771484375, -0.5382499694824219, 0.5113449096679688, -0.016796112060546875, -0.0947113037109375, -0.15777587890625, -0.10350608825683594, 0.09625244140625, -0.0636444091796875, -0.096099853515625, -0.3312110900878906, -0.4652595520019531, 0.66436767578125, -0.2777366638183594, -0.23075103759765625, -0.13943862915039062, 0.1979827880859375, 0.319854736328125, -0.43575477600097656, 0.0051021575927734375, 0.3311767578125, 0.5127182006835938, 0.0264434814453125, 0.31023406982421875, 0.286773681640625, 0.09810638427734375, -0.4060211181640625, -0.39939117431640625, 0.15291595458984375, -0.1220855712890625, -0.0137939453125, 0.05731201171875, -0.3035545349121094, -0.0622711181640625, -0.049041748046875, -0.06605911254882812, 0.8103485107421875, -0.10965538024902344, 0.038776397705078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000036.npy"}
|
||||
{"epoch": 0.05442176870748299, "step": 37, "batch_size": 64, "mean": 0.10119268298149109, "std": 0.45306500792503357, "min": -0.5797786712646484, "p10": -0.3392633438110351, "median": 0.00296783447265625, "p90": 0.5994712829589846, "max": 2.094970703125, "pos_frac": 0.515625, "sample": [0.137725830078125, 0.7929153442382812, 0.35235595703125, -0.34716033935546875, -0.089599609375, -0.10999298095703125, -0.1489410400390625, 0.07125282287597656, 0.08778190612792969, 0.0239715576171875, 1.338409423828125, 0.37779998779296875, 0.3143424987792969, 0.51593017578125, 0.004638671875, -0.317047119140625, 0.2815818786621094, 0.3431243896484375, 0.411773681640625, 0.273529052734375, -0.15259933471679688, -0.5797786712646484, -0.5074310302734375, -0.0824127197265625, 0.022691726684570312, 0.1531982421875, -0.07978057861328125, -0.3256969451904297, 0.06504058837890625, -0.04015350341796875, 0.0012969970703125, -0.3450775146484375, 0.7054519653320312, 0.5442123413085938, 1.3472900390625, -0.17095184326171875, 0.06270599365234375, -0.03534889221191406, -0.12100791931152344, -0.179534912109375, 2.094970703125, -0.26857757568359375, -0.04926300048828125, -0.13764190673828125, -0.0921630859375, 0.10890769958496094, 0.1040496826171875, -0.325653076171875, 0.21044921875, -0.471099853515625, 0.23415565490722656, -0.18051910400390625, -0.007320404052734375, -0.06270027160644531, 0.2255706787109375, 0.09600448608398438, -0.25516510009765625, 0.2825050354003906, -0.08111572265625, -0.36949920654296875, 0.717376708984375, -0.4404144287109375, -0.07618522644042969, 0.6231536865234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000037.npy"}
|
||||
{"epoch": 0.055933484504913075, "step": 38, "batch_size": 64, "mean": 0.05771511793136597, "std": 0.3551373779773712, "min": -0.668243408203125, "p10": -0.34479103088378904, "median": 0.016338348388671875, "p90": 0.48822593688964855, "max": 1.1158981323242188, "pos_frac": 0.515625, "sample": [0.1329803466796875, 0.42987060546875, 0.33670806884765625, 1.1158981323242188, -0.22978591918945312, 0.4549140930175781, -0.03306007385253906, 0.2679328918457031, -0.6666030883789062, -0.007415771484375, 0.16153907775878906, -0.0888519287109375, 0.21431922912597656, 0.25332069396972656, -0.3263397216796875, -0.4298095703125, -0.40265655517578125, -0.015384674072265625, -0.07239532470703125, 0.2913970947265625, -0.35103607177734375, 0.41748046875, -0.13921356201171875, 0.01183319091796875, 0.270233154296875, -0.0712432861328125, 0.50250244140625, -0.668243408203125, 0.03704833984375, -0.2462005615234375, -0.3302192687988281, -0.3241119384765625, 0.08963203430175781, 0.08652114868164062, 0.7781181335449219, 0.306488037109375, -0.18050384521484375, -0.4523048400878906, 0.32178497314453125, -0.09055137634277344, -0.02520751953125, 0.020843505859375, 0.1682910919189453, -0.16650390625, 0.21100997924804688, -0.29399871826171875, -0.44121551513671875, -0.2517356872558594, 0.5669708251953125, 0.907135009765625, 0.0697021484375, 0.1799774169921875, 0.7486801147460938, -0.249542236328125, 0.20915794372558594, 0.23530006408691406, -0.0392303466796875, -0.0518035888671875, -0.09262847900390625, 0.2589683532714844, 0.5493278503417969, -0.03369140625, 0.1330089569091797, -0.2736396789550781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000038.npy"}
|
||||
{"epoch": 0.05744520030234316, "step": 39, "batch_size": 64, "mean": 0.08076697587966919, "std": 0.39368367195129395, "min": -1.4610137939453125, "p10": -0.3873973846435546, "median": 0.10778999328613281, "p90": 0.52137451171875, "max": 0.826324462890625, "pos_frac": 0.5625, "sample": [0.520904541015625, -0.3308219909667969, -0.01297760009765625, 0.37331390380859375, -0.525421142578125, -0.2014923095703125, 0.40460968017578125, 0.2126483917236328, 0.33736419677734375, 0.1458892822265625, -0.41164398193359375, -0.14868736267089844, 0.6811389923095703, -0.06136131286621094, 0.2603759765625, 0.30504798889160156, -0.15867042541503906, 0.7942790985107422, 0.4927806854248047, -0.0706329345703125, -0.8119869232177734, -0.4123992919921875, -0.06944656372070312, 0.4390830993652344, 0.826324462890625, -0.039005279541015625, -1.4610137939453125, 0.2245330810546875, 0.764404296875, -0.065826416015625, 0.13592529296875, -0.249664306640625, 0.2999725341796875, 0.000823974609375, 0.10339736938476562, 0.232696533203125, 0.26004791259765625, -0.0443267822265625, 0.11975860595703125, -0.5856552124023438, 0.3856353759765625, -0.19583892822265625, -0.2499980926513672, 0.47226715087890625, 0.564697265625, -0.10552215576171875, 0.3869476318359375, 0.521575927734375, -0.07815933227539062, 0.11712646484375, 0.18476104736328125, -0.47705841064453125, -0.17637252807617188, -0.10308837890625, -0.00145721435546875, 0.628875732421875, 0.09765434265136719, 0.30322265625, 0.2823486328125, -0.12417984008789062, 0.35271453857421875, 0.05878257751464844, 0.1121826171875, -0.06231689453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000039.npy"}
|
||||
{"epoch": 0.05895691609977324, "step": 40, "batch_size": 64, "mean": 0.0019148588180541992, "std": 0.293613076210022, "min": -0.9087677001953125, "p10": -0.37549476623535155, "median": 0.059223175048828125, "p90": 0.2583839416503907, "max": 0.567138671875, "pos_frac": 0.59375, "sample": [-0.1805572509765625, 0.06266021728515625, 0.029438018798828125, 0.1263275146484375, 0.23324203491210938, 0.2404918670654297, 0.0322723388671875, 0.14923477172851562, -0.177581787109375, 0.552947998046875, 0.5006256103515625, 0.2079792022705078, -0.04671478271484375, -0.278961181640625, -0.20506858825683594, -0.03929328918457031, 0.227813720703125, 0.18610382080078125, 0.24371910095214844, 0.10717582702636719, 0.13526535034179688, 0.1197357177734375, 0.19597625732421875, 0.15716934204101562, 0.08988189697265625, 0.26375579833984375, -0.23081207275390625, 0.07288360595703125, 0.43485260009765625, -0.1638774871826172, 0.245849609375, -0.6304397583007812, 0.567138671875, -0.3500175476074219, 0.16013526916503906, 0.07547569274902344, 0.3096809387207031, 0.22645950317382812, -0.5118789672851562, -0.04632568359375, -0.11948966979980469, -0.33349609375, 0.4994163513183594, -0.38641357421875, -0.1734771728515625, 0.10097885131835938, -0.08852767944335938, -0.2681694030761719, -0.2856330871582031, 0.242584228515625, 0.12231826782226562, -0.687591552734375, 0.010532379150390625, 0.0031280517578125, -0.030181884765625, -0.5077400207519531, -0.4215087890625, 0.0557861328125, -0.0453948974609375, 0.19235992431640625, 0.04328155517578125, -0.14183807373046875, -0.9087677001953125, 0.15763092041015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000040.npy"}
|
||||
{"epoch": 0.06046863189720333, "step": 41, "batch_size": 64, "mean": 0.08180946111679077, "std": 0.38955605030059814, "min": -0.84954833984375, "p10": -0.39430389404296873, "median": 0.08917522430419922, "p90": 0.4978790283203127, "max": 0.9183502197265625, "pos_frac": 0.59375, "sample": [0.22222900390625, -0.06102752685546875, 0.05596733093261719, -0.6706657409667969, 0.1606903076171875, -0.3603057861328125, -0.22762680053710938, 0.7705116271972656, 0.08953857421875, -0.044895172119140625, 0.3637237548828125, 0.1798114776611328, 0.45012664794921875, 0.35723876953125, 0.248687744140625, 0.4504547119140625, -0.03330039978027344, 0.5182037353515625, 0.056919097900390625, -0.016819000244140625, 0.7231063842773438, -0.2476959228515625, -0.1951465606689453, 0.11277008056640625, 0.16661834716796875, 0.19969940185546875, -0.27048492431640625, 0.37124061584472656, 0.1119842529296875, -0.4236602783203125, 0.05393791198730469, 0.3541412353515625, 0.9066925048828125, 0.9183502197265625, -0.03320503234863281, -0.2415771484375, 0.36519622802734375, -0.14493942260742188, 0.41696929931640625, -0.041347503662109375, 0.1559600830078125, -0.8247718811035156, -0.17889022827148438, 0.08881187438964844, 0.8578338623046875, -0.09060859680175781, -0.052967071533203125, 0.33353424072265625, 0.09674072265625, -0.11142730712890625, -0.6314353942871094, -0.84954833984375, 0.104095458984375, 0.1206207275390625, 0.025791168212890625, -0.2775115966796875, 0.42372894287109375, 0.0867767333984375, 0.8478717803955078, -0.40887451171875, 0.4269561767578125, 0.13042449951171875, -0.032718658447265625, -0.61669921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000041.npy"}
|
||||
{"epoch": 0.06198034769463341, "step": 42, "batch_size": 64, "mean": 0.067737877368927, "std": 0.30101218819618225, "min": -0.6041641235351562, "p10": -0.31495590209960933, "median": 0.0835733413696289, "p90": 0.38837928771972663, "max": 0.9388885498046875, "pos_frac": 0.609375, "sample": [0.3188018798828125, 0.77490234375, -0.036865234375, -0.11429214477539062, 0.18604660034179688, -0.5039825439453125, 0.10654640197753906, -0.09379196166992188, 0.00774383544921875, -0.11281585693359375, 0.00959014892578125, 0.0993194580078125, -0.2951316833496094, -0.23883056640625, 0.27109527587890625, 0.18662261962890625, 0.9388885498046875, 0.1212615966796875, -0.029205322265625, -0.12035369873046875, 0.09912109375, 0.18613433837890625, 0.40402984619140625, -0.026134490966796875, -0.6041641235351562, -0.15838623046875, -0.14426040649414062, 0.21868133544921875, 0.054271697998046875, -0.5016517639160156, 0.3938484191894531, 0.0443115234375, -0.04010772705078125, -0.11126708984375, 0.37561798095703125, -0.05330657958984375, 0.6466598510742188, 0.1539306640625, -0.46891021728515625, 0.35938262939453125, -0.3234519958496094, -0.2864227294921875, -0.5245285034179688, -0.14023590087890625, 0.05731201171875, 0.1708526611328125, 0.2107696533203125, 0.0864715576171875, 0.2597198486328125, -0.092529296875, 0.013214111328125, 0.27130889892578125, 0.33325958251953125, 0.2766609191894531, 0.5323047637939453, 0.15747833251953125, -0.3816375732421875, 0.08067512512207031, 0.5075531005859375, 0.12738800048828125, 0.16630935668945312, 0.3568744659423828, 0.23732757568359375, -0.06480026245117188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000042.npy"}
|
||||
{"epoch": 0.06349206349206349, "step": 43, "batch_size": 64, "mean": 0.07686793804168701, "std": 0.43431714177131653, "min": -1.0652999877929688, "p10": -0.4047910690307617, "median": 0.06479358673095703, "p90": 0.5404312133789062, "max": 1.46405029296875, "pos_frac": 0.546875, "sample": [0.165069580078125, 0.24679946899414062, -0.075347900390625, 0.1384601593017578, 0.20104217529296875, -0.3619556427001953, -0.20396804809570312, -0.014268875122070312, 0.0657806396484375, -0.337921142578125, 0.33899879455566406, 0.5411224365234375, -0.125640869140625, -0.08792877197265625, 0.20037841796875, 0.4638824462890625, -0.021562576293945312, -0.18481063842773438, 0.17662620544433594, -0.220611572265625, 0.2966289520263672, 0.18810272216796875, -0.080841064453125, -0.20767974853515625, 0.5671005249023438, -0.1254253387451172, 0.04641151428222656, 0.3785858154296875, -0.2733650207519531, 0.538818359375, 1.46405029296875, -0.4737091064453125, 0.20539093017578125, -0.321258544921875, 0.01366424560546875, -0.075164794921875, 0.659912109375, -0.20926666259765625, 0.5029373168945312, 0.40869140625, 0.210693359375, -0.35976409912109375, 0.06544876098632812, -1.0652999877929688, 0.06413841247558594, 1.1876678466796875, 0.31368255615234375, 0.3260517120361328, -0.3092079162597656, 0.23326683044433594, -0.04044151306152344, 0.40077972412109375, 0.2685089111328125, -0.45113372802734375, 0.47695159912109375, 0.5199508666992188, -0.9008922576904297, 0.7591781616210938, 0.6118011474609375, -0.025360107421875, -0.678314208984375, -0.42314910888671875, -0.092376708984375, -0.5803604125976562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000043.npy"}
|
||||
{"epoch": 0.06500377928949358, "step": 44, "batch_size": 64, "mean": -0.022500991821289062, "std": 0.3487635552883148, "min": -0.5951690673828125, "p10": -0.45119323730468747, "median": -0.07227516174316406, "p90": 0.28334274291992195, "max": 1.245513916015625, "pos_frac": 0.421875, "sample": [-0.45891571044921875, -0.2723274230957031, 0.09503936767578125, 0.06205940246582031, 0.29073333740234375, -0.37384033203125, -0.12860870361328125, -0.20121002197265625, -0.332183837890625, -0.099517822265625, 0.25450897216796875, 0.5309295654296875, -0.559417724609375, -0.09894561767578125, 0.23839950561523438, -0.226531982421875, 0.020597457885742188, -0.07962417602539062, -0.013731002807617188, 0.22359848022460938, -0.07435226440429688, 0.009731292724609375, 0.2646141052246094, -0.28270721435546875, -0.08727264404296875, 0.0147705078125, 0.2610626220703125, -0.5448532104492188, 0.0442352294921875, 0.1680908203125, 0.4568939208984375, -0.32574462890625, 0.11214447021484375, -0.12469482421875, -0.030664443969726562, -0.12511444091796875, 1.245513916015625, 0.09000015258789062, -0.30434417724609375, -0.16774749755859375, -0.08263397216796875, 0.08840560913085938, 0.03045654296875, -0.1072235107421875, -0.1735687255859375, 0.540283203125, -0.24832916259765625, 0.2514801025390625, -0.0454254150390625, -0.0985260009765625, -0.5951690673828125, 0.2660980224609375, 0.20885467529296875, -0.40423583984375, -0.5016021728515625, -0.22983551025390625, -0.45975494384765625, 1.142913818359375, 0.24640655517578125, 0.37912750244140625, -0.07019805908203125, -0.025539398193359375, -0.43317413330078125, -0.589447021484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000044.npy"}
|
||||
{"epoch": 0.06651549508692366, "step": 45, "batch_size": 64, "mean": 0.10260695219039917, "std": 0.37659668922424316, "min": -1.360595703125, "p10": -0.35237751007080076, "median": 0.11300849914550781, "p90": 0.5709524154663086, "max": 0.8811492919921875, "pos_frac": 0.625, "sample": [0.1298828125, -0.0365753173828125, 0.5855236053466797, -0.0037078857421875, -0.3529949188232422, 0.44484710693359375, -0.5222988128662109, 0.650115966796875, -0.0597381591796875, -1.360595703125, 0.4391517639160156, 0.0395355224609375, 0.09972381591796875, 0.01145172119140625, 0.05954551696777344, -0.03034210205078125, 0.32503509521484375, 0.0080413818359375, 0.416473388671875, 0.5737991333007812, 0.3361663818359375, -0.3509368896484375, 0.5643100738525391, 0.1389598846435547, 0.4829139709472656, -0.29254150390625, 0.686248779296875, 0.04617881774902344, 0.47374725341796875, 0.06355667114257812, 0.14798355102539062, 0.48785400390625, -0.6880264282226562, -0.04741668701171875, 0.23942947387695312, 0.5171356201171875, -0.045684814453125, 0.15180206298828125, -0.44295501708984375, -0.02236175537109375, -0.737274169921875, 0.127197265625, -0.0353240966796875, 0.14203834533691406, -0.21022796630859375, 0.16198158264160156, -0.35491943359375, 0.12629318237304688, -0.046356201171875, 0.58917236328125, 0.2155609130859375, 0.020715713500976562, 0.47014617919921875, 0.2695770263671875, 0.2885932922363281, 0.6362457275390625, 0.8811492919921875, -0.00664520263671875, 0.20441436767578125, -0.017086029052734375, -0.051593780517578125, -0.056396484375, 0.1514739990234375, -0.06512832641601562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000045.npy"}
|
||||
{"epoch": 0.06802721088435375, "step": 46, "batch_size": 64, "mean": 0.13755744695663452, "std": 0.4308947026729584, "min": -0.6207466125488281, "p10": -0.42817382812499993, "median": 0.13070392608642578, "p90": 0.6811210632324219, "max": 1.2515869140625, "pos_frac": 0.578125, "sample": [0.0911407470703125, -0.5193405151367188, -0.47418975830078125, 0.16924285888671875, 0.8050689697265625, -0.09765625, -0.06555747985839844, -0.24999237060546875, 0.26532745361328125, -0.468658447265625, 0.9026298522949219, 0.21828460693359375, 0.4810638427734375, 0.46930694580078125, -0.09558868408203125, 0.22117233276367188, -0.6110458374023438, 0.17715072631835938, 0.1457042694091797, 0.3564720153808594, 0.3601970672607422, 0.991973876953125, -0.008790969848632812, 0.6060028076171875, 0.192352294921875, -0.333709716796875, 0.12833404541015625, 1.0851821899414062, 0.7736358642578125, -0.097625732421875, 0.029022216796875, 0.5283279418945312, -0.1019439697265625, -0.1186370849609375, -0.30804443359375, 0.557220458984375, 0.031982421875, 0.144195556640625, -0.6207466125488281, 0.4588813781738281, 0.6038322448730469, -0.1036529541015625, 0.6083908081054688, 0.16916656494140625, 0.6936111450195312, 0.08655929565429688, -0.2888946533203125, -0.1539764404296875, -0.280609130859375, -0.26561546325683594, 0.5604438781738281, -0.09624481201171875, -0.2900199890136719, 0.14366912841796875, 0.6519775390625, 1.2515869140625, -0.5218505859375, -0.01076507568359375, -0.5183639526367188, 0.4757080078125, -0.1219940185546875, 0.1330738067626953, -0.21832656860351562, 0.27762603759765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000046.npy"}
|
||||
{"epoch": 0.06953892668178382, "step": 47, "batch_size": 64, "mean": 0.200914204120636, "std": 0.3104929029941559, "min": -0.4773998260498047, "p10": -0.20305271148681636, "median": 0.19398117065429688, "p90": 0.5740081787109376, "max": 0.9452590942382812, "pos_frac": 0.734375, "sample": [0.20790481567382812, 0.37696075439453125, 0.22149658203125, 0.268218994140625, 0.9452590942382812, 0.17179298400878906, -0.03905487060546875, 0.856597900390625, -0.033447265625, 0.2911376953125, 0.15612411499023438, -0.15430641174316406, -0.0983428955078125, -0.29058074951171875, 0.09004974365234375, 0.34369659423828125, 0.1853008270263672, 0.4305267333984375, 0.0761871337890625, 0.2890167236328125, 0.16924667358398438, -0.15973663330078125, 0.029144287109375, 0.10447883605957031, -0.03345298767089844, 0.1668243408203125, 0.2419605255126953, 0.2740364074707031, 0.12926101684570312, 0.38976097106933594, -0.2216167449951172, 0.41206932067871094, -0.09374237060546875, 0.38483428955078125, 0.01100921630859375, 0.5787734985351562, 0.7115287780761719, 0.19281005859375, 0.34212684631347656, 0.450469970703125, -0.321197509765625, 0.42431640625, 0.2059783935546875, -0.10149383544921875, 0.5477485656738281, 0.008831024169921875, 0.8870697021484375, 0.19515228271484375, 0.5186004638671875, 0.40825843811035156, -0.00653839111328125, 0.0813140869140625, -0.29032135009765625, -0.23471832275390625, -0.4773998260498047, 0.6937217712402344, 0.17107200622558594, 0.5628890991210938, 0.3018608093261719, 0.376800537109375, 0.2964591979980469, 0.73345947265625, -0.057586669921875, -0.4400920867919922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000047.npy"}
|
||||
{"epoch": 0.0710506424792139, "step": 48, "batch_size": 64, "mean": 0.0035860538482666016, "std": 0.38119831681251526, "min": -0.9284896850585938, "p10": -0.4453884124755859, "median": -0.022296905517578125, "p90": 0.4957069396972662, "max": 0.9372406005859375, "pos_frac": 0.46875, "sample": [-0.1694488525390625, -0.10809326171875, 0.25617408752441406, 0.5989227294921875, 0.1725006103515625, 0.8677330017089844, 0.04803466796875, -0.1615142822265625, -0.304840087890625, -0.1576366424560547, 0.14267730712890625, 0.06364059448242188, -0.32486724853515625, -0.3529205322265625, 0.559539794921875, 0.6586761474609375, 0.25803375244140625, 0.00262451171875, -0.9284896850585938, 0.0514984130859375, -0.02231597900390625, 0.2421112060546875, 0.27214813232421875, 0.34676361083984375, 0.14247512817382812, 0.19722747802734375, -0.1631927490234375, 0.3108100891113281, -0.6726837158203125, 0.31472015380859375, -0.09050750732421875, 0.1439056396484375, -0.06220817565917969, 0.006450653076171875, -0.08671188354492188, 0.22846221923828125, -0.1288909912109375, -0.015819549560546875, 0.9233837127685547, -0.3324737548828125, -0.4728412628173828, 0.1050872802734375, -0.382904052734375, 0.06415367126464844, -0.44818878173828125, -0.11022758483886719, -0.06965446472167969, -0.02227783203125, -0.6987152099609375, -0.4238758087158203, 0.233612060546875, -0.19754791259765625, -0.4681587219238281, 0.10178756713867188, -0.4388542175292969, -0.06351089477539062, 0.9372406005859375, -0.10284614562988281, 0.01104736328125, -0.2775154113769531, -0.031080245971679688, 0.9014205932617188, -0.05955696105957031, -0.5829849243164062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000048.npy"}
|
||||
{"epoch": 0.07256235827664399, "step": 49, "batch_size": 64, "mean": 0.11987724900245667, "std": 0.4109678566455841, "min": -1.338287353515625, "p10": -0.3054500579833984, "median": 0.12145328521728516, "p90": 0.551046943664551, "max": 1.4840316772460938, "pos_frac": 0.703125, "sample": [0.3933868408203125, 0.15044403076171875, 0.18800926208496094, 0.2257537841796875, -0.12993621826171875, -0.11902236938476562, -0.10728073120117188, 0.13104629516601562, 0.2756805419921875, 0.15380859375, -0.002506256103515625, 0.06715011596679688, 0.3864784240722656, 0.30246734619140625, -0.3655872344970703, 0.03360557556152344, -0.3161354064941406, -0.614227294921875, 0.2346649169921875, -0.72882080078125, 0.18071937561035156, 0.00199127197265625, 0.06158256530761719, 0.188323974609375, 0.208587646484375, 0.8773651123046875, 0.4533233642578125, 0.12307167053222656, 0.09006500244140625, -0.09771728515625, 0.11033248901367188, 0.32403564453125, 0.32544708251953125, 0.5729045867919922, 0.1958484649658203, 0.6963348388671875, 0.388153076171875, 0.4736480712890625, 0.6478424072265625, -0.00032806396484375, 0.5000457763671875, -0.0133819580078125, 0.11983489990234375, 0.2160797119140625, -0.09133148193359375, 0.04618072509765625, 0.077301025390625, 0.3283729553222656, 0.4104728698730469, -0.05816459655761719, 0.07315826416015625, 0.7553176879882812, -1.338287353515625, 0.03963470458984375, -0.24291610717773438, -0.39713287353515625, 0.0479736328125, 1.4840316772460938, 0.63153076171875, 0.29668426513671875, 0.10207366943359375, -0.280517578125, -0.07128143310546875, -0.9440460205078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000049.npy"}
|
||||
{"epoch": 0.07407407407407407, "step": 50, "batch_size": 64, "mean": 0.03892248868942261, "std": 0.3637368679046631, "min": -0.9008941650390625, "p10": -0.4429340362548828, "median": 0.021257400512695312, "p90": 0.45134716033935546, "max": 1.0003433227539062, "pos_frac": 0.53125, "sample": [0.44913482666015625, 0.187286376953125, -0.0438232421875, -0.13028335571289062, 0.3675041198730469, -0.2486724853515625, 0.2585716247558594, -0.1558074951171875, 0.7623062133789062, 0.30573272705078125, 0.10547256469726562, 0.290435791015625, -0.7384567260742188, -0.14923477172851562, -0.068756103515625, -0.33040618896484375, 0.157012939453125, 0.4467048645019531, 0.45229530334472656, 0.13626861572265625, -0.4804229736328125, 0.28878021240234375, -0.16237831115722656, 0.42324066162109375, -0.09062957763671875, 0.6805419921875, -0.2863044738769531, 0.09432220458984375, 0.025615692138671875, 0.120697021484375, 0.38889503479003906, 0.16864395141601562, 0.5407314300537109, 0.31824302673339844, -0.9008941650390625, -0.445159912109375, 0.24688720703125, 1.0003433227539062, -0.21402549743652344, -0.15358734130859375, 0.56646728515625, 0.2776031494140625, -0.5211181640625, -0.4682769775390625, -0.0059490203857421875, 0.11408233642578125, -0.06803131103515625, 0.21533203125, -0.1501617431640625, 0.256011962890625, 0.22878265380859375, -0.13908958435058594, -0.26481056213378906, -0.10508155822753906, 0.6526165008544922, -0.2962646484375, -0.5736579895019531, -0.4377403259277344, -0.07825469970703125, -0.16597938537597656, -0.21763992309570312, 0.00701141357421875, 0.01689910888671875, 0.031463623046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000050.npy"}
|
||||
{"epoch": 0.07558578987150416, "step": 51, "batch_size": 64, "mean": 0.17561128735542297, "std": 0.4938313961029053, "min": -1.1319808959960938, "p10": -0.3832773208618164, "median": 0.14874744415283203, "p90": 0.847515106201172, "max": 1.432098388671875, "pos_frac": 0.609375, "sample": [0.8212127685546875, 0.023496627807617188, -0.12652587890625, 0.0468902587890625, 0.43903350830078125, 0.48717498779296875, -0.04237174987792969, 0.13782310485839844, -0.3867168426513672, 0.8998184204101562, 0.03988075256347656, 0.5367317199707031, -0.13277053833007812, 0.8587875366210938, -0.35596466064453125, 0.387054443359375, -0.05060577392578125, 0.3658905029296875, 0.39664459228515625, 0.429840087890625, 0.26084136962890625, 0.19320106506347656, 0.5731410980224609, 1.364410400390625, -0.37525177001953125, -0.7801361083984375, -0.13275146484375, -0.5832252502441406, 0.23961257934570312, 0.31313514709472656, 0.5193252563476562, 1.2113189697265625, -0.08495140075683594, 0.978179931640625, -1.1319808959960938, 0.057342529296875, 0.15967178344726562, -0.4291114807128906, -0.43230438232421875, -0.10174942016601562, -0.07639694213867188, -0.26432037353515625, 0.08679580688476562, 1.0576019287109375, -0.2631187438964844, -0.17120361328125, -0.2393646240234375, 1.432098388671875, 0.2298583984375, 0.6161880493164062, 0.3265552520751953, 0.258026123046875, 0.5600357055664062, -0.18737030029296875, -0.17023468017578125, -0.005039215087890625, 0.31696319580078125, 0.787445068359375, 0.07873153686523438, 0.4551429748535156, -0.5006027221679688, 0.29327392578125, 0.19191932678222656, -0.16790390014648438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000051.npy"}
|
||||
{"epoch": 0.07709750566893424, "step": 52, "batch_size": 64, "mean": 0.19924335181713104, "std": 0.4404902458190918, "min": -0.8837738037109375, "p10": -0.2757972717285156, "median": 0.16382408142089844, "p90": 0.7775802612304688, "max": 1.288482666015625, "pos_frac": 0.65625, "sample": [1.0881881713867188, 0.4602508544921875, 0.40966796875, 0.625701904296875, 0.8460235595703125, -0.0030975341796875, 0.24104690551757812, -0.20706939697265625, -0.26165008544921875, 0.7389373779296875, 0.7729339599609375, 0.6654129028320312, -0.3524322509765625, 0.779571533203125, 0.5321121215820312, -0.2818603515625, 0.33879852294921875, -0.38690185546875, -0.23626708984375, -0.09276008605957031, 0.0687713623046875, -0.028505325317382812, 0.15070724487304688, 0.17700576782226562, -0.00726318359375, 0.4461669921875, -0.057891845703125, -0.23492431640625, 0.2011871337890625, 0.051067352294921875, 0.1435871124267578, 0.7437591552734375, 0.04494476318359375, -0.3105602264404297, 0.1636524200439453, 0.38828277587890625, 0.005401611328125, 0.19510650634765625, 0.503204345703125, -0.8833160400390625, 0.06385040283203125, 0.3108673095703125, 0.16399574279785156, 0.33381080627441406, -0.113525390625, 0.365142822265625, 0.12798595428466797, 0.04392242431640625, 1.185394287109375, -0.12931060791015625, 0.831146240234375, -0.8837738037109375, -0.05811309814453125, 0.6062583923339844, 1.288482666015625, -0.185943603515625, 0.8542861938476562, -0.23485183715820312, -0.497314453125, 0.18336868286132812, 0.41600799560546875, -0.023386001586914062, 0.298980712890625, 0.36730194091796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000052.npy"}
|
||||
{"epoch": 0.07860922146636433, "step": 53, "batch_size": 64, "mean": 0.12827906012535095, "std": 0.39863261580467224, "min": -1.106781005859375, "p10": -0.2437496185302734, "median": 0.10881423950195312, "p90": 0.6265871047973635, "max": 1.1720123291015625, "pos_frac": 0.640625, "sample": [0.4953193664550781, -0.21807479858398438, 0.30971527099609375, 0.0435028076171875, 0.3007183074951172, 0.22783470153808594, 0.2018280029296875, 0.5121078491210938, 0.15276718139648438, 0.20038604736328125, -0.09765625, 0.04962158203125, 0.19246673583984375, 0.3832378387451172, -0.25475311279296875, -0.20790863037109375, 0.2371845245361328, 0.5407028198242188, -0.1754169464111328, -0.735198974609375, 0.08269500732421875, 0.11181640625, 0.0177459716796875, 0.7395286560058594, -1.106781005859375, 0.23459625244140625, -0.1706981658935547, 0.7245750427246094, 0.5808010101318359, 0.12836456298828125, 0.09567070007324219, -0.11041259765625, -0.00159454345703125, -0.2763824462890625, 0.3133964538574219, 0.46484375, 0.40314483642578125, 0.8623123168945312, -0.05411338806152344, -0.42229652404785156, -0.1331939697265625, -0.58135986328125, 0.011417388916015625, -0.6894073486328125, 0.646209716796875, 0.249664306640625, -0.08293914794921875, -0.1241455078125, 0.7404975891113281, 1.1720123291015625, -0.0595703125, 0.3815498352050781, -0.21049880981445312, 0.38851165771484375, 0.10581207275390625, 0.032726287841796875, 0.9673919677734375, 0.08742523193359375, 0.30146026611328125, 0.1514873504638672, -0.09336090087890625, 0.3769683837890625, -0.15171241760253906, -0.05268287658691406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000053.npy"}
|
||||
{"epoch": 0.0801209372637944, "step": 54, "batch_size": 64, "mean": 0.15532565116882324, "std": 0.44076552987098694, "min": -0.9040756225585938, "p10": -0.35993614196777346, "median": 0.17475509643554688, "p90": 0.6130540847778322, "max": 1.81201171875, "pos_frac": 0.671875, "sample": [0.5874366760253906, 0.9427108764648438, -0.19061279296875, 0.18060302734375, -0.014820098876953125, -0.00890350341796875, 0.4564838409423828, 0.16890716552734375, 0.8254547119140625, -0.7305984497070312, 0.0051212310791015625, 0.1566314697265625, 0.821380615234375, -0.9040756225585938, 0.274139404296875, 0.09180641174316406, 0.4793548583984375, 0.2909584045410156, 0.136749267578125, -0.36016845703125, 0.18637847900390625, 1.81201171875, 0.27507972717285156, 0.7882518768310547, -0.274749755859375, 0.3868522644042969, 0.30164337158203125, 0.09111404418945312, 0.11869049072265625, -0.3593940734863281, -0.3766021728515625, -0.059093475341796875, -0.0969696044921875, -0.13069915771484375, 0.6240329742431641, 0.10626220703125, 0.46807861328125, 0.33240509033203125, 0.20953369140625, 0.8504142761230469, 0.42116546630859375, -0.2596282958984375, 0.23601150512695312, 0.3373870849609375, -0.1645355224609375, 0.5231342315673828, 0.07598876953125, 0.13245391845703125, -0.06737709045410156, 0.31253814697265625, 0.49683380126953125, -0.26346588134765625, 0.18724822998046875, -0.0882568359375, 0.45299339294433594, -0.13805580139160156, 0.21436309814453125, 0.1492786407470703, 0.3595390319824219, -0.6143875122070312, -0.5513839721679688, 0.19223785400390625, 0.296356201171875, -0.7613964080810547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000054.npy"}
|
||||
{"epoch": 0.08163265306122448, "step": 55, "batch_size": 64, "mean": 0.2150469720363617, "std": 0.4703444838523865, "min": -0.6493568420410156, "p10": -0.34114456176757807, "median": 0.18860721588134766, "p90": 0.8751262664794925, "max": 1.37078857421875, "pos_frac": 0.640625, "sample": [1.1629867553710938, 0.34206390380859375, 0.003147125244140625, -0.2619781494140625, -0.2273998260498047, 0.4171142578125, 0.516937255859375, 0.22723388671875, -0.45833396911621094, -0.08683013916015625, 0.179168701171875, 0.3654937744140625, -0.6493568420410156, 0.00284576416015625, -0.038784027099609375, 0.646331787109375, 1.0983543395996094, 0.26227569580078125, -0.10302734375, -0.542755126953125, 0.3009834289550781, -0.6422119140625, 0.5631828308105469, 0.27313232421875, 1.285888671875, 0.21793174743652344, -0.0347137451171875, 0.74627685546875, -0.35463714599609375, -0.634307861328125, 0.5029449462890625, -0.014801025390625, 0.392608642578125, -0.2621574401855469, -0.03631591796875, -0.41632843017578125, 0.3237457275390625, 0.11918258666992188, 0.79156494140625, -0.14704513549804688, 0.1974029541015625, 0.6954269409179688, 0.10294342041015625, 0.9351654052734375, 0.37625885009765625, -0.024042129516601562, 1.0652923583984375, 0.2455902099609375, -0.139617919921875, 0.1527557373046875, 0.4581451416015625, 0.45969200134277344, 0.073486328125, -0.12406539916992188, -0.2758026123046875, 0.5819244384765625, 0.713226318359375, 0.1798114776611328, 0.3621692657470703, 1.37078857421875, -0.094146728515625, -0.309661865234375, 0.9109382629394531, 0.01891326904296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000055.npy"}
|
||||
{"epoch": 0.08314436885865457, "step": 56, "batch_size": 64, "mean": 0.15978825092315674, "std": 0.39165282249450684, "min": -1.386474609375, "p10": -0.23469715118408202, "median": 0.13935470581054688, "p90": 0.6480243682861329, "max": 1.1162261962890625, "pos_frac": 0.65625, "sample": [-0.0453643798828125, 0.20283889770507812, 0.1114959716796875, -0.2702827453613281, 0.2033233642578125, -0.22817230224609375, 0.9945831298828125, -0.3014984130859375, -0.0092315673828125, 0.06382179260253906, -0.2711067199707031, -0.3567352294921875, 0.1608734130859375, -0.070098876953125, 0.3485565185546875, 0.6333351135253906, -0.0266876220703125, -0.21291542053222656, 0.24147796630859375, 0.667327880859375, -0.2631187438964844, 0.17609405517578125, 0.26302528381347656, 0.36101531982421875, 0.1812152862548828, 0.3988761901855469, 0.0445404052734375, 0.226898193359375, -0.13350296020507812, 0.595123291015625, -0.20035934448242188, 1.0082321166992188, 0.846466064453125, 0.5026054382324219, 0.6543197631835938, 0.31610107421875, -1.386474609375, 0.2833995819091797, -0.11767196655273438, 0.2924842834472656, 0.20431900024414062, 0.06639862060546875, -0.1924896240234375, 0.012847900390625, 0.16414642333984375, -0.08493804931640625, -0.10364532470703125, 0.0978240966796875, 0.11266136169433594, 0.3709697723388672, 0.8134117126464844, -0.21353530883789062, 0.4728431701660156, 0.4078521728515625, -0.0623626708984375, -0.23749351501464844, 0.058917999267578125, 0.22525787353515625, 0.6146507263183594, 0.33640289306640625, 0.11222457885742188, 1.1162261962890625, 0.11783599853515625, -0.06868743896484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000056.npy"}
|
||||
{"epoch": 0.08465608465608465, "step": 57, "batch_size": 64, "mean": 0.26503095030784607, "std": 0.4748647212982178, "min": -1.0097503662109375, "p10": -0.28848724365234374, "median": 0.23955154418945312, "p90": 0.735302734375, "max": 1.814727783203125, "pos_frac": 0.71875, "sample": [-0.16729354858398438, -0.26972198486328125, 0.2712554931640625, 1.814727783203125, -0.1272907257080078, 0.4722747802734375, 0.5610809326171875, -0.2911834716796875, 0.5129051208496094, 0.370941162109375, 0.959869384765625, 1.0379009246826172, -0.4404945373535156, 0.2606029510498047, 0.239898681640625, -0.33850669860839844, 0.1053619384765625, 0.5457229614257812, 1.0689620971679688, 0.10096931457519531, 0.59765625, -0.4647979736328125, 0.5490970611572266, 0.16815185546875, -0.05714988708496094, -0.22363853454589844, -0.078704833984375, -0.31719207763671875, 0.30158424377441406, -1.0097503662109375, 0.6840972900390625, 0.34845733642578125, 0.4278144836425781, 0.13481903076171875, -0.15546798706054688, 1.1826629638671875, 0.058685302734375, 0.28826332092285156, 0.06067657470703125, 0.269989013671875, -0.02291107177734375, -0.41082763671875, -0.282196044921875, 0.5962467193603516, -0.0235595703125, 0.5157318115234375, 0.6787567138671875, 0.20965576171875, 0.64276123046875, 0.6318607330322266, 0.4907798767089844, 0.23920440673828125, 0.24602508544921875, 1.4175567626953125, 0.024471282958984375, 0.5128898620605469, 0.16876220703125, 0.10216712951660156, 0.14226722717285156, 0.7374725341796875, -0.02939605712890625, 0.08229255676269531, 0.7302398681640625, 0.10849380493164062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000057.npy"}
|
||||
{"epoch": 0.08616780045351474, "step": 58, "batch_size": 64, "mean": 0.21258807182312012, "std": 0.6354427933692932, "min": -0.8826484680175781, "p10": -0.5309524536132812, "median": 0.11204338073730469, "p90": 1.0033805847167971, "max": 2.382781982421875, "pos_frac": 0.59375, "sample": [-0.8826484680175781, 0.560089111328125, 0.5905551910400391, -0.15687942504882812, 0.5345458984375, -0.4009246826171875, 0.207000732421875, 0.7926483154296875, -0.39484405517578125, -0.2854156494140625, 0.27909088134765625, -0.08339500427246094, 1.021636962890625, -0.5209121704101562, 0.026842117309570312, 0.3067207336425781, 0.8221168518066406, -0.10503387451171875, -0.125457763671875, 1.4019317626953125, 0.07315826416015625, 0.00853729248046875, 0.558990478515625, 0.8693389892578125, -0.6496124267578125, -0.0555419921875, 0.475433349609375, -0.19994354248046875, -0.45624542236328125, 0.6292495727539062, -0.5484466552734375, -0.8685169219970703, 0.17140960693359375, 0.5391311645507812, 2.382781982421875, -0.21378707885742188, 0.50390625, 1.8289031982421875, 1.0204620361328125, 0.1367034912109375, -0.3463134765625, 0.049869537353515625, -0.07461929321289062, -0.5352554321289062, -0.662017822265625, 1.301361083984375, -0.39121437072753906, 0.38895416259765625, 0.051605224609375, 0.7187042236328125, 0.9081268310546875, 0.3607139587402344, 0.09542083740234375, -0.21216201782226562, -0.04959869384765625, 1.1391220092773438, 0.2147064208984375, -0.3499298095703125, -0.6187515258789062, 0.54583740234375, 0.9635238647460938, -0.16658782958984375, 0.3518962860107422, 0.12866592407226562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000058.npy"}
|
||||
{"epoch": 0.08767951625094482, "step": 59, "batch_size": 64, "mean": 0.187290757894516, "std": 0.5245020389556885, "min": -0.6098785400390625, "p10": -0.3625804901123047, "median": 0.06862640380859375, "p90": 0.7880813598632815, "max": 2.2016754150390625, "pos_frac": 0.609375, "sample": [0.4276542663574219, -0.3028717041015625, -0.6094207763671875, -0.3587379455566406, -0.28385353088378906, -0.47125816345214844, 0.8180694580078125, -0.0533294677734375, 1.0562667846679688, 0.5860595703125, 1.383270263671875, 0.51171875, 0.4404754638671875, 0.5483207702636719, 0.12365913391113281, 0.3643512725830078, 0.0686187744140625, -0.46969032287597656, 0.068634033203125, -0.01483154296875, 0.2522602081298828, -0.364227294921875, 0.0302886962890625, 0.39319610595703125, -0.4566650390625, 0.38304901123046875, 0.039093017578125, 0.9790878295898438, 1.4271240234375, 0.01457977294921875, -0.2607536315917969, -0.1670379638671875, -0.10280990600585938, -0.25672149658203125, -0.573028564453125, 0.06950187683105469, 0.015270233154296875, 0.32422828674316406, -0.6098785400390625, -0.33078575134277344, 2.2016754150390625, 0.18238067626953125, 0.0544891357421875, 0.34311866760253906, 0.2549247741699219, 0.6074905395507812, -0.2719841003417969, 0.6673431396484375, -0.11153030395507812, 0.9534454345703125, -0.13315200805664062, 0.718109130859375, 0.4643898010253906, 0.3792572021484375, 0.00591278076171875, -0.156097412109375, -0.12651824951171875, -0.107330322265625, 0.26950836181640625, -0.06812286376953125, 0.3811492919921875, 0.702392578125, 0.4233741760253906, -0.28649330139160156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000059.npy"}
|
||||
{"epoch": 0.08919123204837491, "step": 60, "batch_size": 64, "mean": 0.09666106104850769, "std": 0.45909225940704346, "min": -1.2160186767578125, "p10": -0.46461715698242184, "median": 0.11669254302978516, "p90": 0.7003143310546875, "max": 1.1656150817871094, "pos_frac": 0.609375, "sample": [0.357208251953125, -0.0200042724609375, 0.24560165405273438, 0.1243133544921875, 0.7032623291015625, 0.8202476501464844, -0.2520599365234375, 0.031982421875, 0.15886688232421875, -0.289947509765625, 0.33910369873046875, 0.587982177734375, -0.04802894592285156, 0.26860618591308594, -0.4122161865234375, 0.7208061218261719, -0.12266921997070312, 0.10907173156738281, 0.9262065887451172, 0.6266937255859375, -0.377593994140625, 0.00341796875, 0.2003326416015625, -0.26061248779296875, 0.4415435791015625, -0.1053619384765625, 0.19791412353515625, 0.04535865783691406, 0.4623565673828125, -0.5872840881347656, 0.18691253662109375, -0.1256561279296875, 0.46234893798828125, 0.4678077697753906, 0.000919342041015625, 0.31982421875, 0.5074977874755859, -0.8151741027832031, -0.09696197509765625, -0.6523571014404297, -0.42642974853515625, 0.7439441680908203, 0.037212371826171875, -0.4693183898925781, 0.04834175109863281, 0.28369140625, 0.19472122192382812, -0.8365020751953125, -0.006153106689453125, -0.04428672790527344, 0.22383880615234375, 1.1656150817871094, 0.19278335571289062, 0.1307373046875, 0.482147216796875, -0.4536476135253906, 0.21704483032226562, -0.0992279052734375, 0.6934356689453125, -0.07649612426757812, -1.2160186767578125, -0.6103515625, 0.9755878448486328, -0.11462020874023438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000060.npy"}
|
||||
{"epoch": 0.09070294784580499, "step": 61, "batch_size": 64, "mean": 0.13115930557250977, "std": 0.4847688376903534, "min": -1.1616592407226562, "p10": -0.4782142639160156, "median": 0.18262290954589844, "p90": 0.8225868225097659, "max": 1.09625244140625, "pos_frac": 0.609375, "sample": [-0.481475830078125, -0.00860595703125, 0.928436279296875, 0.8490982055664062, -0.16064453125, 0.17852020263671875, 0.427001953125, 0.61370849609375, 0.22525787353515625, -0.10868072509765625, 0.2026824951171875, -0.09539985656738281, -0.16380691528320312, -0.2332916259765625, 0.6473007202148438, -0.6061954498291016, 0.36988067626953125, -0.06356048583984375, 0.40296173095703125, 0.42597007751464844, 0.23984336853027344, 0.314910888671875, 0.02780914306640625, 0.18672561645507812, -0.1455249786376953, -0.03181648254394531, 0.361785888671875, -0.2750244140625, 0.21688079833984375, -0.2102508544921875, 0.5761623382568359, -0.08251380920410156, -0.39977264404296875, 1.065887451171875, -0.1256103515625, 0.364013671875, 0.2864837646484375, -0.884124755859375, 0.060626983642578125, 0.9314727783203125, 1.09625244140625, -1.1616592407226562, 0.5723342895507812, -0.70758056640625, 0.20706939697265625, 0.5621757507324219, 0.3121604919433594, 0.119873046875, -0.21666717529296875, 0.045429229736328125, 1.0125579833984375, -0.6713638305664062, 0.08069801330566406, 0.21970367431640625, 0.3912506103515625, -0.38431549072265625, 0.1752471923828125, 0.47780799865722656, -0.259613037109375, 0.19179534912109375, 0.9122085571289062, 0.7607269287109375, -0.6984138488769531, -0.47060394287109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000061.npy"}
|
||||
{"epoch": 0.09221466364323508, "step": 62, "batch_size": 64, "mean": 0.21290069818496704, "std": 0.57977294921875, "min": -2.7657470703125, "p10": -0.25799179077148426, "median": 0.17054367065429688, "p90": 0.8478874206542971, "max": 1.407958984375, "pos_frac": 0.65625, "sample": [0.66748046875, -0.7228565216064453, -0.5124740600585938, 0.2184123992919922, 0.25689697265625, -0.12959671020507812, 1.0583229064941406, -0.09441184997558594, -0.00823211669921875, 0.12775421142578125, 1.1335868835449219, -0.09727668762207031, 0.4262237548828125, 0.2564849853515625, -0.13016510009765625, 0.1644287109375, 0.2414093017578125, -0.11472320556640625, -0.044498443603515625, 0.6504135131835938, 0.44179534912109375, -0.02744293212890625, -0.312774658203125, 0.47498130798339844, 0.08256149291992188, -0.0975189208984375, -0.36605072021484375, -0.341400146484375, 0.87957763671875, 0.7656402587890625, 0.15070343017578125, 0.12094306945800781, -0.07645416259765625, 0.0084075927734375, 1.068603515625, -2.7657470703125, 0.5068130493164062, 0.4985466003417969, -0.08628654479980469, 0.3490447998046875, 0.033046722412109375, 0.7995071411132812, 0.311676025390625, 0.44544219970703125, 0.7360668182373047, 0.1559906005859375, 0.775665283203125, -0.6347923278808594, -0.03893280029296875, 0.868621826171875, -0.12444305419921875, 0.3151969909667969, 0.15598297119140625, 0.32660865783691406, 0.17665863037109375, 0.1826934814453125, 1.407958984375, 1.177093505859375, -0.0162811279296875, -0.06885528564453125, 0.6005706787109375, 0.7928009033203125, 0.11541748046875, 0.5108280181884766], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000062.npy"}
|
||||
{"epoch": 0.09372637944066516, "step": 63, "batch_size": 64, "mean": 0.32430848479270935, "std": 0.4566391408443451, "min": -0.79766845703125, "p10": -0.23470458984375, "median": 0.33501434326171875, "p90": 0.8165985107421876, "max": 1.65087890625, "pos_frac": 0.765625, "sample": [0.34552764892578125, 0.2961883544921875, -0.6123504638671875, -0.31299591064453125, 0.79034423828125, -0.1473846435546875, 0.04819679260253906, 1.418304443359375, 1.2831268310546875, 0.2885894775390625, -0.3121795654296875, -0.22890472412109375, -0.08584022521972656, -0.79766845703125, 0.4315185546875, 0.277984619140625, -0.01104736328125, -0.039093017578125, 0.41675567626953125, 0.5854854583740234, -0.366058349609375, 0.1872844696044922, 0.39803314208984375, 0.527679443359375, 0.732879638671875, 0.4461803436279297, 0.5387725830078125, -0.23719024658203125, 0.23000335693359375, 0.6548309326171875, 0.25054168701171875, 0.7689170837402344, 0.9105796813964844, 0.4568138122558594, 0.0989532470703125, 0.4096832275390625, 0.4842185974121094, -0.20541763305664062, 0.2087726593017578, 0.3442192077636719, 0.3759956359863281, 1.65087890625, 0.1260051727294922, 0.9559402465820312, 0.42674827575683594, 0.6249332427978516, 0.550018310546875, 0.6418418884277344, 0.328033447265625, 0.3090476989746094, 0.18926429748535156, 0.4707794189453125, 0.1748199462890625, -0.5797042846679688, 0.7444686889648438, 0.287567138671875, 0.827850341796875, 0.22745132446289062, -0.13616180419921875, -0.05296134948730469, 0.5572586059570312, 1.0946044921875, 0.3419952392578125, 0.14481353759765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000063.npy"}
|
||||
{"epoch": 0.09523809523809523, "step": 64, "batch_size": 64, "mean": 0.10075005888938904, "std": 0.4748300611972809, "min": -1.22589111328125, "p10": -0.4444549560546875, "median": 0.0510406494140625, "p90": 0.6603324890136719, "max": 1.722015380859375, "pos_frac": 0.59375, "sample": [0.36182403564453125, 1.722015380859375, -0.10219383239746094, 0.2082977294921875, 0.1485595703125, 0.0442657470703125, 0.5171775817871094, 0.08155250549316406, 0.03909873962402344, 0.5312576293945312, -0.17215538024902344, -0.0665130615234375, 0.43807220458984375, 0.06707382202148438, 0.6268692016601562, -0.17591094970703125, 0.027191162109375, 0.07861328125, 0.6623458862304688, 0.6842288970947266, 1.145843505859375, 0.3300933837890625, -0.06904983520507812, -0.05189704895019531, -0.00830841064453125, 0.19790267944335938, 0.0578155517578125, -1.22589111328125, 0.101593017578125, 0.4356708526611328, 0.014375686645507812, -0.8143844604492188, -0.41979217529296875, -0.6111183166503906, 0.26775169372558594, -0.249664306640625, 0.18451690673828125, -0.81854248046875, 0.006542205810546875, 0.35198211669921875, 0.6611175537109375, -0.03601837158203125, 0.6585006713867188, 0.3525562286376953, 0.249969482421875, -0.3358306884765625, -0.08363151550292969, -0.245635986328125, -0.26793670654296875, 0.6035003662109375, 0.1267242431640625, -0.05871009826660156, 0.9412689208984375, -0.0974578857421875, -0.15389633178710938, 0.285003662109375, 0.28813934326171875, -0.033935546875, -0.45502471923828125, 0.01071929931640625, -0.7254486083984375, -0.46787261962890625, -0.1389141082763672, 0.8237075805664062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000064.npy"}
|
||||
{"epoch": 0.09674981103552532, "step": 65, "batch_size": 64, "mean": 0.3682248294353485, "std": 0.5804725885391235, "min": -1.1138153076171875, "p10": -0.32382926940917967, "median": 0.33969879150390625, "p90": 0.947578239440918, "max": 1.709503173828125, "pos_frac": 0.796875, "sample": [0.6625461578369141, 0.01861572265625, 0.3386688232421875, 0.156280517578125, -0.18784141540527344, 0.673797607421875, 0.321502685546875, 1.6725921630859375, 0.1721363067626953, 0.8382644653320312, 0.2686767578125, 0.19803619384765625, 0.8500785827636719, 1.1658172607421875, 0.8594932556152344, 0.1472034454345703, 0.48137664794921875, 0.5646762847900391, 0.1352996826171875, 1.5225982666015625, -0.2782135009765625, 0.36126708984375, -0.78717041015625, 1.45001220703125, 0.9358367919921875, 0.05718231201171875, -1.089935302734375, -0.3325462341308594, -0.34375, 0.04088592529296875, 0.159332275390625, 0.4133453369140625, -1.1138153076171875, -0.3775901794433594, 0.6118087768554688, 0.3164825439453125, 0.42852020263671875, 0.8680419921875, 0.7164840698242188, 0.8670673370361328, 0.7480487823486328, 0.23239898681640625, 1.222076416015625, 0.23837661743164062, 0.4435234069824219, -0.30348968505859375, 0.37543678283691406, 0.36803436279296875, -0.6839084625244141, 0.11116790771484375, 0.5362091064453125, 0.2886505126953125, 0.2069244384765625, 0.9420948028564453, -0.1018829345703125, 0.8877716064453125, 0.9499282836914062, 0.024168014526367188, 0.5449676513671875, 0.8138427734375, 0.340728759765625, -0.005828857421875, 1.709503173828125, -0.085418701171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000065.npy"}
|
||||
{"epoch": 0.0982615268329554, "step": 66, "batch_size": 64, "mean": 0.23333919048309326, "std": 0.5533595681190491, "min": -0.9761276245117188, "p10": -0.3485113143920898, "median": 0.21069717407226562, "p90": 0.8945709228515626, "max": 2.566253662109375, "pos_frac": 0.703125, "sample": [0.21248245239257812, 0.16827392578125, 0.06939506530761719, 0.33481788635253906, 0.10969924926757812, 0.37657928466796875, 0.3105335235595703, 0.212615966796875, -0.09137153625488281, 0.3138618469238281, 0.24044036865234375, 0.9636764526367188, 0.9060516357421875, 0.3525733947753906, 0.110595703125, -0.7669754028320312, -0.13385009765625, 0.6376571655273438, 1.2288055419921875, 0.8677825927734375, 0.5766983032226562, 0.9371109008789062, -0.1197509765625, 0.08343887329101562, -0.3080291748046875, 0.19672393798828125, 0.39165687561035156, 0.4134712219238281, 0.10640716552734375, 0.49500274658203125, 0.26149749755859375, -0.3582897186279297, 0.031494140625, 0.5504131317138672, 0.29384613037109375, -0.31771087646484375, 0.22162628173828125, -0.12415695190429688, 0.20891189575195312, 0.10412979125976562, -0.45330810546875, -0.9761276245117188, 0.2978668212890625, -0.546295166015625, 0.03917694091796875, -0.4302520751953125, 0.23336410522460938, 0.026241302490234375, 0.7860107421875, 0.07277107238769531, -0.0021495819091796875, 0.28977203369140625, 0.3668098449707031, 2.566253662109375, 1.388916015625, -0.3256950378417969, -0.08428192138671875, -0.600799560546875, 0.83184814453125, 1.40509033203125, 0.38715362548828125, -0.07510566711425781, -0.06224822998046875, -0.269439697265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000066.npy"}
|
||||
{"epoch": 0.09977324263038549, "step": 67, "batch_size": 64, "mean": 0.22206458449363708, "std": 0.5852482318878174, "min": -0.69488525390625, "p10": -0.421713638305664, "median": 0.06479549407958984, "p90": 0.9352699279785158, "max": 2.4080810546875, "pos_frac": 0.59375, "sample": [-0.01369476318359375, -0.600677490234375, -0.34734344482421875, 0.01447296142578125, -0.102264404296875, 0.28796958923339844, 1.1260929107666016, 0.616943359375, 0.6786651611328125, 0.0090179443359375, 0.0213775634765625, 0.1453418731689453, -0.0286712646484375, -0.45752716064453125, 0.6439971923828125, 0.711212158203125, -0.15816307067871094, 0.8442268371582031, -0.17996978759765625, 0.44344329833984375, 0.07631683349609375, 0.9597396850585938, -0.69488525390625, 1.470703125, -0.152740478515625, -0.593017578125, 0.43070220947265625, 0.034069061279296875, 0.878173828125, -0.3668231964111328, 0.11884307861328125, 0.6552467346191406, -0.2659034729003906, 0.3216667175292969, 0.6530990600585938, 0.11368942260742188, -0.07299423217773438, 0.27065467834472656, -0.4452381134033203, 0.5329437255859375, 2.4080810546875, -0.49886322021484375, 1.084716796875, 0.7647476196289062, -0.0168304443359375, -0.17615890502929688, -0.2718658447265625, 0.3102397918701172, -0.20183563232421875, -0.11673736572265625, 0.2758216857910156, 1.759918212890625, 0.05327415466308594, 0.03713417053222656, 0.30008697509765625, -0.03369140625, -0.26871490478515625, 0.14711761474609375, -0.11057090759277344, -0.493865966796875, 1.1360015869140625, 0.1152191162109375, 0.73236083984375, -0.30214691162109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000067.npy"}
|
||||
{"epoch": 0.10128495842781557, "step": 68, "batch_size": 64, "mean": 0.27010834217071533, "std": 0.5788272619247437, "min": -1.5687255859375, "p10": -0.40729217529296874, "median": 0.2497711181640625, "p90": 0.9600730895996095, "max": 1.65533447265625, "pos_frac": 0.71875, "sample": [1.65533447265625, 0.29518890380859375, 0.30239295959472656, 0.51171875, 0.43166351318359375, 0.251739501953125, 0.6204757690429688, 0.17670440673828125, 0.38867950439453125, -0.2545623779296875, -0.5178756713867188, -1.5687255859375, 0.9352874755859375, 1.1896591186523438, -0.219970703125, 0.9706954956054688, 0.8981704711914062, 0.1989288330078125, -0.3275299072265625, 0.67742919921875, 0.1248321533203125, -0.45371437072753906, 1.3277587890625, 0.4528350830078125, -0.060352325439453125, 0.3809776306152344, 0.46337127685546875, 0.018610000610351562, 0.1312255859375, 0.7276458740234375, 0.09932136535644531, 0.43975257873535156, 0.3566131591796875, 0.18926239013671875, -0.2727546691894531, 0.21649169921875, 1.4834442138671875, -0.10106658935546875, -0.3359832763671875, 0.1581859588623047, 1.19073486328125, -0.4109039306640625, 0.261962890625, 0.65618896484375, -0.39886474609375, -0.517852783203125, 0.08325958251953125, 0.20932769775390625, -0.479522705078125, 0.190765380859375, 0.9028472900390625, 0.13101959228515625, -0.021240234375, 0.6843109130859375, 0.4720458984375, 1.0414657592773438, -0.12352752685546875, 0.7706146240234375, 0.5974578857421875, 0.7694587707519531, 0.4913444519042969, 0.247802734375, -1.1465988159179688, -0.2770233154296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000068.npy"}
|
||||
{"epoch": 0.10279667422524566, "step": 69, "batch_size": 64, "mean": 0.20509576797485352, "std": 0.5228826999664307, "min": -0.897552490234375, "p10": -0.4729164123535156, "median": 0.1993694305419922, "p90": 0.84647216796875, "max": 1.8341522216796875, "pos_frac": 0.671875, "sample": [0.22287940979003906, -0.06487274169921875, 0.9025535583496094, -0.306915283203125, 0.9531021118164062, -0.13835906982421875, -0.16160202026367188, 0.1282672882080078, 0.1073150634765625, 0.12237548828125, 0.022272109985351562, 0.08484268188476562, 0.10486602783203125, 0.4290008544921875, 0.6136283874511719, 0.8260726928710938, -0.17556381225585938, 0.37885284423828125, 0.3880157470703125, -0.7230453491210938, 0.8347015380859375, -0.855010986328125, 1.8341522216796875, 0.43224334716796875, 0.64666748046875, 0.07147598266601562, 0.212371826171875, 0.47635650634765625, -0.6528778076171875, 0.6491546630859375, 0.18636703491210938, 1.014984130859375, 0.3262825012207031, -0.47156524658203125, 0.7382164001464844, -0.897552490234375, 0.4854774475097656, -0.1635723114013672, -0.4647064208984375, 0.5765609741210938, 0.42132568359375, 0.08593177795410156, -0.1234588623046875, 0.29245758056640625, 0.12331390380859375, 0.880828857421875, 0.5172653198242188, 0.6518669128417969, -0.12714385986328125, -0.8077392578125, -0.4734954833984375, 0.3504505157470703, 0.08998870849609375, -0.19672393798828125, -0.07461166381835938, -0.7426834106445312, 0.54034423828125, 0.6165351867675781, 0.4196624755859375, 0.4763679504394531, 0.90777587890625, -0.10060882568359375, 0.8515167236328125, -0.14645004272460938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000069.npy"}
|
||||
{"epoch": 0.10430839002267574, "step": 70, "batch_size": 64, "mean": 0.2062298059463501, "std": 0.5611727833747864, "min": -1.2487297058105469, "p10": -0.45961399078369136, "median": 0.21925735473632812, "p90": 0.8393882751464844, "max": 1.9077911376953125, "pos_frac": 0.625, "sample": [1.1595840454101562, -0.3408222198486328, 0.341217041015625, -0.311370849609375, 0.7385959625244141, -0.31108665466308594, -0.13741683959960938, 0.06627655029296875, 0.5690841674804688, -0.25839805603027344, 0.24799346923828125, -0.6787528991699219, -0.80419921875, 0.8429107666015625, -0.10601043701171875, -0.42453575134277344, -0.00095367431640625, -0.47464752197265625, -0.07621383666992188, 0.22322463989257812, 1.020477294921875, 0.21529006958007812, 0.4886360168457031, 0.3998241424560547, -0.01123046875, 0.2820281982421875, 0.24271774291992188, 0.7914505004882812, -0.7216033935546875, 0.566131591796875, 0.1515178680419922, 0.42645263671875, -0.37885284423828125, 0.6862602233886719, 0.8311691284179688, 0.1254119873046875, 0.93511962890625, 0.818328857421875, 0.4990043640136719, 0.5936107635498047, 0.12941360473632812, 0.1603069305419922, -0.06537628173828125, 0.43119049072265625, -0.1960601806640625, -0.0301055908203125, -1.2487297058105469, 1.16571044921875, 0.49401092529296875, 0.4926643371582031, 1.0861587524414062, 0.7777557373046875, -0.12672805786132812, -0.04083251953125, 0.3396129608154297, 0.35564422607421875, -0.7200469970703125, -0.7031745910644531, 0.7050857543945312, 0.2578849792480469, 0.0907745361328125, -0.2911567687988281, 0.000690460205078125, 1.9077911376953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000070.npy"}
|
||||
{"epoch": 0.10582010582010581, "step": 71, "batch_size": 64, "mean": 0.1253620833158493, "std": 0.5031548142433167, "min": -1.4314422607421875, "p10": -0.3692207336425781, "median": 0.04889869689941406, "p90": 0.8026252746582031, "max": 1.348968505859375, "pos_frac": 0.578125, "sample": [-0.18858909606933594, 0.5189437866210938, 0.46979522705078125, 0.7976531982421875, -0.07018280029296875, 0.06589508056640625, 0.6386394500732422, -0.3489532470703125, 0.0569000244140625, 1.348968505859375, 0.03744697570800781, 0.055816650390625, 0.4893035888671875, -0.168792724609375, -0.0884552001953125, -0.003070831298828125, -0.25325775146484375, 1.1282882690429688, -0.3114051818847656, 0.264801025390625, -1.4314422607421875, 1.009368896484375, 0.331634521484375, -0.15134811401367188, 1.030466079711914, 0.266937255859375, -0.091644287109375, 0.21801185607910156, 0.9221267700195312, -0.20594024658203125, -0.380035400390625, 0.37646484375, -0.21575927734375, -1.0271949768066406, 0.38626861572265625, 0.3802604675292969, 0.2787952423095703, -0.1047821044921875, 0.6498031616210938, 0.01711273193359375, -0.030853271484375, -0.6349639892578125, 0.8550872802734375, 0.7365875244140625, -0.05902671813964844, 0.011868476867675781, -0.6798095703125, 0.2141876220703125, 0.092742919921875, 0.041980743408203125, 0.025526046752929688, -0.24766921997070312, 0.68212890625, 0.8047561645507812, -0.37790679931640625, -0.41680145263671875, 0.29247283935546875, 0.3228607177734375, -0.066802978515625, 0.11841583251953125, -0.18695068359375, 0.5079479217529297, -0.340606689453125, -0.3408470153808594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000071.npy"}
|
||||
{"epoch": 0.1073318216175359, "step": 72, "batch_size": 64, "mean": 0.2809233069419861, "std": 0.5843484401702881, "min": -0.9744853973388672, "p10": -0.5147834777832031, "median": 0.29828739166259766, "p90": 0.9514286041259766, "max": 1.8035736083984375, "pos_frac": 0.65625, "sample": [-0.2765312194824219, 0.6814041137695312, 0.28002166748046875, -0.002925872802734375, -0.1481781005859375, 0.7695045471191406, 0.0073490142822265625, 0.9169921875, -0.9744853973388672, 0.25981903076171875, 0.7127304077148438, 1.0680389404296875, 0.4071197509765625, 0.9559707641601562, 0.22355270385742188, -0.16396141052246094, -0.3924999237060547, 0.3742046356201172, 0.051361083984375, -0.22360992431640625, 1.6363067626953125, 0.3601360321044922, 0.7220497131347656, -0.04134941101074219, -0.3101043701171875, 0.04503631591796875, -0.1565704345703125, -0.6928291320800781, -0.3708953857421875, -0.5864486694335938, 1.8035736083984375, 0.6817779541015625, 0.15798187255859375, 1.3113250732421875, -0.5501480102539062, -0.5545806884765625, 0.4723224639892578, 0.1554107666015625, 0.5595893859863281, -0.1605377197265625, 0.7000541687011719, 0.7224845886230469, 0.45954132080078125, -0.178680419921875, 0.4598846435546875, 0.68463134765625, 1.2807769775390625, -0.18309783935546875, -0.06772994995117188, 0.7639427185058594, 0.508392333984375, 0.699371337890625, 1.2336807250976562, -0.5756568908691406, 0.5503768920898438, 0.31655311584472656, 0.5735702514648438, 0.0211944580078125, 0.7440948486328125, 0.32266807556152344, -0.48604583740234375, 0.9408302307128906, 0.0074310302734375, -0.527099609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000072.npy"}
|
||||
{"epoch": 0.10884353741496598, "step": 73, "batch_size": 64, "mean": 0.09511902928352356, "std": 0.5789110660552979, "min": -1.523712158203125, "p10": -0.49517707824707025, "median": 0.1530923843383789, "p90": 0.8735694885253912, "max": 1.5042190551757812, "pos_frac": 0.5625, "sample": [0.3448200225830078, -0.22845840454101562, -0.4387054443359375, -0.5173683166503906, -0.7754192352294922, 0.24658966064453125, -0.0826416015625, -1.1044158935546875, -0.40185546875, 0.3310699462890625, 0.04107666015625, -0.42626190185546875, 0.11713981628417969, -0.3784637451171875, -0.3696746826171875, -0.1954212188720703, -0.130767822265625, 0.4433097839355469, 0.45357513427734375, -0.22606658935546875, -1.523712158203125, -0.44339752197265625, -0.40053367614746094, 0.6596603393554688, -0.806182861328125, 0.7440872192382812, -0.13642120361328125, -0.24945831298828125, -0.0237579345703125, -0.18088912963867188, 0.14956092834472656, 0.019031524658203125, 0.3961677551269531, -0.1534576416015625, 0.23663330078125, 0.15662384033203125, 0.47423744201660156, 1.3980712890625, -0.9000034332275391, -0.18416595458984375, 0.3247222900390625, 0.9290618896484375, -0.8308258056640625, 0.256072998046875, -0.40155029296875, 0.21282958984375, 0.26103973388671875, 0.4008350372314453, 0.45105743408203125, 0.9873428344726562, -0.22247314453125, 1.0929183959960938, 1.1850299835205078, 0.2860221862792969, 0.39855194091796875, 0.21044921875, 0.5351638793945312, 0.6102981567382812, 0.29546356201171875, 0.29180908203125, 0.5043163299560547, 1.5042190551757812, 0.942230224609375, -0.0711212158203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000073.npy"}
|
||||
{"epoch": 0.11035525321239607, "step": 74, "batch_size": 64, "mean": 0.32702159881591797, "std": 0.6083571910858154, "min": -0.9503326416015625, "p10": -0.32051925659179686, "median": 0.23325061798095703, "p90": 0.9602462768554688, "max": 2.8952255249023438, "pos_frac": 0.703125, "sample": [0.1530303955078125, -0.43027496337890625, -0.257659912109375, -0.23284149169921875, 0.7150611877441406, -0.05115509033203125, 0.6151199340820312, 0.81494140625, 0.8248138427734375, 1.263031005859375, 0.9801654815673828, 0.9680099487304688, 0.767303466796875, -0.02391815185546875, -0.4414482116699219, 0.6159553527832031, 0.172760009765625, 0.1674041748046875, -0.26779937744140625, 0.9066925048828125, 0.16216659545898438, 0.48291015625, 0.6511878967285156, 0.14127349853515625, 0.16777420043945312, 0.015722274780273438, -0.6367130279541016, -0.28546905517578125, 1.0044593811035156, -0.9503326416015625, 0.23989105224609375, 0.70770263671875, 0.2336292266845703, 0.06466484069824219, -0.4726753234863281, 0.29273223876953125, -0.26874542236328125, 0.5569992065429688, -0.9236526489257812, -0.26774024963378906, 0.7056598663330078, -0.335540771484375, 0.212249755859375, 0.6098098754882812, 0.6034812927246094, 0.887451171875, 2.8952255249023438, -0.11226463317871094, 1.2977066040039062, 0.9421310424804688, -0.2195281982421875, -0.04397010803222656, 0.5489273071289062, 0.14466094970703125, 0.0293121337890625, 0.23287200927734375, 0.23099517822265625, 0.363739013671875, 0.8704681396484375, 0.8310699462890625, 0.8401279449462891, 0.255828857421875, -0.041835784912109375, 1.005828857421875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000074.npy"}
|
||||
{"epoch": 0.11186696900982615, "step": 75, "batch_size": 64, "mean": 0.33626896142959595, "std": 0.6062875390052795, "min": -1.3972129821777344, "p10": -0.28485927581787107, "median": 0.2791471481323242, "p90": 1.112294960021973, "max": 1.68115234375, "pos_frac": 0.703125, "sample": [0.019498825073242188, 0.15921783447265625, 0.3788642883300781, 0.8020553588867188, 1.6065216064453125, -0.14620399475097656, 0.47896575927734375, 0.18799591064453125, 0.6913604736328125, 0.5148735046386719, -0.2255096435546875, -0.10990524291992188, -0.7224369049072266, 1.073699951171875, -0.19802474975585938, 0.7900123596191406, 0.16098785400390625, 0.024976730346679688, 0.19610214233398438, 0.016773223876953125, 1.68115234375, 0.7172622680664062, 1.364532470703125, 1.2420730590820312, 0.2914714813232422, 0.8434715270996094, 0.9836654663085938, -0.5865325927734375, 0.17636489868164062, 0.5243911743164062, -0.03941154479980469, 1.128835678100586, 0.9890899658203125, 0.4286689758300781, 0.8025436401367188, -0.07204627990722656, 1.1389007568359375, 0.26682281494140625, 1.4433212280273438, 0.7840576171875, -0.8728561401367188, 0.23342132568359375, -1.3972129821777344, 0.19626426696777344, 0.516815185546875, -0.035717010498046875, 0.8936729431152344, 0.1098175048828125, -0.19017410278320312, -0.3006572723388672, 0.3009185791015625, 0.8273468017578125, -0.21009063720703125, -0.14887237548828125, 0.293792724609375, -0.682373046875, 0.714599609375, -0.38382530212402344, -0.24799728393554688, 0.6048851013183594, 0.6308212280273438, -0.017343521118164062, 0.8167877197265625, 0.06072998046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000075.npy"}
|
||||
{"epoch": 0.11337868480725624, "step": 76, "batch_size": 64, "mean": 0.22527173161506653, "std": 0.5991920828819275, "min": -1.2007827758789062, "p10": -0.5468893051147461, "median": 0.1454620361328125, "p90": 0.9305608749389651, "max": 2.0339584350585938, "pos_frac": 0.65625, "sample": [-0.0114898681640625, 1.1601066589355469, -0.56536865234375, 0.07086181640625, 0.13080596923828125, -0.6560726165771484, -0.012037277221679688, -0.000637054443359375, 0.2581768035888672, 1.63885498046875, 1.2113380432128906, 2.0339584350585938, 0.033039093017578125, -0.8327865600585938, 0.3506927490234375, -1.2007827758789062, 0.35672760009765625, 0.5452060699462891, 0.0660552978515625, 0.3207359313964844, 1.2658538818359375, 0.9646453857421875, -0.2921295166015625, 0.80938720703125, 0.7854461669921875, -0.039829254150390625, 0.054775238037109375, 1.5644760131835938, -0.6482982635498047, -0.05055046081542969, 0.41539764404296875, -0.4710350036621094, -0.5201797485351562, 0.246978759765625, -0.1719207763671875, 0.8361434936523438, 0.33972930908203125, -0.4713134765625, -0.040760040283203125, 0.05538177490234375, 0.11799049377441406, -0.6075935363769531, -0.18024826049804688, 0.20509719848632812, -0.17023468017578125, 0.46160316467285156, 0.08241653442382812, -0.5583362579345703, 0.6275405883789062, 0.5315017700195312, -0.050689697265625, -0.09914398193359375, 0.1537017822265625, 0.727203369140625, 0.6726226806640625, 0.8510303497314453, 0.15503311157226562, 0.17969322204589844, 0.19649505615234375, 0.1372222900390625, 0.8383560180664062, 0.2290019989013672, 0.2857818603515625, 0.10176277160644531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000076.npy"}
|
||||
{"epoch": 0.11489040060468632, "step": 77, "batch_size": 64, "mean": 0.1791057288646698, "std": 0.556281328201294, "min": -1.24810791015625, "p10": -0.4813392639160155, "median": 0.15529251098632812, "p90": 0.8531841278076172, "max": 1.5261154174804688, "pos_frac": 0.65625, "sample": [0.28232765197753906, 0.20977020263671875, -0.06278610229492188, -0.9117355346679688, 0.2770271301269531, 0.81280517578125, 0.8429222106933594, -0.05009651184082031, -0.7656745910644531, 0.1485748291015625, 1.073150634765625, 0.6282272338867188, 0.5126514434814453, -0.2690887451171875, -0.059734344482421875, 0.6625385284423828, 0.439788818359375, 0.5615081787109375, 0.223968505859375, -0.12258148193359375, 0.2447681427001953, 0.021024703979492188, 1.5261154174804688, 0.08748245239257812, 0.0683135986328125, -0.23763275146484375, -0.15850257873535156, 0.02072906494140625, 0.038745880126953125, 0.025798797607421875, -0.954193115234375, -0.9919662475585938, 0.16744613647460938, 0.15659332275390625, 0.5072212219238281, 0.5672607421875, 0.37545013427734375, 0.9341392517089844, -0.06900405883789062, 0.9004745483398438, -0.5822372436523438, -0.38065338134765625, 0.06475067138671875, 0.7257595062255859, 0.661956787109375, 1.2278900146484375, 0.15399169921875, 0.17546844482421875, 1.1270065307617188, 0.0836334228515625, 0.8575820922851562, -0.32830047607421875, 0.15960121154785156, 0.8070468902587891, 0.7378463745117188, -0.0240936279296875, -0.2955436706542969, -0.5244903564453125, -1.24810791015625, -0.37894439697265625, -0.067718505859375, 0.5298080444335938, 0.5732784271240234, -0.256591796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000077.npy"}
|
||||
{"epoch": 0.1164021164021164, "step": 78, "batch_size": 64, "mean": 0.20233124494552612, "std": 0.6585701107978821, "min": -1.1918487548828125, "p10": -0.8161689758300781, "median": 0.26206398010253906, "p90": 0.950548553466797, "max": 1.7327346801757812, "pos_frac": 0.65625, "sample": [0.5252285003662109, -0.877349853515625, 0.4040241241455078, 0.02508544921875, 0.14612579345703125, 1.6458263397216797, 1.0499267578125, -0.5561065673828125, 0.19660377502441406, -0.6862411499023438, 0.648773193359375, -0.777252197265625, -0.07002449035644531, 0.36370849609375, 0.259246826171875, 0.23751068115234375, -0.9708480834960938, 1.7327346801757812, 0.579010009765625, 0.5202713012695312, -0.06981658935546875, -0.1166229248046875, 1.003692626953125, 0.6244697570800781, -0.2935028076171875, 0.31281280517578125, 0.9226760864257812, -0.9337844848632812, 0.8981552124023438, -0.94317626953125, 0.7018814086914062, 0.962493896484375, 0.4512939453125, 1.6214065551757812, 0.5976333618164062, -1.06768798828125, -1.1918487548828125, 0.0453643798828125, 0.44980812072753906, 0.8063602447509766, 0.5134391784667969, -0.0168304443359375, -0.04955291748046875, -0.336456298828125, -0.46053314208984375, 0.6182022094726562, 0.06674575805664062, -0.005084991455078125, 0.32784271240234375, 0.6010284423828125, -0.2736396789550781, 0.2648811340332031, 0.39646148681640625, 0.471282958984375, 0.031768798828125, 0.032657623291015625, 0.3110065460205078, -0.8328475952148438, 0.8233604431152344, -0.43289947509765625, 1.3515625, 0.13681411743164062, -0.14428329467773438, 0.37641143798828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000078.npy"}
|
||||
{"epoch": 0.11791383219954649, "step": 79, "batch_size": 64, "mean": 0.3186036944389343, "std": 0.49354472756385803, "min": -0.7719650268554688, "p10": -0.3097450256347656, "median": 0.32312870025634766, "p90": 0.848558044433594, "max": 1.865325927734375, "pos_frac": 0.734375, "sample": [0.30683135986328125, 0.28217315673828125, -0.5696964263916016, 0.5966854095458984, -0.37454986572265625, 0.4471702575683594, -0.7719650268554688, 0.04959678649902344, 0.5067520141601562, -0.10230255126953125, -0.2475566864013672, 0.6590652465820312, 0.6802635192871094, 1.5816764831542969, -0.45467376708984375, -0.041900634765625, 0.5410575866699219, 0.40322113037109375, -0.16217422485351562, 0.356414794921875, -0.04131126403808594, 1.2433624267578125, 0.205780029296875, 0.796295166015625, 0.364471435546875, -0.06938934326171875, 0.8709564208984375, 0.2717437744140625, 0.29749298095703125, 0.27971649169921875, 0.1138153076171875, -0.3363971710205078, 0.35768890380859375, 0.9711952209472656, 0.3149127960205078, 0.35077476501464844, 0.5465164184570312, 0.33797645568847656, -0.039295196533203125, -0.01702117919921875, 0.5772743225097656, 0.3163299560546875, 0.40871429443359375, -0.09116172790527344, -0.6828937530517578, 0.3299274444580078, 0.31041717529296875, 0.6919021606445312, 0.6719551086425781, 0.44427490234375, 0.09307479858398438, 0.2943115234375, -0.054290771484375, -0.5047454833984375, 0.18703842163085938, 1.865325927734375, 0.5324554443359375, 0.35817718505859375, 0.46829986572265625, 1.4303550720214844, 0.5567092895507812, 0.4705924987792969, 1.0156173706054688, 0.1956024169921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000079.npy"}
|
||||
{"epoch": 0.11942554799697656, "step": 80, "batch_size": 64, "mean": 0.2758333683013916, "std": 0.623872697353363, "min": -1.1351470947265625, "p10": -0.44136543273925777, "median": 0.2764902114868164, "p90": 0.9879215240478516, "max": 2.1653060913085938, "pos_frac": 0.703125, "sample": [0.1636810302734375, 0.9896965026855469, 0.35495758056640625, 0.8087387084960938, 0.8630123138427734, -1.1351470947265625, 0.3055877685546875, 0.29912376403808594, -0.1287841796875, -0.23062896728515625, 1.2318992614746094, -0.23999786376953125, -0.05889129638671875, -0.3870277404785156, 0.08744239807128906, 2.1653060913085938, 0.46057891845703125, 0.4475250244140625, 0.5510616302490234, 0.05274009704589844, 0.21247482299804688, 0.0122528076171875, -0.2519683837890625, 0.681671142578125, 0.6764068603515625, 1.1650657653808594, 0.46318817138671875, -0.46465301513671875, 1.2224578857421875, 1.052734375, 0.46929359436035156, 0.6640396118164062, 0.2507057189941406, 0.48444366455078125, 0.07305145263671875, 0.4918365478515625, -0.5915184020996094, 0.19359588623046875, 0.5763435363769531, -1.056793212890625, -1.03839111328125, -0.3709869384765625, -0.26812744140625, 0.8449516296386719, -0.0362396240234375, 0.1678466796875, 1.88818359375, 0.3912506103515625, 0.7151947021484375, -0.6619777679443359, 0.9837799072265625, 0.3182964324951172, 0.092132568359375, 0.4864959716796875, 0.2538566589355469, -0.17568016052246094, 0.11590576171875, 0.05123138427734375, 0.6559810638427734, 0.3954315185546875, -0.11748123168945312, -0.59759521484375, -0.321990966796875, 0.9557647705078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000080.npy"}
|
||||
{"epoch": 0.12093726379440665, "step": 81, "batch_size": 64, "mean": 0.36920061707496643, "std": 0.6170979738235474, "min": -0.9394607543945312, "p10": -0.2810348510742187, "median": 0.26395416259765625, "p90": 1.1404781341552734, "max": 2.086709976196289, "pos_frac": 0.71875, "sample": [0.44347381591796875, -0.10893821716308594, 1.0881538391113281, -0.13422393798828125, 1.8751335144042969, 0.13631439208984375, 0.05547142028808594, -0.08443450927734375, 0.10027503967285156, -0.08795166015625, 0.1774444580078125, 0.353729248046875, -0.15252685546875, 0.39684295654296875, 0.07476615905761719, 0.12012481689453125, 0.2250843048095703, 0.07396888732910156, -0.9394607543945312, 0.9237632751464844, 0.26720428466796875, 0.14312362670898438, 0.9254150390625, -0.365264892578125, 1.387664794921875, 0.1453704833984375, 0.3611602783203125, 0.735565185546875, -0.12437820434570312, 0.83636474609375, 0.3147430419921875, 0.8043212890625, 0.5481338500976562, 0.35175514221191406, 0.16485595703125, 1.1923980712890625, -0.5936126708984375, 1.7962646484375, 0.95526123046875, 0.596343994140625, 0.7776870727539062, 0.5510578155517578, 0.47003936767578125, -0.6049308776855469, 1.1490097045898438, -0.115966796875, 1.2952690124511719, -0.473785400390625, 0.98150634765625, 0.5835723876953125, 2.086709976196289, 0.8576202392578125, 0.43968963623046875, -0.9314117431640625, 0.18652725219726562, -0.30017852783203125, -0.021570205688476562, 0.17379379272460938, -0.026988983154296875, 0.5868072509765625, -0.16022682189941406, 1.1205711364746094, 0.26070404052734375, -0.23636627197265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000081.npy"}
|
||||
{"epoch": 0.12244897959183673, "step": 82, "batch_size": 64, "mean": 0.4032464325428009, "std": 0.46993181109428406, "min": -0.9572601318359375, "p10": -0.09904823303222653, "median": 0.4003314971923828, "p90": 1.027062225341797, "max": 1.481689453125, "pos_frac": 0.84375, "sample": [0.2905845642089844, 0.6815452575683594, 0.5646896362304688, 0.0498199462890625, 0.7579498291015625, 0.3836669921875, 0.36678314208984375, 0.6046600341796875, 0.3179931640625, 0.2751502990722656, 0.7159748077392578, 0.5432357788085938, 0.0124664306640625, 1.2601165771484375, 0.5286026000976562, 1.0179672241210938, 0.50103759765625, 0.3211326599121094, 0.6341323852539062, 0.0278472900390625, 0.04059600830078125, 0.76165771484375, 0.6255149841308594, 0.28534889221191406, -0.37845611572265625, 0.36321258544921875, 0.4024848937988281, 0.051544189453125, 1.059103012084961, 0.7758903503417969, 0.3127784729003906, -0.196624755859375, 1.481689453125, -0.0625, -0.6311492919921875, 0.1309661865234375, 1.067840576171875, -0.6899795532226562, 0.47931671142578125, 0.9443264007568359, 0.5270004272460938, -0.3567771911621094, 0.658050537109375, -0.11471176147460938, 0.8099651336669922, 0.48815155029296875, 0.9691658020019531, 0.4631805419921875, 0.8970108032226562, 1.0309600830078125, -0.9572601318359375, 0.16809654235839844, 0.3379669189453125, 0.47835540771484375, -0.012531280517578125, 0.06335639953613281, -0.05377197265625, 0.49079132080078125, 1.0630741119384766, 0.24498939514160156, 0.325714111328125, 0.3981781005859375, 0.0542144775390625, 1.1556854248046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000082.npy"}
|
||||
{"epoch": 0.12396069538926682, "step": 83, "batch_size": 64, "mean": 0.40185046195983887, "std": 0.6413590908050537, "min": -1.3135261535644531, "p10": -0.4808507919311523, "median": 0.47087860107421875, "p90": 1.1227079391479493, "max": 2.30242919921875, "pos_frac": 0.71875, "sample": [1.16302490234375, 0.943939208984375, 1.0013313293457031, 0.08643722534179688, 0.8894023895263672, 0.2792510986328125, -0.0550384521484375, -0.04367828369140625, 1.143157958984375, 0.632293701171875, -0.7270584106445312, 0.1980743408203125, -0.1229705810546875, 0.807098388671875, 0.7052230834960938, 0.1411590576171875, -0.4920654296875, 0.9100723266601562, 0.04973030090332031, 0.50335693359375, 1.0530776977539062, 0.8343544006347656, 0.5216064453125, -0.3866138458251953, -0.8094024658203125, -0.5624008178710938, 0.8173370361328125, 0.33942413330078125, 0.27857017517089844, 0.02072906494140625, 0.28692626953125, -0.02666473388671875, 1.5111198425292969, -1.3135261535644531, -0.4546833038330078, 0.41683006286621094, 0.5740947723388672, -0.49591827392578125, 0.48723793029785156, -0.8189239501953125, -0.058506011962890625, 0.6712646484375, 0.9485244750976562, 0.6251239776611328, -0.06916999816894531, 0.4599571228027344, -0.01949310302734375, 1.1048583984375, -0.1390228271484375, 0.8579483032226562, 1.7166519165039062, 0.362701416015625, 0.4897422790527344, 0.4974708557128906, 0.5558547973632812, 0.8438453674316406, 0.7555255889892578, 1.1303577423095703, 1.1957626342773438, 2.30242919921875, 0.4133148193359375, 0.4527740478515625, 0.4818000793457031, -0.1472015380859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000083.npy"}
|
||||
{"epoch": 0.1254724111866969, "step": 84, "batch_size": 64, "mean": 0.17705348134040833, "std": 0.6008948087692261, "min": -1.401824951171875, "p10": -0.5923675537109374, "median": 0.1739025115966797, "p90": 1.059250831604004, "max": 1.3426990509033203, "pos_frac": 0.671875, "sample": [0.11968994140625, 1.0700912475585938, 0.39642333984375, 0.6688880920410156, 0.3762054443359375, 0.7827320098876953, 0.5151901245117188, -0.3600959777832031, 0.4018840789794922, -0.8738174438476562, -0.6811447143554688, 1.1688880920410156, 0.116455078125, 0.5368270874023438, 0.693939208984375, 0.1085357666015625, -0.3028450012207031, 0.27691650390625, 0.29535675048828125, -0.5221977233886719, 0.043426513671875, 0.3531341552734375, 0.17309951782226562, 0.20070648193359375, 0.4410514831542969, -0.09552001953125, 0.2060089111328125, 0.83258056640625, 0.27918243408203125, -0.490631103515625, 0.8549652099609375, -0.110809326171875, 0.00861358642578125, -0.4712677001953125, -0.44136810302734375, 1.0603294372558594, 1.3426990509033203, 1.0567340850830078, 0.10966110229492188, -0.337066650390625, -1.401824951171875, 1.141571044921875, 0.028583526611328125, 0.2013225555419922, 0.845458984375, 0.20510292053222656, -0.13426589965820312, 1.0906944274902344, 0.14130020141601562, 0.13946533203125, -0.02873992919921875, -1.04522705078125, 0.9614467620849609, 0.163726806640625, 0.17470550537109375, 0.2626495361328125, 0.5370025634765625, -0.08098602294921875, -0.865570068359375, -0.6473617553710938, 1.2919540405273438, -0.325164794921875, -0.6224403381347656, -0.50543212890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000084.npy"}
|
||||
{"epoch": 0.12698412698412698, "step": 85, "batch_size": 64, "mean": 0.46305617690086365, "std": 0.7041880488395691, "min": -1.3974456787109375, "p10": -0.3931377410888671, "median": 0.43445396423339844, "p90": 1.3578653335571294, "max": 2.1509246826171875, "pos_frac": 0.75, "sample": [1.267923355102539, -0.20266342163085938, 0.35002899169921875, -0.02780914306640625, 0.20197296142578125, 1.0500030517578125, 0.6367950439453125, 0.4245643615722656, -0.30502891540527344, 1.5449638366699219, 1.5592803955078125, -1.3974456787109375, 0.9075164794921875, 2.072784423828125, 0.5528163909912109, 0.9018745422363281, 1.0665206909179688, 0.39878082275390625, 0.2614326477050781, 0.8187103271484375, 0.15042877197265625, 0.6602249145507812, -0.0089111328125, 1.2571830749511719, -0.022708892822265625, 0.101898193359375, 0.47537994384765625, 0.7670097351074219, 1.0957603454589844, -0.0411224365234375, -0.6970558166503906, 1.0006752014160156, -0.6850738525390625, -0.129119873046875, -0.43089866638183594, 0.20455169677734375, 1.7463150024414062, 1.2202644348144531, 0.08855438232421875, 0.18646240234375, -0.7876319885253906, 0.7564773559570312, 1.3964118957519531, 0.5154609680175781, 2.1509246826171875, 1.5686511993408203, 0.5034942626953125, 0.7419147491455078, 0.3370475769042969, 0.5377979278564453, 0.010814666748046875, 0.0211944580078125, 0.381561279296875, 0.7465591430664062, -0.00490570068359375, 0.1868133544921875, -0.27989959716796875, -0.6937026977539062, 0.5685348510742188, -0.7252883911132812, 1.0594253540039062, 0.35430908203125, 0.82244873046875, 0.44434356689453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000085.npy"}
|
||||
{"epoch": 0.12849584278155707, "step": 86, "batch_size": 64, "mean": 0.35092705488204956, "std": 0.6410995125770569, "min": -0.9628448486328125, "p10": -0.3656448364257812, "median": 0.3145589828491211, "p90": 1.202721405029297, "max": 2.397186279296875, "pos_frac": 0.6875, "sample": [0.9123764038085938, 0.6130390167236328, 0.169952392578125, 0.0904998779296875, 1.258108139038086, 1.0305709838867188, 0.7471923828125, 0.31488800048828125, 0.5739364624023438, -0.4356822967529297, -0.33373260498046875, -0.4996490478515625, -0.221343994140625, 1.8110198974609375, 0.2009735107421875, 0.699981689453125, 0.5765857696533203, -0.5974559783935547, -0.25920867919921875, -0.16716384887695312, 0.363037109375, -0.04207611083984375, 0.09357452392578125, -0.07087135314941406, 0.478851318359375, -0.03078460693359375, 0.51373291015625, 0.3644142150878906, 1.1694869995117188, 0.31422996520996094, 1.2169647216796875, -0.130889892578125, 0.7265472412109375, 1.0100173950195312, 0.33856201171875, -0.0309906005859375, -0.35782623291015625, -0.14633750915527344, 1.1257896423339844, 0.27400970458984375, -0.22452163696289062, 0.47341156005859375, 0.060970306396484375, 0.163787841796875, 0.45470428466796875, 1.0446662902832031, 0.7289676666259766, 0.6408176422119141, 1.2303390502929688, -0.5652446746826172, 1.714630126953125, 0.08209228515625, -0.8019866943359375, 0.05103874206542969, 0.7315292358398438, 2.397186279296875, 0.03565788269042969, -0.36899566650390625, 0.1977081298828125, -0.9628448486328125, 0.4007377624511719, -0.2803668975830078, 1.2193679809570312, 0.3713493347167969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000086.npy"}
|
||||
{"epoch": 0.13000755857898716, "step": 87, "batch_size": 64, "mean": 0.3619861900806427, "std": 0.7993157505989075, "min": -1.9247283935546875, "p10": -0.585661506652832, "median": 0.4503145217895508, "p90": 1.2831859588623051, "max": 2.31549072265625, "pos_frac": 0.71875, "sample": [0.6730499267578125, 2.31549072265625, 0.1876220703125, 0.09820556640625, 1.5083160400390625, 1.091827392578125, -1.65069580078125, -0.7273941040039062, 0.9531784057617188, 1.0961952209472656, -0.963409423828125, -0.2229442596435547, 0.1603679656982422, -0.5889110565185547, -0.7890625, 0.96844482421875, 0.4445037841796875, 1.0924339294433594, 0.9167652130126953, 1.3556747436523438, -0.5780792236328125, 1.9240226745605469, 0.7564220428466797, 0.4073486328125, 0.6458282470703125, -1.9247283935546875, -0.5365180969238281, 0.15302467346191406, 1.4550399780273438, -0.5139808654785156, 0.10047149658203125, -0.3076324462890625, 1.3252830505371094, -0.3935966491699219, 0.18853759765625, -0.39606475830078125, 0.7075271606445312, 0.45612525939941406, 0.5428333282470703, 0.9970760345458984, 0.4901542663574219, 0.71820068359375, 0.777587890625, -0.8557853698730469, 0.1416015625, -0.3763084411621094, 0.7766246795654297, 0.6892929077148438, 0.5422859191894531, 0.2346343994140625, 0.9983444213867188, 0.4060211181640625, 0.9904022216796875, -0.33754730224609375, 0.5346603393554688, 0.2327728271484375, 0.14037513732910156, -0.2992706298828125, 0.325347900390625, 0.48542022705078125, 1.5888671875, 1.0542221069335938, -0.204345703125, 1.1849594116210938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000087.npy"}
|
||||
{"epoch": 0.13151927437641722, "step": 88, "batch_size": 64, "mean": 0.36120113730430603, "std": 0.7292470932006836, "min": -1.3792877197265625, "p10": -0.6214370727539062, "median": 0.4049530029296875, "p90": 1.238732147216797, "max": 1.8430023193359375, "pos_frac": 0.6875, "sample": [0.5251312255859375, 0.416778564453125, -0.2319812774658203, 1.365478515625, 1.530731201171875, 0.2161407470703125, 1.3656158447265625, -0.18300247192382812, 1.0340805053710938, 0.19111061096191406, 1.2302513122558594, 0.7426834106445312, 0.8365707397460938, 0.5722713470458984, -1.0759735107421875, 0.8658065795898438, 0.7732391357421875, -0.23836517333984375, -1.3792877197265625, 1.2307586669921875, 1.1752204895019531, 0.2332916259765625, 0.032825469970703125, -0.42523956298828125, -0.9268264770507812, 1.1807403564453125, 1.4769668579101562, 1.8430023193359375, 0.2890281677246094, 0.14161300659179688, -0.7986984252929688, -0.01697540283203125, 0.249755859375, 0.9420013427734375, 1.3946609497070312, -0.604827880859375, 0.7275924682617188, 0.6000633239746094, 1.1188087463378906, 0.022844314575195312, 0.3135356903076172, 0.5434036254882812, 1.1658477783203125, 0.4215373992919922, -0.53997802734375, 0.42913818359375, -0.4148216247558594, -0.3910255432128906, 1.2421493530273438, 0.8407039642333984, 0.9178524017333984, 0.23679351806640625, 0.7657928466796875, -0.681793212890625, -0.6285552978515625, -0.2733020782470703, 0.41432952880859375, 0.13919830322265625, 0.39557647705078125, 1.1868915557861328, -0.08568572998046875, -0.3571929931640625, -0.16824722290039062, -0.7991619110107422], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000088.npy"}
|
||||
{"epoch": 0.1330309901738473, "step": 89, "batch_size": 64, "mean": 0.37503573298454285, "std": 0.7525627017021179, "min": -2.351806640625, "p10": -0.44390411376953126, "median": 0.3787384033203125, "p90": 1.307742500305176, "max": 2.0356216430664062, "pos_frac": 0.78125, "sample": [0.4090099334716797, 0.4380226135253906, 0.3164405822753906, 0.09889984130859375, 1.4661407470703125, 1.6792335510253906, -0.11536407470703125, 0.517578125, 0.031185150146484375, 1.3141632080078125, 1.112945556640625, 0.3152008056640625, -0.06634521484375, 1.6593780517578125, 0.05231475830078125, 0.41652488708496094, 0.2480621337890625, 0.466278076171875, 0.33451080322265625, -1.288461685180664, -0.13899993896484375, -0.05411529541015625, 0.6426315307617188, 0.847686767578125, 0.42797088623046875, 1.654296875, 0.7986068725585938, 0.22970962524414062, -0.8179645538330078, 0.7467193603515625, 0.22786712646484375, 0.6362762451171875, 0.4425163269042969, 0.23632049560546875, 1.0821990966796875, 0.08960914611816406, 2.0356216430664062, 0.8850631713867188, -2.351806640625, -0.2891387939453125, 1.0305824279785156, 1.1247940063476562, 1.4300918579101562, 1.2927608489990234, 0.22826004028320312, 0.18651580810546875, 0.6905708312988281, 1.0853424072265625, 0.9639549255371094, -0.44793701171875, -0.6273345947265625, 0.3643684387207031, 0.5662689208984375, 0.21431732177734375, 0.5706958770751953, -1.3698348999023438, 0.2077198028564453, 0.3931083679199219, -0.48517417907714844, -0.4344940185546875, 0.1548614501953125, -0.39640045166015625, 0.4929924011230469, 0.029468536376953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000089.npy"}
|
||||
{"epoch": 0.1345427059712774, "step": 90, "batch_size": 64, "mean": 0.317741334438324, "std": 0.7145399451255798, "min": -1.7015380859375, "p10": -0.5186225891113281, "median": 0.28462791442871094, "p90": 1.1063751220703129, "max": 2.7379150390625, "pos_frac": 0.65625, "sample": [-0.8050899505615234, 1.0024185180664062, -0.3144645690917969, 0.6518974304199219, 0.7745323181152344, 0.16498947143554688, 1.1997203826904297, 0.882965087890625, 0.70318603515625, -1.7015380859375, 1.2125816345214844, 0.3436088562011719, 0.49503326416015625, 0.7631034851074219, 0.579986572265625, -0.1731719970703125, 0.07157135009765625, 0.3305511474609375, 0.03363800048828125, 0.6516609191894531, 1.3605880737304688, 1.016632080078125, 1.1884307861328125, 2.099071502685547, 0.92364501953125, 0.18885231018066406, -0.5903701782226562, 2.7379150390625, 0.9070587158203125, 0.9278030395507812, 0.389801025390625, 0.2226409912109375, -0.08037567138671875, 0.423736572265625, 0.0505523681640625, -0.47431182861328125, -0.3754119873046875, -0.5376129150390625, 0.458587646484375, 0.5066986083984375, -0.29682159423828125, -0.2816162109375, 1.14483642578125, 0.569366455078125, 0.6265029907226562, 0.283599853515625, -0.14940261840820312, 0.9747753143310547, -0.09155654907226562, -0.24003219604492188, -0.07545089721679688, 0.8540611267089844, 0.175445556640625, 0.05202484130859375, -0.0023288726806640625, 0.2856559753417969, -0.3457183837890625, -0.7586250305175781, -0.05263519287109375, 0.20928192138671875, -0.6128921508789062, -0.751708984375, -0.3043231964111328, 0.911895751953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000090.npy"}
{"epoch": 0.1360544217687075, "step": 91, "batch_size": 64, "mean": 0.5263521671295166, "std": 0.7802544832229614, "min": -1.3666839599609375, "p10": -0.26419429779052733, "median": 0.5110292434692383, "p90": 1.6738834381103516, "max": 2.7325897216796875, "pos_frac": 0.78125, "sample": [0.768280029296875, 0.5249404907226562, 1.0285396575927734, -0.26656150817871094, 0.147674560546875, 1.3583831787109375, -0.2586708068847656, -0.24221038818359375, 1.9419784545898438, 2.7325897216796875, 0.08929443359375, -0.20503997802734375, 0.7457942962646484, 0.3875446319580078, 0.25987815856933594, -0.40297508239746094, 1.7035369873046875, 0.7666587829589844, 0.2833442687988281, -0.5958251953125, 1.31591796875, 2.2093124389648438, 0.07794952392578125, 0.5098857879638672, 1.8464126586914062, 0.8764915466308594, 0.5610218048095703, 0.1336193084716797, 1.2031784057617188, 0.5330352783203125, 0.581146240234375, 0.17622756958007812, -0.1727447509765625, 0.054889678955078125, 0.8806629180908203, 0.5424385070800781, 0.261810302734375, 0.6097564697265625, 0.13782310485839844, -1.3666839599609375, -0.08370208740234375, 0.196380615234375, 0.0928192138671875, -1.2725906372070312, 0.16595840454101562, 1.0688514709472656, -0.3379783630371094, 1.8098869323730469, 0.7049674987792969, 1.1159210205078125, 0.7610359191894531, -0.38646888732910156, 0.03054046630859375, 0.06413745880126953, 1.6653785705566406, 1.5889129638671875, -0.2370758056640625, -0.017374038696289062, 1.6775283813476562, 1.2612533569335938, 0.5213470458984375, 0.3125934600830078, 0.7327384948730469, 0.5121726989746094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000091.npy"}
{"epoch": 0.13756613756613756, "step": 92, "batch_size": 64, "mean": 0.34143590927124023, "std": 0.7637795209884644, "min": -1.4445209503173828, "p10": -0.7376258850097656, "median": 0.40921783447265625, "p90": 1.3364238739013674, "max": 1.9528694152832031, "pos_frac": 0.65625, "sample": [0.43408203125, -0.04048728942871094, -0.35248565673828125, -0.11054611206054688, 0.1152191162109375, 0.7724952697753906, 0.07866668701171875, 0.58221435546875, -0.23232269287109375, -0.2690448760986328, -0.45722198486328125, 0.8715095520019531, 1.0085639953613281, -1.0604324340820312, 1.9272651672363281, 1.5123348236083984, 0.8042755126953125, -0.40683937072753906, 1.038726806640625, -0.857177734375, 0.17919921875, -0.32640838623046875, 1.0341911315917969, 0.5870437622070312, 1.03839111328125, -0.055316925048828125, 0.68402099609375, 0.4616241455078125, 0.1394805908203125, 0.3843536376953125, -0.7690010070800781, 1.452423095703125, 0.19403076171875, -0.15828704833984375, 0.510284423828125, -0.6644172668457031, 0.0558013916015625, 0.3327484130859375, -0.9002685546875, -0.59466552734375, 1.0258407592773438, 0.3650550842285156, 0.6491165161132812, 0.8126983642578125, 0.8906879425048828, -0.6138725280761719, 1.3043136596679688, 0.8717193603515625, 1.9528694152832031, -0.8654327392578125, -0.8542366027832031, 1.7751922607421875, 0.5605659484863281, -0.017721176147460938, -0.019134521484375, -1.4445209503173828, 0.8515167236328125, 0.45092010498046875, 0.643310546875, 1.380615234375, 1.3501853942871094, 0.10817146301269531, 1.0895233154296875, 0.6404914855957031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000092.npy"}
{"epoch": 0.13907785336356765, "step": 93, "batch_size": 64, "mean": 0.35534799098968506, "std": 0.7666413187980652, "min": -1.762481689453125, "p10": -0.45601463317871094, "median": 0.347320556640625, "p90": 1.1722122192382816, "max": 3.2115707397460938, "pos_frac": 0.671875, "sample": [0.5541706085205078, 0.7014541625976562, 0.82269287109375, 0.3930931091308594, -1.762481689453125, -0.01665496826171875, 0.7356338500976562, 0.295013427734375, -0.21416473388671875, 1.0795211791992188, 1.5778675079345703, 1.3129100799560547, 3.2115707397460938, -1.1901969909667969, 0.2923603057861328, 0.5797214508056641, 0.6562271118164062, 0.32222747802734375, 0.09062767028808594, -0.19283294677734375, 0.40795135498046875, -0.060302734375, -0.0514984130859375, -0.13175201416015625, 0.2573223114013672, -0.162261962890625, 1.2119369506835938, -0.11138153076171875, -0.44220924377441406, -0.27906036376953125, 0.588470458984375, -0.06118583679199219, 0.7918987274169922, 1.0326385498046875, -1.4993667602539062, 0.619537353515625, -0.7348251342773438, 0.4141807556152344, 0.529266357421875, 0.278533935546875, 0.8774452209472656, 0.204559326171875, 0.5297451019287109, 1.0043087005615234, 0.2254791259765625, 0.6889572143554688, 0.3932018280029297, 0.18935775756835938, 1.6197509765625, 0.1780986785888672, 1.8610172271728516, 0.5865287780761719, 1.0579757690429688, 0.7348785400390625, -0.000335693359375, -0.4619312286376953, 1.2932548522949219, -0.47922515869140625, 0.37241363525390625, -0.04953765869140625, 0.7420005798339844, -0.734619140625, -0.19778823852539062, 0.2600822448730469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000093.npy"}
{"epoch": 0.14058956916099774, "step": 94, "batch_size": 64, "mean": 0.4076548218727112, "std": 0.917546272277832, "min": -2.9432849884033203, "p10": -0.5364421844482422, "median": 0.4733734130859375, "p90": 1.433643913269043, "max": 4.058441162109375, "pos_frac": 0.703125, "sample": [1.4237060546875, 0.53314208984375, 0.7990341186523438, 1.6853561401367188, 0.53173828125, -1.2722320556640625, -0.2813873291015625, 0.46178627014160156, 0.4666290283203125, 1.429840087890625, -0.0014247894287109375, 1.9595756530761719, 0.464752197265625, -2.9432849884033203, 0.21024703979492188, 0.4193077087402344, -0.2530479431152344, 0.2734699249267578, 0.4801177978515625, 0.280914306640625, 1.459646224975586, 1.2518844604492188, 0.7021427154541016, 0.6188125610351562, 0.9859771728515625, 0.2981758117675781, 0.5629844665527344, -0.5440330505371094, -0.7457466125488281, 0.566009521484375, 0.6915264129638672, -0.4392223358154297, 0.0457611083984375, -0.0564117431640625, 0.5982742309570312, 0.4882316589355469, 0.3182373046875, 0.7866973876953125, 0.6336708068847656, 4.058441162109375, 1.4352741241455078, 0.8009719848632812, -0.3329200744628906, 0.020111083984375, 1.6110076904296875, 0.7461090087890625, -0.5187301635742188, 1.064422607421875, -0.4404144287109375, 0.8128299713134766, 1.6058349609375, 0.4452991485595703, -0.031280517578125, -0.4127235412597656, 0.65252685546875, 0.11568260192871094, -0.7535018920898438, 0.7512931823730469, -0.660980224609375, 0.7897891998291016, 0.6207714080810547, -0.3667144775390625, -0.7541046142578125, -0.05994415283203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000094.npy"}
{"epoch": 0.1421012849584278, "step": 95, "batch_size": 64, "mean": 0.5614703297615051, "std": 0.7733269929885864, "min": -1.5735321044921875, "p10": -0.3399972915649414, "median": 0.4679136276245117, "p90": 1.5057022094726562, "max": 2.5652618408203125, "pos_frac": 0.78125, "sample": [1.8292388916015625, -0.9914474487304688, 0.8457679748535156, 0.0222625732421875, -0.31571197509765625, 0.32549476623535156, 0.9458465576171875, 0.24407386779785156, 1.2050628662109375, 0.759918212890625, 0.3384666442871094, 0.6388931274414062, 1.936981201171875, 0.3164253234863281, -0.9433975219726562, 0.19717025756835938, 1.1655426025390625, 0.3969879150390625, 0.3987407684326172, 1.0079689025878906, 0.7655181884765625, 1.7100906372070312, -0.3712654113769531, 1.1476325988769531, -0.2236785888671875, -0.33573150634765625, 2.138458251953125, 0.759307861328125, 0.5431594848632812, -1.5735321044921875, 0.47348785400390625, 0.42491912841796875, 0.23266983032226562, 1.1129302978515625, -0.3418254852294922, 0.452301025390625, 1.5071945190429688, 0.5719070434570312, -0.0136871337890625, 0.296295166015625, -0.6349945068359375, 0.3240814208984375, 0.43256378173828125, 0.49479103088378906, -0.0050811767578125, 0.6474761962890625, 1.341644287109375, -0.013502120971679688, 0.6836280822753906, 1.8524398803710938, 1.3025436401367188, 1.5022201538085938, -0.6697120666503906, 0.22156524658203125, 1.1098709106445312, 2.5652618408203125, 0.4623394012451172, 0.42807769775390625, 1.0364322662353516, 0.8307723999023438, 0.08671188354492188, 1.2562313079833984, 1.1781463623046875, -0.09984397888183594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000095.npy"}
{"epoch": 0.1436130007558579, "step": 96, "batch_size": 64, "mean": 0.518730878829956, "std": 0.7453513741493225, "min": -1.3573532104492188, "p10": -0.3501384735107422, "median": 0.4815397262573242, "p90": 1.4840774536132815, "max": 2.0314483642578125, "pos_frac": 0.734375, "sample": [0.8568325042724609, -1.1919708251953125, 0.978607177734375, 0.218597412109375, 0.7308197021484375, 0.14251708984375, 1.0117950439453125, -0.5221710205078125, 0.6504058837890625, 0.2492523193359375, 0.19580841064453125, 0.07080078125, 1.1951522827148438, 1.745391845703125, 0.7607154846191406, 1.5188713073730469, -0.4224853515625, 0.5178413391113281, 0.4452381134033203, -0.12594985961914062, 1.3954505920410156, 0.9853134155273438, 1.1729049682617188, -1.3573532104492188, 0.6180496215820312, 0.2539081573486328, 1.4250602722167969, 0.3893585205078125, 1.163604736328125, 0.22011566162109375, 1.1058998107910156, 1.074005126953125, 1.822418212890625, 1.138946533203125, -0.2459583282470703, 0.2956390380859375, -1.008331298828125, 0.19248580932617188, 0.3501091003417969, 0.7118949890136719, 1.5015106201171875, -0.4216899871826172, 0.9733200073242188, -0.26866912841796875, 2.0314483642578125, 0.1999053955078125, 1.5224952697753906, -0.27587890625, 1.4996414184570312, -0.3580474853515625, 1.0747337341308594, -0.052764892578125, 0.3247871398925781, 1.0495452880859375, 1.385498046875, 0.755523681640625, -0.08544349670410156, 0.17463302612304688, -0.09576988220214844, 1.4477615356445312, 0.8030242919921875, -0.24218368530273438, -0.14250946044921875, -0.3316841125488281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000096.npy"}
{"epoch": 0.14512471655328799, "step": 97, "batch_size": 64, "mean": 0.38704511523246765, "std": 0.8727914094924927, "min": -1.0009536743164062, "p10": -0.41055545806884763, "median": 0.21381759643554688, "p90": 1.6041313171386726, "max": 4.083282470703125, "pos_frac": 0.625, "sample": [-0.25848388671875, 0.30713653564453125, -0.6763496398925781, 2.3201370239257812, 0.8855915069580078, 1.9675140380859375, 0.000499725341796875, -0.6953811645507812, 0.06128692626953125, -0.914398193359375, -0.5892467498779297, 0.120361328125, 0.37066650390625, 0.5184803009033203, -0.36594390869140625, -1.0009536743164062, -0.05632209777832031, 1.0001049041748047, 0.8537330627441406, 0.1387462615966797, 0.5657196044921875, 0.5519981384277344, -0.07366561889648438, -0.17176055908203125, 0.025909423828125, -0.4232177734375, -0.02916717529296875, 0.4108734130859375, 0.9939842224121094, -0.07285308837890625, 0.420135498046875, -0.244842529296875, 1.4035873413085938, 0.25920677185058594, 0.6193675994873047, 0.3985137939453125, 1.8880577087402344, -0.28118896484375, -0.3810100555419922, 1.80889892578125, 0.28823089599609375, -0.2631568908691406, 4.083282470703125, 0.618621826171875, -0.23326873779296875, -0.5405063629150391, 0.25807762145996094, 0.17839813232421875, 2.0379638671875, -0.024993896484375, -0.19106483459472656, 0.04447174072265625, 0.9767608642578125, -0.17937850952148438, 1.1837539672851562, 1.6900787353515625, 1.2037296295166016, 0.48134613037109375, 0.249237060546875, 1.1350326538085938, 0.6564292907714844, 0.1632537841796875, -0.3268928527832031, -0.37424468994140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000097.npy"}
{"epoch": 0.14663643235071808, "step": 98, "batch_size": 64, "mean": 0.3075885772705078, "std": 0.714857816696167, "min": -1.0642051696777344, "p10": -0.6443265914916992, "median": 0.3586597442626953, "p90": 1.3380142211914063, "max": 1.8738059997558594, "pos_frac": 0.671875, "sample": [-0.25901031494140625, 0.5441913604736328, 0.6618480682373047, 0.8215866088867188, 0.22074127197265625, 0.2282257080078125, 0.4722900390625, 0.18042755126953125, 0.5152587890625, 1.09490966796875, -0.97869873046875, -0.23881149291992188, 0.7665252685546875, -0.468017578125, 1.4867935180664062, 0.46819305419921875, 0.5783576965332031, 1.8738059997558594, 0.4226875305175781, -0.0640869140625, 0.3875389099121094, 1.5989532470703125, -0.529266357421875, 0.628173828125, 0.09411239624023438, -0.11235809326171875, 0.1952362060546875, 0.05960845947265625, -0.5723171234130859, -0.9953346252441406, 0.5647125244140625, 1.3280487060546875, 0.375244140625, 0.901123046875, -0.6557197570800781, 0.313873291015625, 0.7406959533691406, 1.5817794799804688, -0.019098281860351562, 0.4327964782714844, 0.7048110961914062, -0.21207427978515625, -0.7381362915039062, 0.7321834564208984, -1.0125579833984375, -0.6177425384521484, 1.1059341430664062, 0.010519027709960938, -0.06292724609375, 1.4334030151367188, 0.3420753479003906, 1.8115386962890625, -0.5020561218261719, -0.8470859527587891, 0.930755615234375, 0.5341110229492188, 0.3072700500488281, -0.26377105712890625, 0.18230819702148438, 1.34228515625, -0.17278671264648438, 0.6590957641601562, 0.4377021789550781, -1.0642051696777344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000098.npy"}
{"epoch": 0.14814814814814814, "step": 99, "batch_size": 64, "mean": 0.43467745184898376, "std": 0.7978219985961914, "min": -1.2728271484375, "p10": -0.5818153381347656, "median": 0.3772773742675781, "p90": 1.3718645095825197, "max": 2.7163467407226562, "pos_frac": 0.71875, "sample": [1.7797012329101562, 2.254241943359375, 0.7055130004882812, 1.8797225952148438, 0.1764678955078125, 0.23281097412109375, -0.03256988525390625, -0.2916240692138672, 0.118194580078125, 2.7163467407226562, 0.375213623046875, 1.3965225219726562, 1.3143291473388672, 1.1816139221191406, 0.23746299743652344, 0.16725921630859375, -0.16110992431640625, 0.2784862518310547, 0.2864265441894531, 1.0175399780273438, 0.7696456909179688, 0.13854598999023438, -0.12384033203125, 0.02799224853515625, -0.5997161865234375, 0.37934112548828125, 0.632293701171875, -0.7191925048828125, -0.31790924072265625, 1.3966064453125, 0.40990447998046875, 0.65802001953125, 0.2734565734863281, -0.06122589111328125, 0.5940933227539062, 0.5717926025390625, 0.9438552856445312, -1.2441959381103516, 0.4488983154296875, 0.9397850036621094, 0.5328540802001953, -0.14678955078125, 1.2973823547363281, 2.249908447265625, -0.7047996520996094, 1.2190475463867188, 1.2391281127929688, -0.20129966735839844, -1.2728271484375, 0.9149093627929688, 1.0686302185058594, 0.39499664306640625, 0.5518341064453125, 0.5177879333496094, -0.7396087646484375, -0.06640625, -0.5400466918945312, 0.7052841186523438, 0.2670860290527344, -0.33130645751953125, 0.5010223388671875, 0.23495101928710938, -0.82318115234375, 0.20009994506835938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000099.npy"}
{"epoch": 0.14965986394557823, "step": 100, "batch_size": 64, "mean": 0.3342531621456146, "std": 0.9665762186050415, "min": -2.4451446533203125, "p10": -0.6622316360473632, "median": 0.24811744689941406, "p90": 1.610251617431641, "max": 2.945037841796875, "pos_frac": 0.625, "sample": [-0.180999755859375, 0.32407379150390625, -0.47129058837890625, 1.3813629150390625, 1.8694915771484375, -0.3944587707519531, 0.30531883239746094, 1.2147102355957031, 1.6551971435546875, 0.60198974609375, 0.6193389892578125, 0.15847015380859375, 0.9835662841796875, 0.37847137451171875, 0.1863555908203125, 2.7560806274414062, -0.793487548828125, -0.12502670288085938, 0.2788105010986328, 0.1797332763671875, 0.5559482574462891, 0.26421356201171875, -0.750152587890625, -0.23887062072753906, 0.1468353271484375, 0.6871261596679688, 0.11318397521972656, 0.29987335205078125, 0.09021759033203125, -0.3089599609375, -0.3575439453125, -1.0576934814453125, -0.8001937866210938, -0.6677322387695312, -0.6493968963623047, -0.3280220031738281, -0.3675537109375, -0.021514892578125, -1.3628463745117188, 2.1344165802001953, -0.6461944580078125, 1.418060302734375, -0.34740447998046875, 0.3946723937988281, 0.24962997436523438, 1.5053787231445312, 1.2689208984375, -0.0849609375, 0.21293067932128906, -0.5438919067382812, 2.0150146484375, 0.8748321533203125, 0.4545745849609375, 0.24660491943359375, 0.5250396728515625, 0.8340969085693359, -0.4398040771484375, 2.167572021484375, 1.05914306640625, 0.8932647705078125, 2.945037841796875, -0.3207855224609375, -2.4451446533203125, 0.8465728759765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000100.npy"}
{"epoch": 0.15117157974300832, "step": 101, "batch_size": 64, "mean": 0.6107259392738342, "std": 0.7648286819458008, "min": -0.8511104583740234, "p10": -0.34430465698242185, "median": 0.5603599548339844, "p90": 1.5544181823730472, "max": 3.0084762573242188, "pos_frac": 0.796875, "sample": [0.4549369812011719, 0.3833122253417969, 0.1105194091796875, -0.29682159423828125, 1.4914169311523438, 1.2451248168945312, 0.8412857055664062, 0.4583263397216797, 0.021392822265625, 0.8935089111328125, 0.2558269500732422, 0.7865676879882812, -0.47906494140625, -0.1342010498046875, 1.6461753845214844, -0.0524749755859375, 0.7537994384765625, 0.6411266326904297, -0.12473106384277344, -0.364654541015625, 1.5759124755859375, 0.1534881591796875, 0.6758880615234375, 0.12955856323242188, 0.22491455078125, 0.9898834228515625, 1.1701278686523438, 2.1763763427734375, 0.38902854919433594, 1.5042648315429688, 1.2930412292480469, 1.22491455078125, 1.4745960235595703, 0.09419059753417969, -0.755950927734375, -0.8511104583740234, 0.3626861572265625, -0.1348419189453125, -0.09783172607421875, -0.3880043029785156, -0.5691757202148438, 0.7733154296875, 0.8119354248046875, 0.7160415649414062, 1.658935546875, 0.7004318237304688, 0.51556396484375, 0.2142486572265625, 0.07110404968261719, 0.38311195373535156, 2.1727676391601562, 0.2647819519042969, 1.5766773223876953, 1.2421607971191406, 0.093719482421875, 0.0250396728515625, 1.4901657104492188, 0.6051559448242188, -0.6127815246582031, 1.0025558471679688, 1.1280975341796875, 1.1631755828857422, 3.0084762573242188, 0.9084510803222656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000101.npy"}
{"epoch": 0.15268329554043839, "step": 102, "batch_size": 64, "mean": 0.33820363879203796, "std": 0.8395882844924927, "min": -1.5554885864257812, "p10": -0.8162162780761718, "median": 0.32030773162841797, "p90": 1.3653255462646487, "max": 2.67645263671875, "pos_frac": 0.671875, "sample": [0.5221939086914062, -0.097259521484375, 0.21601104736328125, 1.0875511169433594, -1.2697982788085938, 0.29578590393066406, 0.17432403564453125, 0.4321327209472656, -0.9716796875, -0.08489608764648438, 1.4682083129882812, 0.666656494140625, 0.48923492431640625, 0.2102832794189453, 0.626251220703125, -0.27014923095703125, -0.7396163940429688, 2.591339111328125, -1.0091476440429688, 0.3338775634765625, -0.9295902252197266, 0.380462646484375, 0.0159454345703125, 0.17397308349609375, 1.029977798461914, 0.8187465667724609, 0.980682373046875, 0.7996788024902344, -0.8490447998046875, 0.30673789978027344, 1.3093109130859375, 1.433349609375, -0.727813720703125, 1.0599021911621094, 0.27037811279296875, -0.05785369873046875, 0.5744361877441406, 2.0624771118164062, 0.4478912353515625, -0.06381607055664062, 0.05800628662109375, 1.486663818359375, -0.9056777954101562, 0.5848007202148438, -0.27439117431640625, 0.19416046142578125, 0.238677978515625, -0.20196533203125, 0.837249755859375, 0.3954429626464844, 0.7954864501953125, 1.022186279296875, 0.5234489440917969, -0.5131797790527344, -0.15981674194335938, -0.39453887939453125, -0.03200531005859375, 2.67645263671875, -0.5538406372070312, -1.5554885864257812, 1.3893318176269531, 0.58843994140625, 1.2703857421875, 0.46806907653808594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000102.npy"}
{"epoch": 0.15419501133786848, "step": 103, "batch_size": 64, "mean": 0.45145532488822937, "std": 0.8584631681442261, "min": -1.741302490234375, "p10": -0.5290233612060546, "median": 0.20259952545166016, "p90": 1.5246952056884768, "max": 2.9305877685546875, "pos_frac": 0.703125, "sample": [-0.2742195129394531, 0.9478340148925781, 1.17401123046875, -0.8352584838867188, 1.1099281311035156, 0.1339874267578125, 0.7342529296875, 0.09105682373046875, -0.043060302734375, 0.79815673828125, -0.15763092041015625, 0.6642074584960938, -0.74603271484375, 0.0144805908203125, -0.27802276611328125, 1.0675392150878906, 1.1820049285888672, 1.2442359924316406, -1.741302490234375, 1.7429466247558594, 0.00229644775390625, -0.18264389038085938, 0.19144630432128906, 0.2945213317871094, 1.17999267578125, -0.5781631469726562, -0.10625839233398438, 1.7654876708984375, -0.5456619262695312, 0.5430641174316406, 0.44245147705078125, 2.0632705688476562, 1.852783203125, 0.12042427062988281, -0.4902000427246094, 0.7447967529296875, -0.7754440307617188, 0.441162109375, -0.00762939453125, 0.1837139129638672, 1.5540618896484375, 0.12382888793945312, 0.5927085876464844, 0.21375274658203125, 1.0708656311035156, -0.373748779296875, 1.4143257141113281, 0.11826705932617188, 1.4561729431152344, -0.4383068084716797, 2.77044677734375, 0.944793701171875, -0.6602325439453125, 0.0468902587890625, 0.8751983642578125, -0.093109130859375, 0.15964508056640625, 0.112274169921875, -0.13721084594726562, 0.1857147216796875, 0.735626220703125, 0.7683525085449219, 2.9305877685546875, 0.5537109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000103.npy"}
{"epoch": 0.15570672713529857, "step": 104, "batch_size": 64, "mean": 0.3594653904438019, "std": 0.8665570020675659, "min": -1.4208831787109375, "p10": -0.6392829895019531, "median": 0.2921113967895508, "p90": 1.5698047637939463, "max": 3.015350341796875, "pos_frac": 0.65625, "sample": [-0.109100341796875, 0.707611083984375, 0.8781394958496094, 0.4401741027832031, -1.3417434692382812, -0.21576690673828125, 0.2602958679199219, 0.9673309326171875, 0.41931915283203125, -0.443450927734375, 0.30668067932128906, -0.4217681884765625, 3.015350341796875, 1.2800445556640625, -0.8253097534179688, -0.6530227661132812, 0.385711669921875, 0.2720069885253906, -0.13390350341796875, 0.3618316650390625, 0.3791656494140625, -0.70611572265625, 1.01788330078125, 0.4882354736328125, -1.1383819580078125, 0.10118484497070312, 0.3899345397949219, -0.7118377685546875, 1.73944091796875, 0.1992015838623047, 0.12560653686523438, -0.11092948913574219, -0.3154029846191406, 0.26085853576660156, 0.5934104919433594, 0.5001964569091797, 0.01153564453125, -0.6023635864257812, 2.0495452880859375, -0.31122589111328125, 0.9629364013671875, -0.009817123413085938, 2.118988037109375, -0.3389453887939453, 0.4276885986328125, 1.2978668212890625, 1.3104209899902344, -0.014879226684570312, -1.4208831787109375, -0.22649383544921875, 1.0617866516113281, 0.11926651000976562, 0.8421516418457031, 1.1968269348144531, 0.37299156188964844, 0.3492851257324219, 0.25843048095703125, 0.2775421142578125, 0.5374832153320312, -0.6072235107421875, -0.36147308349609375, 2.092376708984375, 1.68096923828125, 1.9681167602539062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000104.npy"}
{"epoch": 0.15721844293272866, "step": 105, "batch_size": 64, "mean": 0.39167043566703796, "std": 0.8935391902923584, "min": -1.95452880859375, "p10": -0.6488517761230468, "median": 0.30161476135253906, "p90": 1.565786361694337, "max": 2.788698196411133, "pos_frac": 0.71875, "sample": [0.5443096160888672, 0.20388031005859375, 1.017974853515625, 0.60009765625, 0.18402862548828125, 0.7868289947509766, -1.3983917236328125, 2.458648681640625, -0.3696441650390625, 0.4403705596923828, 0.3355216979980469, 0.9454364776611328, 0.9912643432617188, 1.7852325439453125, 0.19832229614257812, 1.924835205078125, 0.050624847412109375, -1.95452880859375, -0.21221351623535156, 2.0271987915039062, 0.0957183837890625, 0.00193023681640625, 0.48093414306640625, -0.5280532836914062, -0.11519622802734375, -0.9391326904296875, 0.9783439636230469, 1.0082855224609375, 0.7316131591796875, -0.0019378662109375, 0.00559234619140625, 0.4559745788574219, -0.815185546875, 0.140533447265625, 0.15804290771484375, 1.312255859375, 0.5456123352050781, -0.0177459716796875, -0.70062255859375, 0.7235202789306641, -0.4433097839355469, 0.06883621215820312, 0.0465087890625, 1.1805381774902344, 1.0777740478515625, 1.6744422912597656, -1.5279388427734375, -0.0919952392578125, -0.860870361328125, -0.331939697265625, 0.37274169921875, 0.9791755676269531, 0.145233154296875, 2.123260498046875, 0.157623291015625, 0.26770782470703125, 2.788698196411133, 0.8827056884765625, -0.17483139038085938, 0.6958389282226562, -0.319122314453125, 0.8324661254882812, 0.8594512939453125, 0.5836334228515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000105.npy"}
{"epoch": 0.15873015873015872, "step": 106, "batch_size": 64, "mean": 0.450268030166626, "std": 0.8257230520248413, "min": -1.8544158935546875, "p10": -0.37996215820312496, "median": 0.39570045471191406, "p90": 1.5149700164794926, "max": 2.338092803955078, "pos_frac": 0.71875, "sample": [-0.029266357421875, 1.2276172637939453, 1.0866661071777344, 0.7746810913085938, 0.5730743408203125, -0.39800262451171875, 2.338092803955078, 1.1718063354492188, 0.22854232788085938, 0.5038642883300781, 0.222808837890625, 0.9802894592285156, 2.0415496826171875, 0.9476699829101562, -0.2567329406738281, -0.20875167846679688, 0.9922103881835938, 0.20233917236328125, 0.33376312255859375, 0.1098480224609375, -0.1557464599609375, -0.3291168212890625, -0.2101593017578125, -0.237457275390625, -0.20146560668945312, -0.6619911193847656, -0.30757713317871094, -0.3210601806640625, 0.6856460571289062, 2.170196533203125, -0.33786773681640625, 0.493682861328125, 0.23595428466796875, 0.929046630859375, 0.9211273193359375, 1.813079833984375, 1.4359664916992188, -0.5012626647949219, 1.44097900390625, 0.2315673828125, 0.7512893676757812, 0.3625144958496094, 2.073942184448242, 0.7282791137695312, -0.6632766723632812, 0.10433197021484375, 0.4284400939941406, 0.9784202575683594, -1.8544158935546875, 0.5495052337646484, 0.242950439453125, 0.8286514282226562, 0.5862655639648438, 1.5466804504394531, 0.17826080322265625, -0.7060928344726562, 0.3606071472167969, 1.5519866943359375, 0.10926055908203125, 0.3629608154296875, -1.7994384765625, 0.6149578094482422, 0.6222152709960938, 0.9232463836669922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000106.npy"}
{"epoch": 0.1602418745275888, "step": 107, "batch_size": 64, "mean": 0.2766827940940857, "std": 0.9204331040382385, "min": -1.9253921508789062, "p10": -0.7706192016601563, "median": 0.2319192886352539, "p90": 1.5210977554321297, "max": 3.0643081665039062, "pos_frac": 0.640625, "sample": [0.07098579406738281, -1.9253921508789062, 0.6475372314453125, 3.0643081665039062, 0.823577880859375, 0.7618789672851562, -0.04541015625, 0.7532196044921875, -0.9866256713867188, 0.8248329162597656, -1.604949951171875, 0.6046333312988281, 0.4180183410644531, 0.5865249633789062, 0.32972145080566406, -1.0177383422851562, 0.04499053955078125, -0.00623321533203125, 0.04228019714355469, -0.933013916015625, -0.158477783203125, 2.0938720703125, 1.947784423828125, 0.3116893768310547, 1.1236915588378906, -1.4014739990234375, 1.5927257537841797, -0.6831092834472656, -0.6465377807617188, -0.24648666381835938, 0.5586891174316406, -0.5115280151367188, -0.0269317626953125, 0.5497283935546875, -0.17786407470703125, -0.3349609375, 1.67047119140625, 0.1072998046875, 1.3539657592773438, 1.8216552734375, 1.1544952392578125, 1.1410980224609375, 0.10533905029296875, -0.28081512451171875, 0.5263195037841797, 1.0174331665039062, 0.3883705139160156, 0.38549041748046875, 0.27272796630859375, -0.4683685302734375, 1.663543701171875, 1.1852264404296875, 0.12969207763671875, -0.7453212738037109, -0.2757377624511719, 0.2154541015625, 0.274139404296875, 0.2274017333984375, 0.2364368438720703, -0.5997543334960938, -0.77130126953125, -0.7690277099609375, 0.10736083984375, 1.1901473999023438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000107.npy"}
{"epoch": 0.1617535903250189, "step": 108, "batch_size": 64, "mean": 0.4842491149902344, "std": 0.8433753848075867, "min": -1.7332305908203125, "p10": -0.48018836975097645, "median": 0.4143791198730469, "p90": 1.6049430847167974, "max": 2.7007675170898438, "pos_frac": 0.65625, "sample": [0.617156982421875, 0.41349029541015625, 1.4271011352539062, 0.8250198364257812, -0.2254161834716797, 1.6856155395507812, 0.4152679443359375, 0.8897628784179688, 2.2143707275390625, -0.0021820068359375, -0.5978164672851562, 2.7007675170898438, 0.8244094848632812, -0.5347251892089844, 0.7496490478515625, 0.5968017578125, -0.28389739990234375, 0.23934364318847656, 0.110992431640625, 1.1574363708496094, 1.42901611328125, 0.6257400512695312, -0.06286811828613281, -0.048553466796875, 1.0774154663085938, -0.76068115234375, 1.1148529052734375, -0.589691162109375, 2.048736572265625, 0.6710205078125, -0.118438720703125, 0.02938079833984375, 1.13568115234375, -0.07675552368164062, 0.8465042114257812, 1.9305477142333984, -0.33782196044921875, 0.061702728271484375, -1.7332305908203125, 1.407562255859375, -0.17259979248046875, 1.6578903198242188, 0.26004981994628906, 0.11639022827148438, -0.8143386840820312, 0.2966041564941406, -0.2281513214111328, 1.7239952087402344, -0.28968048095703125, -0.32953643798828125, -0.7847137451171875, 0.2616729736328125, -0.352935791015625, 0.9546432495117188, 0.109100341796875, 1.011138916015625, 0.440673828125, 0.961700439453125, 0.9343833923339844, 1.4813995361328125, -0.0604248046875, 1.0248641967773438, -0.23394775390625, 1.1504974365234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000108.npy"}
{"epoch": 0.16326530612244897, "step": 109, "batch_size": 64, "mean": 0.510068416595459, "std": 0.8143371343612671, "min": -0.8730373382568359, "p10": -0.26758422851562497, "median": 0.4045381546020508, "p90": 1.3688285827636724, "max": 3.571502685546875, "pos_frac": 0.671875, "sample": [0.4586029052734375, 1.0668792724609375, 2.365325927734375, -0.023731231689453125, 1.1507244110107422, 0.5913715362548828, 1.6355400085449219, 1.272003173828125, 0.745880126953125, 1.1016159057617188, 0.2925872802734375, -0.004486083984375, 1.2738571166992188, 1.1020317077636719, 0.0147705078125, -0.270294189453125, 1.1132545471191406, 0.4534454345703125, 0.7246322631835938, 0.5555381774902344, 0.8697566986083984, 1.2114906311035156, -0.081085205078125, -0.016600608825683594, -0.1161346435546875, 2.3810577392578125, -0.49492645263671875, 0.02178955078125, -0.1275177001953125, 3.571502685546875, 0.15410423278808594, -0.377899169921875, -0.22249221801757812, 0.2882843017578125, 0.2830467224121094, 0.4935779571533203, -0.6433925628662109, -0.0374908447265625, -0.0191192626953125, 0.7717437744140625, -0.17253875732421875, 0.10144233703613281, -0.163116455078125, -0.261260986328125, 0.40549468994140625, -0.43170928955078125, 0.24755859375, 0.4035816192626953, -0.18975830078125, -0.7932243347167969, 0.8163070678710938, 0.06624603271484375, 0.6269989013671875, 1.8659820556640625, 0.8739356994628906, 1.4095306396484375, 2.2309837341308594, 0.8522186279296875, 0.5669021606445312, 0.2854804992675781, 0.9100112915039062, -0.8730373382568359, -0.1018829345703125, 0.43898773193359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000109.npy"}
{"epoch": 0.16477702191987906, "step": 110, "batch_size": 64, "mean": 0.3329389691352844, "std": 0.8171271681785583, "min": -1.624624252319336, "p10": -0.49706954956054683, "median": 0.20769119262695312, "p90": 1.5495521545410158, "max": 2.5645904541015625, "pos_frac": 0.671875, "sample": [0.11138534545898438, -0.12336349487304688, -1.38128662109375, -0.2924652099609375, 0.20635223388671875, -0.055553436279296875, 0.5552902221679688, 1.6525039672851562, 0.56170654296875, 1.6529312133789062, 1.3143768310546875, 0.5787181854248047, 0.11399078369140625, 1.6887588500976562, 0.39998626708984375, -0.07468414306640625, 0.15730667114257812, 1.50518798828125, -0.3785972595214844, -1.624624252319336, -0.1508331298828125, 0.8895263671875, 0.35416412353515625, 0.2090301513671875, 0.1102294921875, 0.4870147705078125, 0.5433082580566406, -0.20075225830078125, -0.1411285400390625, -0.46573638916015625, 1.0356864929199219, 0.17856979370117188, 1.1381988525390625, -0.6838188171386719, 0.02063751220703125, 0.7780036926269531, -0.52655029296875, 2.06536865234375, 0.0540008544921875, 0.55517578125, 0.606170654296875, 0.35931396484375, 0.2041015625, -0.16141510009765625, 0.256439208984375, 0.7299270629882812, 1.05255126953125, -0.9466400146484375, 0.3353271484375, -1.2797164916992188, 0.388397216796875, -0.2061920166015625, 0.26267242431640625, -0.510498046875, 0.15380096435546875, 2.307373046875, 0.8216400146484375, 1.5685653686523438, 0.05678749084472656, 2.5645904541015625, -0.40753746032714844, 0.7787704467773438, -0.209564208984375, -0.2347869873046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000110.npy"}
{"epoch": 0.16628873771730915, "step": 111, "batch_size": 64, "mean": 0.43631690740585327, "std": 0.9110887050628662, "min": -1.6231918334960938, "p10": -0.7448215484619141, "median": 0.4512643814086914, "p90": 1.587863159179688, "max": 2.28009033203125, "pos_frac": 0.6875, "sample": [-0.1902751922607422, -1.1491622924804688, -1.3190994262695312, 1.77301025390625, 0.9809074401855469, 0.4793701171875, -0.7555503845214844, 1.1879043579101562, 1.3105430603027344, 0.4231586456298828, 0.6961822509765625, 1.2686080932617188, -0.8279647827148438, 0.10286712646484375, 0.6698150634765625, 1.2733917236328125, -0.17800140380859375, 0.9348087310791016, -0.6794815063476562, 1.2491874694824219, -0.26375579833984375, 0.36962890625, 0.5701217651367188, 1.3676948547363281, 0.306915283203125, 0.9797496795654297, 0.84771728515625, 1.7574005126953125, 0.2726020812988281, 1.02587890625, 0.06384658813476562, -0.7023048400878906, -1.29974365234375, -0.5737953186035156, 0.5157394409179688, 0.8219985961914062, 0.9278526306152344, 1.4954605102539062, -0.2228374481201172, 1.3795394897460938, 2.1063003540039062, 1.2283477783203125, 1.96270751953125, -0.5463848114013672, 0.39923858642578125, 0.10646247863769531, 0.3830299377441406, 1.868316650390625, -1.0058670043945312, 0.7364406585693359, 1.6274642944335938, 2.28009033203125, 0.950042724609375, -0.5321006774902344, -1.6231918334960938, -0.06454086303710938, -0.71978759765625, 0.621856689453125, -0.4473915100097656, 0.9274063110351562, 0.193359375, -0.011932373046875, 0.2574119567871094, 0.33707427978515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000111.npy"}
{"epoch": 0.16780045351473924, "step": 112, "batch_size": 64, "mean": 0.48680511116981506, "std": 0.8550528287887573, "min": -1.8546142578125, "p10": -0.4336801528930664, "median": 0.6389846801757812, "p90": 1.4307418823242188, "max": 2.2050323486328125, "pos_frac": 0.734375, "sample": [0.63714599609375, 2.2050323486328125, 1.4411468505859375, 0.8339347839355469, -0.12528228759765625, 0.7555694580078125, 1.9701156616210938, 0.5795364379882812, 2.15350341796875, 1.2986869812011719, -0.0786895751953125, -0.561004638671875, 0.00710296630859375, 0.9396648406982422, -0.06577301025390625, -0.8866119384765625, 0.734619140625, -0.05429840087890625, -1.5194473266601562, 0.08715057373046875, 0.883392333984375, -1.8546142578125, 1.3585090637207031, 1.3320541381835938, 0.0510406494140625, 1.0064926147460938, 0.6707763671875, 0.6111488342285156, 0.95458984375, 0.7190532684326172, 1.15863037109375, 0.42420196533203125, -0.3918342590332031, 0.15466690063476562, 1.406463623046875, 0.6408233642578125, 0.6463680267333984, 0.00450897216796875, 1.5382957458496094, -0.09505462646484375, 0.14127731323242188, 0.96478271484375, 0.8525428771972656, 0.4938545227050781, -0.39034271240234375, 2.0457916259765625, 1.3544158935546875, 1.0895309448242188, -0.4015369415283203, 0.139739990234375, -1.6524276733398438, 1.7736129760742188, -0.69000244140625, 0.7833023071289062, 0.3737030029296875, 0.6749305725097656, -0.4198760986328125, -0.43959617614746094, 0.8876075744628906, 0.37957763671875, 0.7908248901367188, -0.2620697021484375, 0.9536666870117188, 0.14060211181640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000112.npy"}
{"epoch": 0.1693121693121693, "step": 113, "batch_size": 64, "mean": 0.46567538380622864, "std": 0.9820100665092468, "min": -3.14996337890625, "p10": -0.7838945388793945, "median": 0.4845771789550781, "p90": 1.752721405029297, "max": 2.1026840209960938, "pos_frac": 0.765625, "sample": [-0.8856887817382812, -1.1212615966796875, 0.34590911865234375, -0.005626678466796875, -0.3864479064941406, 2.014404296875, 0.5011520385742188, 0.14221954345703125, 0.9749679565429688, 1.7406330108642578, 0.001232147216796875, 0.0381011962890625, 0.18296051025390625, 1.5755081176757812, -0.832366943359375, 0.8303298950195312, 0.5658454895019531, 1.3179817199707031, -0.7211570739746094, -1.1900997161865234, 0.04461669921875, 0.21540451049804688, 1.0784645080566406, -1.36688232421875, -0.7323055267333984, -0.0658416748046875, 1.5181159973144531, 0.5143165588378906, 0.3820037841796875, 0.2668018341064453, 1.8058624267578125, 1.622589111328125, 0.062244415283203125, -0.3138427734375, 1.3611373901367188, 1.8944263458251953, -3.14996337890625, 1.1384620666503906, 1.7327880859375, 0.2868804931640625, 0.5504531860351562, 1.7579021453857422, 0.849273681640625, 2.045745849609375, -0.7965660095214844, 0.8721923828125, 0.7081680297851562, 0.23732948303222656, -0.7543277740478516, 1.3062000274658203, 0.3342132568359375, 0.19855499267578125, 0.5139484405517578, 1.1765594482421875, 1.962493896484375, -0.2115192413330078, 0.9149169921875, 0.24585342407226562, 0.6455230712890625, 0.4680023193359375, 0.7033843994140625, 0.5058441162109375, 0.08251953125, 2.1026840209960938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000113.npy"}
{"epoch": 0.1708238851095994, "step": 114, "batch_size": 64, "mean": 0.716437578201294, "std": 0.9607676863670349, "min": -1.112884521484375, "p10": -0.5779628753662108, "median": 0.6909275054931641, "p90": 1.8003852844238288, "max": 3.937255859375, "pos_frac": 0.78125, "sample": [0.364959716796875, 0.7642440795898438, 0.3721046447753906, 0.5274238586425781, 1.3190956115722656, 0.5227203369140625, 1.1335029602050781, -0.7412872314453125, 1.2674713134765625, -0.6305198669433594, -0.017030715942382812, 1.3811683654785156, 1.2887077331542969, 0.1309967041015625, 0.7608070373535156, -1.112884521484375, 1.1010589599609375, 0.76031494140625, 0.1966724395751953, 0.2532958984375, 0.8969192504882812, 0.4843597412109375, 2.4287338256835938, 0.17983436584472656, 0.5140609741210938, 1.4602127075195312, 0.453765869140625, 1.4160728454589844, -0.45532989501953125, 1.4050731658935547, 0.2707538604736328, 0.7523155212402344, 1.9028244018554688, 0.8315277099609375, 1.2728347778320312, 0.2577857971191406, 1.2011032104492188, 1.4395370483398438, 1.5740909576416016, -1.0441665649414062, 0.984222412109375, 1.1637210845947266, 0.75970458984375, -0.2381153106689453, -0.01927947998046875, -0.20949745178222656, 3.012359619140625, 1.0784835815429688, -0.6349716186523438, 0.018672943115234375, -0.7536087036132812, 3.937255859375, 0.7562217712402344, 2.31201171875, 1.6170272827148438, 1.87896728515625, 0.6295394897460938, -0.1850605010986328, -0.231475830078125, -0.8023281097412109, 2.5729713439941406, 0.2388763427734375, 0.555938720703125, 0.5252361297607422], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000114.npy"}
{"epoch": 0.17233560090702948, "step": 115, "batch_size": 64, "mean": 0.5153576135635376, "std": 0.8893477320671082, "min": -1.1607666015625, "p10": -0.4923141479492187, "median": 0.4154386520385742, "p90": 1.3804056167602543, "max": 4.047088623046875, "pos_frac": 0.796875, "sample": [0.06525802612304688, 0.2719917297363281, 1.0119171142578125, -0.8171844482421875, 2.00042724609375, 1.2986507415771484, 0.1598968505859375, 1.1392364501953125, 0.54473876953125, 4.047088623046875, 0.6153354644775391, -0.444793701171875, 3.5421829223632812, -0.5638198852539062, 0.6747283935546875, 0.790679931640625, 0.0626373291015625, 0.589630126953125, 0.4027862548828125, -0.20587921142578125, 0.9173774719238281, 0.4794921875, 0.15483856201171875, -0.37274169921875, -0.7046604156494141, 0.8453578948974609, -0.5126800537109375, 0.5231437683105469, -0.2036285400390625, 1.4447441101074219, 0.0405426025390625, 0.4744415283203125, 0.2214202880859375, 0.2281646728515625, 0.1481781005859375, 1.4154434204101562, 0.170013427734375, 0.6278514862060547, 0.728240966796875, 0.6009635925292969, 1.2870330810546875, -0.8139801025390625, 1.1318588256835938, 0.19732666015625, 1.2617645263671875, 0.42809104919433594, 1.2290878295898438, 1.0011444091796875, 0.2986412048339844, -1.1607666015625, 0.3629627227783203, 0.0400390625, 1.9372329711914062, -0.1743946075439453, -0.2439136505126953, 0.04239463806152344, 0.20072174072265625, 0.1480121612548828, -0.6753387451171875, 0.4570808410644531, 1.0637435913085938, 0.39768218994140625, 0.5184440612792969, 1.6360092163085938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000115.npy"}
{"epoch": 0.17384731670445955, "step": 116, "batch_size": 64, "mean": 0.5405769348144531, "std": 0.797072172164917, "min": -1.1853256225585938, "p10": -0.26151046752929685, "median": 0.4769477844238281, "p90": 1.748801803588868, "max": 2.456817626953125, "pos_frac": 0.75, "sample": [0.8480758666992188, -0.152618408203125, 0.8424453735351562, 1.5058364868164062, 0.85699462890625, -1.1853256225585938, -0.270233154296875, 0.1898651123046875, 0.6932735443115234, 0.8480377197265625, 0.8545455932617188, -0.0053577423095703125, 0.185882568359375, 0.8321380615234375, 0.3440971374511719, 0.08153533935546875, 0.1733989715576172, 0.012441635131835938, 1.2468376159667969, 0.051868438720703125, -1.0755424499511719, 1.2138137817382812, 0.8904533386230469, 0.6233024597167969, -0.34511756896972656, 0.9081916809082031, 0.1263275146484375, 1.8331184387207031, 0.667083740234375, 0.7937850952148438, -0.20367431640625, 0.46448707580566406, -0.1279125213623047, 0.2749481201171875, 2.3994598388671875, 0.4735870361328125, 1.8650436401367188, 2.456817626953125, 0.041156768798828125, 0.28380584716796875, 0.14154052734375, 0.48030853271484375, 1.55206298828125, 0.27742767333984375, 1.3784332275390625, 1.4599227905273438, 1.9123153686523438, 0.7775192260742188, 0.5398941040039062, 1.0032806396484375, -0.173492431640625, 0.9560165405273438, -0.12781524658203125, -0.4469146728515625, 0.778167724609375, -0.24115753173828125, 0.2939033508300781, -0.12114715576171875, -0.09833526611328125, 0.4970245361328125, 1.980499267578125, -1.1275310516357422, 1.9637985229492188, -0.5756702423095703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000116.npy"}
{"epoch": 0.17535903250188964, "step": 117, "batch_size": 64, "mean": 0.48596563935279846, "std": 0.8725306391716003, "min": -2.5600967407226562, "p10": -0.5884843826293944, "median": 0.5271825790405273, "p90": 1.6166191101074219, "max": 2.5258827209472656, "pos_frac": 0.75, "sample": [0.6515426635742188, 1.201986312866211, 0.1681671142578125, 0.146881103515625, 0.2698516845703125, 0.900482177734375, 0.6183204650878906, 0.3165912628173828, 0.5387115478515625, 0.8933773040771484, -0.029499053955078125, -0.7998218536376953, 1.17926025390625, -0.12413787841796875, 0.32305145263671875, 1.62457275390625, 0.7904510498046875, 0.9847793579101562, 1.115447998046875, 1.5980606079101562, -0.82135009765625, 0.3008308410644531, 0.8422355651855469, 2.0223846435546875, -0.33572959899902344, 0.8371543884277344, -0.7852630615234375, 0.26625823974609375, 0.17682838439941406, 2.5258827209472656, 0.614044189453125, 1.976806640625, 1.3751029968261719, -0.3987388610839844, 0.328826904296875, 1.0662422180175781, 1.1268692016601562, 1.5099945068359375, 0.5084991455078125, 1.8315658569335938, 0.5166473388671875, 0.089752197265625, -0.19632339477539062, 0.9423828125, -0.49903106689453125, -2.5600967407226562, 0.5377178192138672, 0.2020416259765625, 1.6991043090820312, 0.0761260986328125, 0.33596038818359375, 0.7536773681640625, 0.7865333557128906, -1.0299034118652344, 0.9936065673828125, -1.3725204467773438, -0.6268215179443359, -0.22957992553710938, -0.1169891357421875, -0.0631561279296875, 1.6919174194335938, 0.8460006713867188, 0.302764892578125, 0.6854686737060547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000117.npy"}
{"epoch": 0.17687074829931973, "step": 118, "batch_size": 64, "mean": 0.547337532043457, "std": 0.877411425113678, "min": -1.31353759765625, "p10": -0.630281639099121, "median": 0.5336856842041016, "p90": 1.8658546447753908, "max": 2.40655517578125, "pos_frac": 0.75, "sample": [0.543426513671875, 0.7510223388671875, 1.6357879638671875, -0.499664306640625, -1.0948410034179688, 1.258880615234375, 0.09027862548828125, -0.08544921875, 1.1917495727539062, 0.000843048095703125, 0.4625091552734375, 0.7591094970703125, 0.42685699462890625, 1.9927711486816406, 0.2633838653564453, -0.20728302001953125, -1.0892181396484375, 0.6560440063476562, 0.11696243286132812, 0.6880550384521484, 0.32666015625, 0.9255828857421875, 0.6613998413085938, 2.40655517578125, 0.8283348083496094, 0.2929039001464844, 0.20319366455078125, -0.588287353515625, 1.0289630889892578, -0.11675834655761719, 0.9384880065917969, 0.8197536468505859, 1.879058837890625, -0.08601760864257812, -0.0277252197265625, 0.2389373779296875, -0.1840972900390625, 2.2303848266601562, 1.3037261962890625, -1.31353759765625, -0.6482791900634766, 2.036376953125, -0.7899322509765625, 0.1736602783203125, 0.6449050903320312, 0.00528717041015625, -0.7439346313476562, 0.5975341796875, 1.8350448608398438, 1.0081329345703125, 0.80548095703125, 0.9450035095214844, -0.6767196655273438, 2.3937454223632812, 2.013996124267578, 0.5880069732666016, 0.133758544921875, 1.2371578216552734, 1.7937278747558594, 1.341787338256836, 0.5239448547363281, 0.030658721923828125, 0.49816131591796875, -0.346649169921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000118.npy"}
{"epoch": 0.17838246409674982, "step": 119, "batch_size": 64, "mean": 0.3900427222251892, "std": 0.9585937857627869, "min": -1.7999839782714844, "p10": -0.657650375366211, "median": 0.3739614486694336, "p90": 1.6417922973632815, "max": 2.662078857421875, "pos_frac": 0.6875, "sample": [0.6777420043945312, 1.7521438598632812, -0.32245635986328125, 0.8174400329589844, -0.2698211669921875, 0.019001007080078125, 1.4936485290527344, 0.216583251953125, -0.46484375, 0.3418693542480469, -0.26996612548828125, 0.48758697509765625, 0.9722900390625, 0.8057174682617188, 1.3080101013183594, -1.2250823974609375, 0.0681915283203125, -1.7999839782714844, 0.15203475952148438, 0.31683349609375, -0.6684741973876953, 1.66754150390625, 0.5575942993164062, 1.0912303924560547, 1.7334213256835938, -0.4473114013671875, -1.1482658386230469, 1.3106193542480469, 0.9817047119140625, 2.3613739013671875, 0.6309280395507812, -1.4417800903320312, 0.56475830078125, 0.9161815643310547, 0.041259765625, 0.881866455078125, 0.6226768493652344, 0.05802154541015625, 0.17615509033203125, 1.7067642211914062, 1.5577468872070312, 0.5564956665039062, -0.4861183166503906, 0.3194122314453125, -0.4970550537109375, -0.6323947906494141, 0.9111785888671875, 0.9312248229980469, 1.4887847900390625, 0.36949920654296875, 1.5817108154296875, -1.58013916015625, -1.6093063354492188, -0.43093109130859375, 0.37842369079589844, -0.25222015380859375, 1.2474689483642578, 2.662078857421875, 1.6700325012207031, 0.2569427490234375, 0.5763473510742188, -0.29094696044921875, -0.0582275390625, -0.3804779052734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000119.npy"}
{"epoch": 0.17989417989417988, "step": 120, "batch_size": 64, "mean": 0.5686200261116028, "std": 0.8054189682006836, "min": -0.9866104125976562, "p10": -0.22234649658203123, "median": 0.418609619140625, "p90": 1.5802093505859378, "max": 2.9937591552734375, "pos_frac": 0.6875, "sample": [-0.715362548828125, -0.08275604248046875, -0.961700439453125, 1.145315170288086, 0.7854461669921875, -0.491424560546875, 0.9914970397949219, 0.227783203125, 0.31446075439453125, -0.5167007446289062, 0.5126800537109375, -0.12199783325195312, 0.30391693115234375, 0.653167724609375, -0.10306549072265625, 2.235137939453125, 0.3016510009765625, -0.10601043701171875, -0.095184326171875, 0.7721328735351562, 1.9860305786132812, 0.7427902221679688, 0.3874969482421875, 0.9197654724121094, 0.7541046142578125, 0.7233409881591797, 0.8608245849609375, 1.5261306762695312, 1.1378555297851562, 0.744476318359375, 2.3483505249023438, -0.01093292236328125, 1.3342437744140625, 0.40164947509765625, 1.1284637451171875, 0.416412353515625, 0.7349090576171875, -0.060520172119140625, 1.6498031616210938, 0.5052032470703125, -0.3448009490966797, 0.25768280029296875, -0.07082366943359375, 2.9937591552734375, -0.018522262573242188, 0.19610595703125, -0.9866104125976562, 0.9816741943359375, 1.3200759887695312, 1.21124267578125, 0.5554332733154297, -0.207061767578125, 1.6033859252929688, -0.1254425048828125, 1.3005218505859375, -0.07524871826171875, 0.420806884765625, 2.4488754272460938, 0.3376121520996094, -0.2288970947265625, 0.13592529296875, -0.042072296142578125, 1.156789779663086, 0.2918853759765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000120.npy"}
{"epoch": 0.18140589569160998, "step": 121, "batch_size": 64, "mean": 0.47881391644477844, "std": 0.813716471195221, "min": -1.26251220703125, "p10": -0.47366790771484374, "median": 0.45372772216796875, "p90": 1.5234195709228515, "max": 2.8104248046875, "pos_frac": 0.6875, "sample": [1.7930145263671875, 0.814788818359375, -0.7090301513671875, -0.13157081604003906, 1.8074607849121094, 1.3431396484375, 1.6959037780761719, -0.0591278076171875, 0.8706474304199219, -0.39971160888671875, 1.3307342529296875, 0.094696044921875, 2.6300506591796875, 0.028390884399414062, 0.721923828125, -0.23077964782714844, 1.4394378662109375, -1.26251220703125, 0.5172080993652344, -0.8966751098632812, -0.0997161865234375, -0.044239044189453125, 1.251251220703125, 0.5107421875, 0.8143768310546875, 0.9012985229492188, 0.5994720458984375, -0.5813961029052734, 0.8167648315429688, 0.7090339660644531, -0.02869415283203125, 2.8104248046875, 1.5245170593261719, 0.445404052734375, -0.362548828125, -0.75360107421875, -0.1213836669921875, 0.30777740478515625, 0.9658355712890625, 0.5108604431152344, 0.4087657928466797, 0.3122081756591797, 1.2428512573242188, 0.8548202514648438, -0.358917236328125, 0.5233078002929688, -0.7680625915527344, 0.4620513916015625, 1.5208587646484375, 0.7062721252441406, -0.4639854431152344, -0.047496795654296875, 0.8804397583007812, 0.6975440979003906, 0.29569053649902344, 1.62451171875, 0.34284400939941406, -0.3778572082519531, 0.3919811248779297, -0.4778175354003906, 0.2655220031738281, 0.815826416015625, 0.1327972412109375, 0.08576583862304688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000121.npy"}
{"epoch": 0.18291761148904007, "step": 122, "batch_size": 64, "mean": 0.7386811971664429, "std": 0.968778669834137, "min": -1.654937744140625, "p10": -0.17397956848144527, "median": 0.6416511535644531, "p90": 1.6997062683105473, "max": 4.1533203125, "pos_frac": 0.859375, "sample": [4.1533203125, 0.06549072265625, 0.1655731201171875, 1.7408065795898438, -1.4055633544921875, -0.1936359405517578, 0.524658203125, 1.2589035034179688, 0.17662811279296875, 0.5312862396240234, 1.1250152587890625, -0.028228759765625, 0.06390571594238281, 0.16102981567382812, 0.020719528198242188, 0.8427734375, 0.2020416259765625, 0.8348007202148438, -0.3047828674316406, 1.4527301788330078, 0.3358173370361328, 2.9099197387695312, 0.8923721313476562, 1.5947151184082031, 2.7041282653808594, 0.16119384765625, 0.6310348510742188, 0.19927597045898438, 1.0835113525390625, 0.6522674560546875, 0.4989013671875, 1.972625732421875, 0.11055374145507812, 1.1921844482421875, -1.654937744140625, 0.7235260009765625, 0.3199138641357422, -0.21408843994140625, 1.3931560516357422, 0.5914154052734375, -0.1281147003173828, 0.7508926391601562, -0.8036956787109375, 1.6038055419921875, 1.0356597900390625, 1.369100570678711, 0.69775390625, 0.2551536560058594, 1.0629806518554688, 0.12464141845703125, 0.6022186279296875, 1.108896255493164, 0.1964740753173828, 0.3752250671386719, 0.24590301513671875, 0.8779830932617188, 0.6589546203613281, 0.9648399353027344, 0.8176193237304688, 0.9480361938476562, 2.6527481079101562, 1.5197582244873047, -0.242401123046875, 3.0962066650390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000122.npy"}
{"epoch": 0.18442932728647016, "step": 123, "batch_size": 64, "mean": 0.5666618347167969, "std": 0.9654145240783691, "min": -2.04156494140625, "p10": -0.5019693374633788, "median": 0.5770893096923828, "p90": 1.6907501220703125, "max": 2.95062255859375, "pos_frac": 0.703125, "sample": [1.9443740844726562, 0.352020263671875, -0.12053680419921875, -0.802093505859375, -1.0847339630126953, 0.017511367797851562, 0.024372100830078125, 0.097564697265625, 2.8933868408203125, 0.9909210205078125, 0.2712860107421875, 1.8505096435546875, -0.127716064453125, -0.78875732421875, 2.1825027465820312, 1.3925018310546875, 0.1989269256591797, 1.0171794891357422, -1.0066375732421875, 0.636260986328125, 0.9476852416992188, 1.6706161499023438, 1.0306854248046875, 1.0389537811279297, 1.6993789672851562, 1.0170669555664062, 0.8480930328369141, -0.07624053955078125, 0.5764923095703125, 0.7469635009765625, -0.013256072998046875, 2.3347702026367188, -1.6293048858642578, 0.6934375762939453, -0.3826007843017578, -0.5531272888183594, 1.6434707641601562, -0.07381057739257812, -0.06784629821777344, 0.6848297119140625, 0.13616180419921875, 0.5776863098144531, 1.4931068420410156, 0.7144088745117188, 0.6724758148193359, 0.5149307250976562, -0.002166748046875, 0.2225360870361328, -0.16472816467285156, 1.5809898376464844, 1.2695045471191406, 0.4118804931640625, -0.24080657958984375, 1.549041748046875, -2.04156494140625, 1.243499755859375, -0.2217731475830078, 0.730010986328125, -0.08654022216796875, 1.2617073059082031, 0.33519744873046875, 0.5587348937988281, 2.95062255859375, 0.7263412475585938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000123.npy"}
{"epoch": 0.18594104308390022, "step": 124, "batch_size": 64, "mean": 0.7790486812591553, "std": 0.929240345954895, "min": -1.4747352600097656, "p10": -0.2874011993408203, "median": 0.7001991271972656, "p90": 2.021397399902344, "max": 3.018524169921875, "pos_frac": 0.765625, "sample": [-0.0784912109375, 2.0332489013671875, 1.455322265625, -1.4747352600097656, 0.6305694580078125, 1.076507568359375, 2.4716339111328125, 1.1444206237792969, 0.495330810546875, 2.5128173828125, 0.669281005859375, 1.1137351989746094, 0.34766197204589844, 1.8237152099609375, 0.5162239074707031, 0.5983924865722656, 1.0147247314453125, 1.2657194137573242, -0.09554290771484375, -0.7707881927490234, 3.018524169921875, 0.5653266906738281, 1.860260009765625, -0.29322052001953125, -0.7010917663574219, 0.4649772644042969, 0.27292633056640625, 0.7119293212890625, 2.2499542236328125, -0.076812744140625, 1.2053852081298828, 1.24005126953125, 1.331695556640625, 0.8342437744140625, 0.14709091186523438, 0.1585693359375, 1.9835052490234375, -0.7394256591796875, 2.4195938110351562, 0.35308837890625, 0.3733062744140625, 2.3672866821289062, -0.36548614501953125, 1.782196044921875, 1.1156330108642578, 0.6884689331054688, 0.9514961242675781, -0.08954620361328125, -0.6765899658203125, 1.888936996459961, 1.2135772705078125, 0.2803497314453125, -0.12839126586914062, 1.0904655456542969, 0.7205352783203125, 0.0584716796875, 0.5586166381835938, -0.028839111328125, -0.2738227844238281, 0.9117431640625, -0.12241744995117188, 1.0273666381835938, 0.7656936645507812, 1.993743896484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000124.npy"}
{"epoch": 0.1874527588813303, "step": 125, "batch_size": 64, "mean": 0.6283919215202332, "std": 0.9730545878410339, "min": -1.375579833984375, "p10": -0.8126392364501953, "median": 0.6664638519287109, "p90": 1.8348804473876956, "max": 3.0383224487304688, "pos_frac": 0.796875, "sample": [2.40557861328125, 1.5116958618164062, 1.0316085815429688, 0.67889404296875, 0.5870819091796875, -0.9882431030273438, 3.0383224487304688, 0.08253288269042969, 1.3914794921875, 1.0451889038085938, 1.766448974609375, 0.27689361572265625, 0.08695602416992188, 1.2269439697265625, 0.2949390411376953, 2.2341232299804688, 1.9424591064453125, 0.7733306884765625, 0.46950531005859375, 0.23159408569335938, 2.383258819580078, 0.08784103393554688, 0.9137802124023438, 0.8128890991210938, 0.38957977294921875, 0.4862251281738281, 1.6799468994140625, -0.567108154296875, -1.375579833984375, 0.7058753967285156, 1.5234832763671875, -0.03497314453125, -0.30806732177734375, 1.45294189453125, 1.380807876586914, 0.0103302001953125, 1.2050743103027344, 0.9586524963378906, 1.0200119018554688, 1.2411651611328125, 0.8366546630859375, -0.881072998046875, 0.275634765625, 0.9680099487304688, 1.0651054382324219, -1.1080398559570312, 0.8344650268554688, 1.8642082214355469, 0.10798454284667969, 0.04823875427246094, -1.093414306640625, 0.9888916015625, 0.3872489929199219, 0.5193557739257812, -1.2128524780273438, 0.6540336608886719, 0.45896148681640625, -0.8067245483398438, 1.1717643737792969, -0.5830440521240234, 0.5040016174316406, -0.8151741027832031, -0.5557632446289062, 2.5351409912109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000125.npy"}
{"epoch": 0.1889644746787604, "step": 126, "batch_size": 64, "mean": 0.577207624912262, "std": 1.1996939182281494, "min": -1.9290924072265625, "p10": -0.655010986328125, "median": 0.441497802734375, "p90": 1.8497047424316413, "max": 6.3822174072265625, "pos_frac": 0.71875, "sample": [0.8591442108154297, -0.04095649719238281, 1.6767292022705078, 0.5626449584960938, -0.41678428649902344, 2.064319610595703, 1.7021713256835938, 0.2061939239501953, 0.6388015747070312, -0.9279022216796875, 1.4312744140625, 1.572052001953125, -1.44818115234375, 0.508514404296875, 2.3519058227539062, 0.6781501770019531, 0.31424713134765625, -0.648773193359375, 0.3171844482421875, -0.17497634887695312, 0.217926025390625, 1.492889404296875, 0.03952598571777344, 0.31647491455078125, -0.3270606994628906, -0.4173469543457031, 0.3427886962890625, -0.30794525146484375, 1.2043609619140625, -0.3785686492919922, -0.40244293212890625, 0.31145477294921875, -1.0801239013671875, 0.94525146484375, 0.33423805236816406, -0.657684326171875, 0.2216053009033203, 6.3822174072265625, 0.9000816345214844, -1.9290924072265625, 1.2530746459960938, 1.4239120483398438, 0.5495853424072266, 0.14108657836914062, 0.08469772338867188, 0.8180160522460938, 0.8170242309570312, 2.3180999755859375, 0.4728240966796875, 1.428009033203125, 1.1355705261230469, 2.0743942260742188, 0.6837234497070312, 0.6666488647460938, 0.818084716796875, -0.2368316650390625, -0.7714385986328125, 1.912933349609375, 0.26369476318359375, 0.6306304931640625, 0.4101715087890625, 2.9134597778320312, -1.1032791137695312, -0.197113037109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000126.npy"}
{"epoch": 0.19047619047619047, "step": 127, "batch_size": 64, "mean": 0.7032221555709839, "std": 0.9693413972854614, "min": -1.27960205078125, "p10": -0.4923744201660156, "median": 0.7317047119140625, "p90": 2.143193817138672, "max": 2.9569015502929688, "pos_frac": 0.71875, "sample": [-0.8994140625, 0.9207611083984375, 1.022369384765625, 0.9483413696289062, 0.8383407592773438, 0.21112060546875, 0.6344680786132812, 2.1382217407226562, -0.48679351806640625, 1.8423080444335938, 1.4724254608154297, -0.6198654174804688, 0.9993820190429688, 2.168485641479492, 1.8518524169921875, 0.7802391052246094, 1.1257095336914062, 1.0813865661621094, -0.2790336608886719, 0.14047622680664062, 0.80029296875, -1.27960205078125, 0.9637527465820312, 2.3745651245117188, 0.9218635559082031, -0.5419464111328125, 0.888458251953125, 0.2285003662109375, -1.0606536865234375, 0.30266571044921875, 0.5816307067871094, 0.6877365112304688, -0.111236572265625, 1.0265655517578125, -0.028684616088867188, 2.0920867919921875, 0.22538185119628906, 2.171192169189453, 1.768280029296875, 2.14532470703125, 0.7756729125976562, 2.164398193359375, -0.09510040283203125, -0.5700492858886719, 1.1036605834960938, -0.227630615234375, -0.1032867431640625, -0.11287689208984375, 2.9569015502929688, -0.4947662353515625, 1.707122802734375, 2.73699951171875, 0.23286056518554688, -0.15686798095703125, 0.4522552490234375, 0.0136566162109375, 1.2598724365234375, 0.2793121337890625, -0.11188507080078125, 1.627634048461914, 0.3197956085205078, 0.1723766326904297, 1.4173126220703125, -0.3881053924560547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000127.npy"}
{"epoch": 0.19198790627362056, "step": 128, "batch_size": 64, "mean": 0.5983456373214722, "std": 1.1144750118255615, "min": -2.6193161010742188, "p10": -0.8579780578613281, "median": 0.7076206207275391, "p90": 1.9183288574218755, "max": 3.593780517578125, "pos_frac": 0.71875, "sample": [1.7833404541015625, 0.840789794921875, 0.6501960754394531, 1.388620376586914, 1.0977210998535156, 0.5360183715820312, 1.1738739013671875, 0.5999755859375, 1.0984001159667969, 2.0150299072265625, 1.1841278076171875, -1.2049407958984375, 0.9887733459472656, -0.5472488403320312, 1.1452293395996094, -0.6414108276367188, 0.21853065490722656, -0.5429153442382812, -0.8374862670898438, 1.455108642578125, 1.194183349609375, -1.3670406341552734, -0.8098182678222656, 3.30279541015625, 2.437744140625, 1.0167999267578125, 0.389404296875, 2.08172607421875, -0.16156959533691406, 1.2356128692626953, 0.865142822265625, 0.6362762451171875, 0.5336112976074219, 1.2401657104492188, 1.2277565002441406, 0.7660293579101562, 0.106353759765625, 2.346891403198242, -0.8962345123291016, -0.961029052734375, -1.10406494140625, 0.5237617492675781, 1.0738983154296875, -0.09449577331542969, -0.1662464141845703, -0.728668212890625, 1.055105209350586, 1.3698196411132812, 1.3804550170898438, -0.6108150482177734, 1.1560516357421875, -0.1468048095703125, 3.593780517578125, 0.765045166015625, 1.1523208618164062, 0.5138874053955078, -0.86676025390625, 0.5869598388671875, 1.9761810302734375, 0.20244407653808594, -2.6193161010742188, 0.12990570068359375, 1.2220954895019531, 0.343048095703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000128.npy"}
{"epoch": 0.19349962207105065, "step": 129, "batch_size": 64, "mean": 0.4919281303882599, "std": 0.8847977519035339, "min": -1.7706832885742188, "p10": -0.5250396728515625, "median": 0.4063539505004883, "p90": 1.668495559692383, "max": 2.7392120361328125, "pos_frac": 0.703125, "sample": [1.2265167236328125, -0.012298583984375, 0.7014617919921875, -0.7098388671875, -0.28845787048339844, 2.475200653076172, -0.6760406494140625, -0.7980575561523438, -0.20433807373046875, 1.6740379333496094, 0.9573516845703125, 0.4512138366699219, 0.9232635498046875, -0.40195465087890625, -0.470458984375, -0.548431396484375, 1.7295074462890625, 0.08119773864746094, 1.53338623046875, 1.0643730163574219, 0.4837799072265625, 0.75775146484375, 0.442626953125, 0.1942291259765625, -0.3777427673339844, 2.311798095703125, 1.4463329315185547, -0.370208740234375, 0.7559318542480469, 1.4150543212890625, 0.7199172973632812, 0.2420654296875, -0.2581939697265625, 1.4803962707519531, 0.40906333923339844, 0.7775917053222656, -1.0306472778320312, 0.26300811767578125, 0.6928997039794922, 2.133451461791992, 0.7808837890625, 0.2951068878173828, 0.126739501953125, 0.3887920379638672, 0.4036445617675781, 0.6504116058349609, 0.13744354248046875, 2.7392120361328125, -0.0991668701171875, 1.6944503784179688, 0.4884071350097656, 1.0138626098632812, 1.5507888793945312, -0.04741859436035156, 0.1923389434814453, -0.38573455810546875, 1.6555633544921875, -1.7706832885742188, 0.049285888671875, 0.09765243530273438, 0.13228416442871094, -0.6056060791015625, -0.04734611511230469, 0.8457489013671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000129.npy"}
{"epoch": 0.19501133786848074, "step": 130, "batch_size": 64, "mean": 0.6577669382095337, "std": 0.8597301244735718, "min": -0.995574951171875, "p10": -0.30051383972167967, "median": 0.5615711212158203, "p90": 1.6870147705078127, "max": 3.151397705078125, "pos_frac": 0.703125, "sample": [1.399200439453125, 0.40814208984375, 2.1438827514648438, 0.9725723266601562, 2.306499481201172, 1.47369384765625, 1.281707763671875, 1.481668472290039, 0.5031661987304688, -0.18378257751464844, -0.995574951171875, 1.4892349243164062, 0.8729324340820312, 0.5988483428955078, -0.20430755615234375, 0.7109222412109375, 1.2863273620605469, 1.3225250244140625, 1.4379043579101562, 1.6502609252929688, 1.4113883972167969, 1.1772918701171875, 2.5390243530273438, 1.2971343994140625, -0.13817596435546875, -0.3129615783691406, 0.660736083984375, -0.21102142333984375, -0.9599475860595703, 0.21237564086914062, 0.1809539794921875, 1.110626220703125, 0.7357254028320312, 0.025056838989257812, -0.18175506591796875, 3.151397705078125, 1.07623291015625, -0.029695510864257812, 0.8814697265625, 0.046031951904296875, -0.3752899169921875, 1.90838623046875, -0.08756256103515625, -0.3329315185546875, 0.9488925933837891, 0.9254608154296875, 0.31876373291015625, -0.386138916015625, 1.7027664184570312, -0.07543563842773438, -0.7371482849121094, 0.5272560119628906, -0.1027984619140625, 0.5458488464355469, -0.173126220703125, 0.4998435974121094, 0.4938507080078125, 0.4022407531738281, -0.2714691162109375, 0.502197265625, 1.9825439453125, 0.7906665802001953, 0.5772933959960938, -0.11473846435546875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000130.npy"}
{"epoch": 0.1965230536659108, "step": 131, "batch_size": 64, "mean": 0.7407891750335693, "std": 1.0479203462600708, "min": -1.530792236328125, "p10": -0.5064468383789062, "median": 0.8147163391113281, "p90": 2.2514348983764654, "max": 2.8449325561523438, "pos_frac": 0.71875, "sample": [0.4599342346191406, 0.9193592071533203, -0.08965873718261719, -0.3122291564941406, 0.10802078247070312, 2.7995147705078125, 0.8245849609375, 1.0006847381591797, -0.5015869140625, 1.5634002685546875, 0.9828948974609375, 0.16982078552246094, -0.2978172302246094, -0.5767860412597656, 0.3105659484863281, 1.0212326049804688, -0.8764190673828125, 1.6134567260742188, 2.0416183471679688, 0.12811279296875, -0.39105987548828125, -0.7179794311523438, 1.5965118408203125, 1.6246566772460938, 0.34405517578125, 0.508697509765625, 0.9767894744873047, -1.530792236328125, 0.7491111755371094, -0.3018035888671875, 0.8238182067871094, 0.9576644897460938, 2.2883758544921875, 1.549285888671875, 1.2025527954101562, 1.046661376953125, 0.7287025451660156, 2.8441009521484375, 0.6986770629882812, 2.54205322265625, 0.3704643249511719, 2.8449325561523438, -0.3328514099121094, 2.3083419799804688, 0.8957405090332031, -0.09796905517578125, 0.2674846649169922, 1.8112640380859375, -0.5085296630859375, 1.90081787109375, -0.2097625732421875, -0.6154403686523438, 0.9913787841796875, -0.3638458251953125, -1.4072723388671875, 1.2869720458984375, 0.052486419677734375, 0.8158035278320312, 1.8696136474609375, -0.1340484619140625, 0.813629150390625, 2.1652393341064453, 2.439474105834961, 1.4178009033203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000131.npy"}
{"epoch": 0.1980347694633409, "step": 132, "batch_size": 64, "mean": 0.6121453046798706, "std": 1.149186134338379, "min": -2.6103363037109375, "p10": -0.7623773574829099, "median": 0.5176229476928711, "p90": 1.9137760162353519, "max": 4.209953308105469, "pos_frac": 0.75, "sample": [2.6112060546875, 0.1256847381591797, 1.249542236328125, 1.5392112731933594, -0.0017852783203125, -0.8501415252685547, 1.1524429321289062, 0.2339305877685547, -0.5575942993164062, 1.3511886596679688, 1.7323284149169922, 1.6101703643798828, 0.9454269409179688, 0.46276092529296875, -1.2781410217285156, 0.11163711547851562, 0.14024734497070312, 1.3367462158203125, -0.03070068359375, 0.0618133544921875, 0.21166610717773438, -2.6103363037109375, 1.224782943725586, -0.432891845703125, -0.49195098876953125, -1.1589889526367188, 0.7136077880859375, 1.8592300415039062, 1.6039543151855469, 0.2981224060058594, 0.38607025146484375, 0.10548782348632812, -2.4151573181152344, 0.7668647766113281, 0.730010986328125, 0.6408843994140625, 2.0878143310546875, 1.4066162109375, 1.1827507019042969, 0.5214614868164062, -0.11532402038574219, 4.209953308105469, 0.49652862548828125, 2.0032119750976562, 1.6391181945800781, 0.598602294921875, 0.26532745361328125, 0.28505706787109375, 0.6403903961181641, 0.5137844085693359, 1.7858619689941406, -0.9150543212890625, 2.616912841796875, 1.3255844116210938, 1.9371528625488281, 0.1158599853515625, 0.329315185546875, -0.2609100341796875, -0.1983642578125, 1.2047653198242188, -0.9293670654296875, 0.705596923828125, 2.397430419921875, -0.05010986328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000132.npy"}
{"epoch": 0.19954648526077098, "step": 133, "batch_size": 64, "mean": 0.5576624870300293, "std": 0.9770480394363403, "min": -1.94696044921875, "p10": -0.4593118667602539, "median": 0.5135383605957031, "p90": 1.7019607543945316, "max": 3.2262039184570312, "pos_frac": 0.71875, "sample": [1.1868209838867188, -0.1844024658203125, 0.10457611083984375, 1.5894775390625, 0.09426116943359375, 0.84552001953125, -0.13401412963867188, 1.7748336791992188, 0.07963371276855469, -0.6107025146484375, 0.9545974731445312, 1.3266143798828125, 1.3101253509521484, 0.17882919311523438, -0.4200172424316406, 0.7312240600585938, 1.7501678466796875, -1.8529205322265625, 1.0519981384277344, -0.47779083251953125, 0.12714385986328125, -0.12615203857421875, 0.23379135131835938, 1.402130126953125, 1.361053466796875, -0.12213134765625, 1.4916229248046875, 0.3379364013671875, 0.45697021484375, -0.4761524200439453, 0.7864151000976562, 0.1531391143798828, -1.94696044921875, 1.2481250762939453, -0.2856292724609375, 0.5166015625, 0.3952484130859375, -0.25601959228515625, 1.493011474609375, 1.22442626953125, 0.5738601684570312, 0.792633056640625, 1.2209930419921875, -1.451974868774414, 0.5104751586914062, 2.4036407470703125, 0.8349075317382812, 1.4387664794921875, 0.8675537109375, 3.2262039184570312, -0.08123779296875, 0.42640113830566406, -0.26540184020996094, 2.32977294921875, 1.2173995971679688, 0.2740287780761719, 0.9209976196289062, 1.8109054565429688, 0.846405029296875, -0.2814216613769531, 2.0669326782226562, 0.13726806640625, -1.3461494445800781, -0.0959625244140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000133.npy"}
{"epoch": 0.20105820105820105, "step": 134, "batch_size": 64, "mean": 0.8395991325378418, "std": 0.9921358227729797, "min": -1.4399375915527344, "p10": -0.3048183441162109, "median": 0.8623561859130859, "p90": 2.026166534423828, "max": 3.636016845703125, "pos_frac": 0.765625, "sample": [0.7906417846679688, 0.0836639404296875, 0.9822006225585938, 0.8655929565429688, 0.5695915222167969, 0.4906482696533203, -0.05724334716796875, 2.7967605590820312, 1.9830093383789062, 0.9615325927734375, 1.3777542114257812, 1.1851806640625, 1.2708892822265625, 1.2784385681152344, 0.19817352294921875, 0.12252044677734375, 1.6389694213867188, 0.501617431640625, -0.3295021057128906, 1.1995162963867188, 1.5189437866210938, 0.5724716186523438, 1.3804492950439453, 0.7562255859375, 0.4107170104980469, 2.2328414916992188, 0.9506149291992188, 0.5288848876953125, -0.9211063385009766, 1.3842811584472656, -0.6494140625, -0.08882522583007812, 1.9509429931640625, -0.13672637939453125, 0.26580238342285156, -1.4399375915527344, 1.4086837768554688, 0.9177665710449219, 2.0446624755859375, 1.0604896545410156, 2.498260498046875, 1.534149169921875, -0.43651580810546875, -1.041717529296875, 1.5489730834960938, -0.247222900390625, 1.768707275390625, -0.451171875, 2.2788543701171875, 0.8378429412841797, -0.17161178588867188, 1.6594047546386719, 1.2882003784179688, 0.0811309814453125, 3.636016845703125, -0.006011962890625, -0.020725250244140625, 0.8668479919433594, 0.5938701629638672, 0.8591194152832031, 1.6468963623046875, 2.9746551513671875, 0.14830780029296875, -0.16963768005371094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000134.npy"}
{"epoch": 0.20256991685563114, "step": 135, "batch_size": 64, "mean": 0.5151640176773071, "std": 1.0108195543289185, "min": -1.626922607421875, "p10": -0.6595565795898437, "median": 0.37554359436035156, "p90": 1.9919689178466797, "max": 3.0949935913085938, "pos_frac": 0.6875, "sample": [1.1076278686523438, -1.0514144897460938, 1.353912353515625, 0.2840557098388672, 0.5461597442626953, -0.35523223876953125, 0.5561294555664062, -0.7204742431640625, -0.14675140380859375, 3.0949935913085938, 0.753875732421875, -0.14530181884765625, 1.65521240234375, 0.10216331481933594, -0.6612091064453125, -1.1493301391601562, 1.0125579833984375, 2.0011367797851562, -0.21105003356933594, 0.41207313537597656, 0.16426467895507812, 0.1102142333984375, -1.626922607421875, -0.35410308837890625, 0.5472488403320312, 1.9705772399902344, 0.096405029296875, -0.4888153076171875, 0.37621307373046875, 0.4678993225097656, 0.12372589111328125, 1.2611122131347656, 1.4591217041015625, 0.12161064147949219, 2.2060775756835938, 0.8464069366455078, 0.3684234619140625, 0.7886371612548828, 1.6780624389648438, -1.3184547424316406, -0.030450820922851562, 2.194244384765625, 0.09598541259765625, 0.45177459716796875, 0.3748741149902344, 0.6557769775390625, -0.19622802734375, -1.3245677947998047, 1.2180709838867188, 1.7495956420898438, 1.3268318176269531, -0.65570068359375, -0.11785888671875, 1.2482929229736328, 2.041717529296875, 2.450305938720703, -0.38884544372558594, 1.1511917114257812, -0.05975151062011719, 0.3553428649902344, -0.431640625, 0.7316818237304688, 2.519744873046875, 0.37326812744140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000135.npy"}
{"epoch": 0.20408163265306123, "step": 136, "batch_size": 64, "mean": 0.6782038807868958, "std": 1.396689534187317, "min": -2.1558570861816406, "p10": -1.1280521392822265, "median": 0.7373924255371094, "p90": 2.37236499786377, "max": 5.57928466796875, "pos_frac": 0.765625, "sample": [3.6237411499023438, 1.829559326171875, -1.7521381378173828, -1.0390968322753906, -0.635345458984375, 0.8805885314941406, 0.16225433349609375, 0.5311965942382812, 0.29474639892578125, 0.7969646453857422, 1.0153961181640625, -1.553253173828125, -0.3723011016845703, 1.087301254272461, 0.09371185302734375, 1.08673095703125, 2.293478012084961, 0.1494903564453125, 0.3391704559326172, -0.5414237976074219, 1.165130615234375, 0.6486701965332031, 1.1533660888671875, 2.4061737060546875, -1.0109786987304688, 0.6641387939453125, 2.6252670288085938, 0.31280517578125, 0.97210693359375, 3.233154296875, 1.357696533203125, 0.3644447326660156, 1.038116455078125, 1.7283096313476562, -0.258575439453125, 0.180267333984375, 0.7761917114257812, 1.0420284271240234, 0.9849853515625, 0.7611770629882812, -2.1558570861816406, 1.5079498291015625, 0.7136077880859375, 1.415863037109375, 0.04946136474609375, 1.7389240264892578, 0.15047454833984375, 0.8804550170898438, -1.4265899658203125, 5.57928466796875, 1.0600833892822266, -2.066680908203125, -1.1661758422851562, 0.64263916015625, -0.5419845581054688, 1.765960693359375, 1.0403976440429688, 0.08724212646484375, 2.6740798950195312, 2.8029727935791016, 1.999481201171875, 0.5535354614257812, -0.2895317077636719, -2.0457916259765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000136.npy"}
{"epoch": 0.20559334845049132, "step": 137, "batch_size": 64, "mean": 0.6741572618484497, "std": 1.1814607381820679, "min": -2.710968017578125, "p10": -0.8318679809570312, "median": 0.734710693359375, "p90": 1.8856201171875007, "max": 4.2635345458984375, "pos_frac": 0.71875, "sample": [-0.07587051391601562, -2.710968017578125, 1.2761001586914062, 1.46490478515625, 1.6766510009765625, 1.9482421875, 1.710540771484375, 2.9055938720703125, 0.2670249938964844, 1.4332733154296875, 1.2826366424560547, 1.3938217163085938, 1.4706916809082031, 2.1708831787109375, 2.0876998901367188, 1.2287063598632812, 1.1530609130859375, 1.633026123046875, 0.639434814453125, 0.05193138122558594, -0.6597328186035156, 0.5494651794433594, 1.6067657470703125, -0.9269638061523438, 0.2985820770263672, 0.47662353515625, 1.441314697265625, 3.1658096313476562, -0.5910186767578125, -0.23247528076171875, -1.2655868530273438, 0.2579231262207031, 0.04798126220703125, 0.829986572265625, 4.2635345458984375, 0.2649383544921875, -0.10817909240722656, -1.2014503479003906, 0.4867210388183594, -0.320159912109375, -0.8536643981933594, 0.07879066467285156, 0.9024066925048828, 1.0908660888671875, 0.9232673645019531, -0.39961814880371094, 0.2976646423339844, 1.491140365600586, 0.4549407958984375, 2.12432861328125, -0.04677391052246094, -0.7810096740722656, -0.021673202514648438, -1.0191631317138672, 1.388925552368164, 0.955322265625, 0.9382228851318359, -0.48822021484375, -1.3887557983398438, 1.2643890380859375, 0.31473541259765625, 1.1538429260253906, 1.6351318359375, 1.739501953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000137.npy"}
{"epoch": 0.20710506424792138, "step": 138, "batch_size": 64, "mean": 0.49200859665870667, "std": 1.142454743385315, "min": -2.691192626953125, "p10": -1.04991455078125, "median": 0.5673809051513672, "p90": 1.9661865234375004, "max": 3.438018798828125, "pos_frac": 0.734375, "sample": [0.4199256896972656, 1.828582763671875, 1.4378490447998047, 1.3669853210449219, 0.5932121276855469, 0.6959953308105469, 0.7624969482421875, 1.8601760864257812, -1.1549530029296875, 0.2800331115722656, 0.5113258361816406, -1.3006134033203125, 2.3908233642578125, 0.6370849609375, 3.438018798828125, -1.1751556396484375, -0.9640274047851562, 1.2776851654052734, 0.6641693115234375, 0.8328933715820312, 0.040191650390625, 1.5912246704101562, 1.35595703125, 1.5135822296142578, 0.08094406127929688, 0.93902587890625, 1.4673023223876953, 0.9741668701171875, 0.12005996704101562, 0.03849220275878906, 1.4279603958129883, -0.23871612548828125, 2.020862579345703, -0.5893402099609375, -0.6740760803222656, 0.06390380859375, -0.3874530792236328, -2.691192626953125, -0.24692344665527344, 2.0650291442871094, 0.27886962890625, 2.2075729370117188, 0.1035919189453125, 0.5415496826171875, 0.17150497436523438, -2.2343788146972656, 2.3396854400634766, 0.6311988830566406, 0.25940704345703125, 0.812347412109375, 0.6133460998535156, 0.6378631591796875, 0.49550628662109375, -1.0867233276367188, -1.8084259033203125, -0.10585594177246094, -0.9106636047363281, 0.6231956481933594, -0.05471038818359375, 0.8087692260742188, 2.0116195678710938, 0.3396034240722656, 1.5894927978515625, -0.04932403564453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000138.npy"}
{"epoch": 0.20861678004535147, "step": 139, "batch_size": 64, "mean": 0.7158380746841431, "std": 1.0611125230789185, "min": -1.2394256591796875, "p10": -0.620747947692871, "median": 0.5289411544799805, "p90": 2.295323753356934, "max": 2.863279342651367, "pos_frac": 0.75, "sample": [1.1517219543457031, 1.5867042541503906, -0.6442031860351562, -0.4147491455078125, -1.2394256591796875, 1.5013198852539062, 0.01676177978515625, 2.637115478515625, 0.05078887939453125, 0.33623504638671875, 0.4407844543457031, 2.863279342651367, 2.1328659057617188, -0.5660190582275391, 1.980377197265625, 2.3806304931640625, 2.8131179809570312, 0.2037677764892578, -0.3865814208984375, 1.4315376281738281, -0.0746612548828125, -1.0394287109375, -0.4473876953125, 0.5002117156982422, 1.1346817016601562, 1.049713134765625, 0.87042236328125, 0.2973213195800781, 1.4590072631835938, 0.9568023681640625, 0.8871192932128906, 1.9817047119140625, 0.2578468322753906, 0.86810302734375, -0.5410003662109375, 1.3174896240234375, 0.4229736328125, 0.6539192199707031, 0.463836669921875, 0.11278915405273438, 1.7409381866455078, 0.3180389404296875, 2.3184070587158203, 0.20221710205078125, -0.5390663146972656, 0.8129768371582031, 0.3410186767578125, 0.7945213317871094, 2.3249969482421875, -0.055049896240234375, 2.574432373046875, -0.8973731994628906, 0.7647933959960938, -0.6752777099609375, -0.9476280212402344, 0.459381103515625, -0.07510757446289062, 1.744598388671875, -0.9761886596679688, 1.4602813720703125, 2.2414627075195312, 1.9074172973632812, 0.5576705932617188, 0.0086822509765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000139.npy"}
{"epoch": 0.21012849584278157, "step": 140, "batch_size": 64, "mean": 0.9338581562042236, "std": 1.3367775678634644, "min": -3.4091033935546875, "p10": -0.7730361938476562, "median": 0.835423469543457, "p90": 2.5624771118164062, "max": 3.4513931274414062, "pos_frac": 0.796875, "sample": [2.22802734375, 0.757110595703125, 0.3002281188964844, 0.38976287841796875, 0.8140411376953125, 0.957794189453125, -1.3904190063476562, -1.9446334838867188, 2.2800159454345703, 2.2159347534179688, -3.4091033935546875, 1.149261474609375, 0.9886627197265625, 0.2174072265625, 1.7462234497070312, 0.8612136840820312, -1.147134780883789, 0.572601318359375, -0.7996826171875, 1.73175048828125, 0.6399459838867188, 1.7110748291015625, 1.695892333984375, 0.8407363891601562, 0.36252593994140625, 2.00189208984375, 0.7608051300048828, 3.386178970336914, 0.45461273193359375, -0.5965423583984375, 0.41595458984375, -0.5572509765625, 1.7833328247070312, 0.5158157348632812, -0.43761634826660156, 0.236602783203125, 2.663846969604492, 2.5966949462890625, 2.3671722412109375, 2.4642486572265625, 1.40167236328125, 2.0761280059814453, 2.7542343139648438, 2.36572265625, 1.5208892822265625, -1.0925235748291016, 2.5412750244140625, 0.4115715026855469, 2.571563720703125, -0.27709388732910156, -1.2213859558105469, 1.570068359375, 0.4658164978027344, 0.74749755859375, 3.4513931274414062, 0.8301105499267578, 2.930858612060547, -0.02622222900390625, 1.2642440795898438, 1.2676239013671875, 0.4979515075683594, 2.0250473022460938, -0.7108612060546875, 0.57635498046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000140.npy"}
{"epoch": 0.21164021164021163, "step": 141, "batch_size": 64, "mean": 0.7517873048782349, "std": 1.2912704944610596, "min": -2.70635986328125, "p10": -0.5257743835449218, "median": 0.49286651611328125, "p90": 2.583234786987305, "max": 3.911163330078125, "pos_frac": 0.703125, "sample": [-2.70635986328125, -0.03754425048828125, 1.748809814453125, 0.3047370910644531, 0.6716194152832031, 0.40993499755859375, 0.41788482666015625, 1.2292900085449219, 0.007213592529296875, -1.0209732055664062, 0.5159072875976562, 0.5631980895996094, 2.037931442260742, -0.0636138916015625, 2.4929656982421875, 0.5632572174072266, 2.0162811279296875, 0.22453880310058594, -0.08950233459472656, -1.16436767578125, 0.10666656494140625, -0.03190803527832031, 0.46982574462890625, 1.3525753021240234, 2.2637100219726562, -0.1281280517578125, -0.0292510986328125, 1.7282867431640625, 0.16827392578125, -0.056976318359375, -0.7390823364257812, -1.4500045776367188, 0.5454864501953125, 0.7969894409179688, 1.4865226745605469, -0.11690902709960938, 1.3387641906738281, 0.27194976806640625, 3.9102325439453125, 2.4000244140625, 2.7221145629882812, 1.6108722686767578, 1.1272773742675781, 1.8634567260742188, 2.2013397216796875, 0.017120361328125, -0.3282184600830078, 1.38360595703125, 2.77203369140625, 3.1412353515625, -1.331939697265625, 0.41320228576660156, 0.10380363464355469, 2.681121826171875, 0.9925537109375, 3.911163330078125, -0.4280986785888672, 2.6219215393066406, -0.43135833740234375, 0.8821563720703125, 0.0009765625, 0.6012535095214844, -0.5662384033203125, -0.25522613525390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000141.npy"}
{"epoch": 0.21315192743764172, "step": 142, "batch_size": 64, "mean": 0.5593085289001465, "std": 1.0587800741195679, "min": -2.0740966796875, "p10": -0.59765625, "median": 0.43713951110839844, "p90": 1.8451051712036133, "max": 3.0135040283203125, "pos_frac": 0.71875, "sample": [0.1675872802734375, -1.0950088500976562, 0.5774936676025391, 1.5377197265625, 1.8271942138671875, 2.4796218872070312, 0.41252899169921875, -0.000492095947265625, 0.8245353698730469, 0.12492752075195312, 1.0426483154296875, 0.07136154174804688, 1.2726154327392578, -1.9434585571289062, -0.11948394775390625, 0.07521820068359375, 1.2771759033203125, -2.0740966796875, 0.4617500305175781, 0.22991561889648438, 0.8601760864257812, 1.9566497802734375, 0.7648048400878906, 0.7933769226074219, 1.0207138061523438, 0.5153045654296875, -0.13504791259765625, 0.47601318359375, 2.7642478942871094, -0.6095237731933594, -0.6427154541015625, 0.1370849609375, 1.296722412109375, 3.0135040283203125, 0.308380126953125, -1.2349395751953125, -0.056396484375, -0.5638141632080078, -0.35114288330078125, 0.6729011535644531, 2.92205810546875, 1.7966461181640625, 0.2624549865722656, 0.25713348388671875, 0.28592681884765625, 1.2669105529785156, -0.31853485107421875, 0.351898193359375, -0.5335350036621094, 1.151214599609375, -0.9673652648925781, -0.5699653625488281, 1.5524444580078125, 1.6170501708984375, 0.24888992309570312, 0.465301513671875, 1.5819778442382812, 1.8527812957763672, 2.142547607421875, -0.027273178100585938, 0.028961181640625, -0.16045761108398438, 1.5196151733398438, 0.933013916015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000142.npy"}
{"epoch": 0.2146636432350718, "step": 143, "batch_size": 64, "mean": 0.6277639865875244, "std": 1.1969457864761353, "min": -3.303558349609375, "p10": -0.4576282501220703, "median": 0.4900398254394531, "p90": 2.0072784423828125, "max": 3.623577117919922, "pos_frac": 0.78125, "sample": [0.37496185302734375, 0.360260009765625, 0.6668205261230469, 0.5886764526367188, 0.8893051147460938, -0.461273193359375, 0.8538703918457031, -0.17502784729003906, 2.193025588989258, -1.1101036071777344, 1.9980392456054688, 0.068023681640625, 3.0791015625, 0.17960357666015625, 0.7273712158203125, 1.4340438842773438, -2.6397438049316406, 0.8931732177734375, 2.0112380981445312, 0.4680938720703125, 1.4864349365234375, -0.08362579345703125, 3.623577117919922, 1.5635604858398438, 0.8681163787841797, 0.3014507293701172, 1.8367767333984375, -1.407440185546875, 0.3180503845214844, 0.4537162780761719, 0.4180145263671875, 0.26439666748046875, -0.3702545166015625, 1.7265815734863281, 2.516338348388672, -0.4491233825683594, -0.29486083984375, 0.44808387756347656, 0.23152923583984375, 0.39003753662109375, 0.7875404357910156, -3.303558349609375, 1.8864822387695312, 0.49068450927734375, -0.05541801452636719, 0.4893951416015625, 0.7326011657714844, 2.2288055419921875, 0.35363197326660156, 0.9658775329589844, -0.7852783203125, 0.53277587890625, 1.2177619934082031, 3.3525924682617188, -1.2193984985351562, 1.257101058959961, 0.8071060180664062, -0.14078521728515625, 1.950714111328125, 1.192544937133789, 0.2325916290283203, 0.08948135375976562, 0.2592582702636719, 0.6135692596435547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000143.npy"}
{"epoch": 0.2161753590325019, "step": 144, "batch_size": 64, "mean": 0.7318645715713501, "std": 1.2877947092056274, "min": -2.8340072631835938, "p10": -0.6966575622558593, "median": 0.6689071655273438, "p90": 2.540077781677247, "max": 3.662109375, "pos_frac": 0.703125, "sample": [2.765634536743164, 2.112213134765625, -0.5697784423828125, 0.01247406005859375, -0.7318344116210938, 0.6045646667480469, -0.9970912933349609, 1.06207275390625, 0.1355438232421875, 0.633819580078125, 0.11614990234375, 0.47841644287109375, 1.27471923828125, 1.2261276245117188, 0.96685791015625, -0.2465972900390625, 1.7456817626953125, 0.7039947509765625, -0.354461669921875, 1.223388671875, 0.74078369140625, 0.9773750305175781, 0.2260894775390625, 0.150970458984375, -0.05207633972167969, 0.5452957153320312, 1.79400634765625, 2.809967041015625, -0.9229393005371094, -0.06329345703125, 1.06427001953125, 1.061309814453125, -0.23167800903320312, 0.31523895263671875, 1.0000801086425781, 0.1217498779296875, -0.10774040222167969, 2.1821670532226562, 2.908905029296875, 2.1965789794921875, 3.071441650390625, 1.6657333374023438, -0.9231147766113281, 2.3188323974609375, -0.61322021484375, 0.8024520874023438, 0.11858367919921875, -0.8774204254150391, 1.9053077697753906, -0.6145782470703125, -0.03373527526855469, -0.233489990234375, 0.11318206787109375, 2.247467041015625, -2.4805755615234375, 2.634897232055664, 3.662109375, 3.0017852783203125, 0.876373291015625, -2.8340072631835938, 1.2380828857421875, -0.06484031677246094, 1.4109001159667969, 1.5682086944580078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000144.npy"}
{"epoch": 0.21768707482993196, "step": 145, "batch_size": 64, "mean": 0.73653244972229, "std": 1.318104863166809, "min": -3.464324951171875, "p10": -0.763339614868164, "median": 0.5457744598388672, "p90": 2.5633302688598647, "max": 3.575389862060547, "pos_frac": 0.71875, "sample": [0.906005859375, 1.1114501953125, -0.6933059692382812, -0.605072021484375, 0.6682205200195312, -3.464324951171875, 1.380462646484375, -0.01006317138671875, 3.575389862060547, 1.8342437744140625, 0.5736656188964844, 1.9765968322753906, 0.04965972900390625, 1.4787139892578125, 0.32837867736816406, 0.3157958984375, -0.5260734558105469, 2.0139694213867188, -0.1176605224609375, 1.0582046508789062, 0.3120231628417969, -0.03288078308105469, -0.4404563903808594, 1.205169677734375, 3.216938018798828, -0.467864990234375, 0.34363555908203125, 1.7413330078125, 2.6929168701171875, 0.4661521911621094, 3.2276687622070312, 1.9843521118164062, 1.2522659301757812, 3.3849029541015625, -0.2412109375, 2.2609615325927734, 1.138031005859375, 1.9266834259033203, 0.17695999145507812, 1.9982051849365234, -1.1227874755859375, 0.5131645202636719, 2.8030624389648438, 1.508840560913086, 0.05172538757324219, -1.224517822265625, 0.51788330078125, 0.35672950744628906, -0.4830780029296875, 0.42266082763671875, 0.66912841796875, 0.9962501525878906, 1.4319591522216797, -0.7933540344238281, 0.9452972412109375, 3.01666259765625, 1.7627105712890625, -0.7947120666503906, -1.371734619140625, 0.15325927734375, 0.9976806640625, -0.8178253173828125, 0.260040283203125, -0.6610107421875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000145.npy"}
{"epoch": 0.21919879062736206, "step": 146, "batch_size": 64, "mean": 0.7474343776702881, "std": 1.1964890956878662, "min": -1.4811019897460938, "p10": -0.7495983123779295, "median": 0.657623291015625, "p90": 2.072143936157227, "max": 5.185733795166016, "pos_frac": 0.765625, "sample": [1.4620285034179688, 1.7184391021728516, -1.2298812866210938, 0.3311614990234375, 0.260467529296875, -0.17144775390625, 0.029266357421875, 0.3088951110839844, 0.9292373657226562, 1.0037651062011719, -1.15081787109375, -1.4811019897460938, 2.112579345703125, 1.5756587982177734, -0.24834442138671875, -0.512420654296875, 1.1685066223144531, 0.07352447509765625, 1.771383285522461, -0.0199737548828125, 0.8857269287109375, 2.4137954711914062, 1.157745361328125, 0.39397621154785156, -1.2044315338134766, -0.4487342834472656, 0.7817935943603516, 2.2045516967773438, -0.8445053100585938, 1.6621780395507812, 0.49122047424316406, -1.4059486389160156, 4.0299835205078125, 0.250762939453125, -0.13628005981445312, 0.09481048583984375, 0.5458450317382812, 0.434539794921875, 1.1384735107421875, -0.5281486511230469, 1.9777946472167969, 1.1349601745605469, 0.31449127197265625, 1.58441162109375, 1.2971038818359375, 2.2623672485351562, 1.3685779571533203, 0.8558063507080078, 1.937713623046875, 1.043588638305664, 1.1758995056152344, 1.005767822265625, 1.3351478576660156, 0.680084228515625, -1.1753368377685547, -0.2978363037109375, 5.185733795166016, 0.635162353515625, 0.4633522033691406, 0.4260425567626953, 1.6689567565917969, 0.4419898986816406, 0.4411201477050781, 2.224620819091797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000146.npy"}
{"epoch": 0.22071050642479215, "step": 147, "batch_size": 64, "mean": 0.546623945236206, "std": 1.2476752996444702, "min": -4.40069580078125, "p10": -0.7168403625488281, "median": 0.5069684982299805, "p90": 2.2209623336791995, "max": 2.9371871948242188, "pos_frac": 0.703125, "sample": [0.8736019134521484, 0.9074077606201172, 0.9834041595458984, -0.5754241943359375, 1.4395980834960938, 1.46942138671875, -0.6552581787109375, 0.75921630859375, 0.3812103271484375, -0.01983642578125, 0.871673583984375, 1.1243438720703125, 2.150177001953125, 2.506805419921875, -0.5639572143554688, 1.6905479431152344, 2.9371871948242188, -4.40069580078125, 1.5678043365478516, -0.23505020141601562, 0.5228900909423828, 0.07727813720703125, 0.24202346801757812, -0.210662841796875, 0.376708984375, 2.609973907470703, -1.0178375244140625, 1.6050071716308594, 0.8456268310546875, 0.32110595703125, 0.13118553161621094, 1.03729248046875, 0.97637939453125, -1.5155754089355469, -0.5437469482421875, 0.4739837646484375, 0.8334312438964844, 2.6882476806640625, 2.557098388671875, -0.6227264404296875, -1.649627685546875, 0.298492431640625, 0.6467666625976562, 1.3780021667480469, 1.5609703063964844, -0.7432327270507812, 2.6592559814453125, -0.48033905029296875, 1.3167839050292969, 0.4910469055175781, 0.3632354736328125, 0.8807563781738281, 0.2436370849609375, -0.25105857849121094, -0.9673309326171875, 1.2375717163085938, 0.4468231201171875, 0.4715862274169922, 1.5435333251953125, -0.37158203125, -1.6686382293701172, 0.832855224609375, 2.2512989044189453, -0.10673332214355469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000147.npy"}
|
||||
{"epoch": 0.2222222222222222, "step": 148, "batch_size": 64, "mean": 0.8294271230697632, "std": 1.2197202444076538, "min": -1.8130264282226562, "p10": -0.40687236785888664, "median": 0.7686500549316406, "p90": 2.5792835235595706, "max": 3.4625701904296875, "pos_frac": 0.734375, "sample": [2.64508056640625, -1.1237716674804688, 0.8332920074462891, -0.8544254302978516, 1.647796630859375, 3.3106460571289062, 0.9994163513183594, -1.4601802825927734, 0.780609130859375, 2.2974090576171875, -0.21668434143066406, 2.3133811950683594, 1.0094451904296875, 0.9513397216796875, 2.4696311950683594, -0.06879425048828125, -0.35093116760253906, 2.7455596923828125, 1.5177841186523438, 0.002399444580078125, 0.6890335083007812, -0.29882049560546875, 2.5876731872558594, 0.34627723693847656, 3.031383514404297, 2.5597076416015625, 1.5571823120117188, -0.3473243713378906, 3.4625701904296875, 0.784698486328125, 0.4176788330078125, -0.0753021240234375, 1.7726573944091797, -1.15509033203125, 1.011016845703125, 1.449920654296875, 0.27558135986328125, 1.4729461669921875, -1.8130264282226562, -0.15960121154785156, 0.17242431640625, 0.6341171264648438, 2.5539627075195312, 3.1064453125, -1.3802032470703125, 1.4094371795654297, 1.1458797454833984, 0.9561176300048828, 0.35263824462890625, 0.6742095947265625, 0.6672821044921875, 0.7566909790039062, 0.3362865447998047, 0.11660003662109375, 0.1300506591796875, 1.886260986328125, 1.3842315673828125, 0.8925933837890625, 0.2650890350341797, -0.17697906494140625, -0.1309185028076172, 0.8948516845703125, -0.43084716796875, -0.15105056762695312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000148.npy"}
|
||||
{"epoch": 0.2237339380196523, "step": 149, "batch_size": 64, "mean": 0.9019420742988586, "std": 1.260189414024353, "min": -1.8929328918457031, "p10": -0.6218009948730466, "median": 0.6981353759765625, "p90": 2.9680059432983414, "max": 3.685626983642578, "pos_frac": 0.796875, "sample": [1.1394882202148438, 1.423583984375, 2.6024398803710938, 0.0989532470703125, 1.005645751953125, 0.19356536865234375, 0.42720794677734375, 2.004030227661133, 0.00913238525390625, 1.844289779663086, -1.1012039184570312, 0.25560760498046875, 0.7013626098632812, 0.4841728210449219, 0.639801025390625, 0.3960418701171875, 0.6121063232421875, 0.5250244140625, 0.026250839233398438, -0.1698150634765625, 0.2835845947265625, 3.1268692016601562, 2.6130619049072266, -0.9568710327148438, 3.481527328491211, 1.8566970825195312, 0.1423969268798828, -0.8527889251708984, 0.9059982299804688, -1.8929328918457031, -0.1682567596435547, 3.1201248168945312, -0.7237701416015625, 3.375673294067383, 0.21537399291992188, 2.1139678955078125, 1.7697277069091797, -0.2877349853515625, 3.3331298828125, -0.2992095947265625, -0.9988861083984375, 1.3401641845703125, 0.04836845397949219, 2.11517333984375, 3.685626983642578, 1.8469104766845703, 0.08560943603515625, 1.3312053680419922, 0.4735527038574219, -0.38387298583984375, 0.89556884765625, 0.6949081420898438, 0.977813720703125, -0.996795654296875, 0.281341552734375, 1.32354736328125, 1.339019775390625, 3.3062515258789062, 1.2178573608398438, 0.7119960784912109, 0.8345413208007812, 1.6280975341796875, -0.0990447998046875, 1.7910842895507812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000149.npy"}
|
||||
{"epoch": 0.2252456538170824, "step": 150, "batch_size": 64, "mean": 1.072885274887085, "std": 1.2774544954299927, "min": -2.338348388671875, "p10": -0.41072139739990227, "median": 1.0103578567504883, "p90": 2.813794517517091, "max": 3.640228271484375, "pos_frac": 0.796875, "sample": [1.3667964935302734, -0.27574729919433594, 3.389404296875, 0.8296890258789062, -2.0572891235351562, -0.10800933837890625, 0.5958576202392578, 2.0615463256835938, 1.3772430419921875, 2.937253952026367, 0.7703628540039062, 0.970611572265625, 1.130401611328125, 2.317230224609375, -0.33325767517089844, 0.965179443359375, 0.987640380859375, 3.496856689453125, 1.0369720458984375, 1.9013442993164062, 2.01263427734375, -0.05750274658203125, 1.2752227783203125, -1.645050048828125, 1.0321197509765625, 1.9572525024414062, 0.7840938568115234, 1.6315956115722656, -0.0364227294921875, 1.959716796875, 1.1826534271240234, 0.8014678955078125, 1.9829444885253906, 0.6104545593261719, 3.4512710571289062, -1.0405120849609375, 0.8991775512695312, 0.9885959625244141, 1.0771026611328125, 3.3432464599609375, 1.996795654296875, 3.1379852294921875, 0.4166717529296875, 0.666839599609375, 1.8906631469726562, 1.2599029541015625, 1.9026470184326172, 2.1787147521972656, -0.4439201354980469, 3.640228271484375, -0.07003021240234375, 0.4821929931640625, 0.8059158325195312, 0.45708465576171875, 0.572601318359375, 1.6528701782226562, -0.4936408996582031, -2.338348388671875, 2.5257225036621094, -1.0023040771484375, 0.14291000366210938, 1.7059860229492188, 0.944091796875, 1.06292724609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000150.npy"}
|
||||
{"epoch": 0.22675736961451248, "step": 151, "batch_size": 64, "mean": 1.0629799365997314, "std": 1.1657031774520874, "min": -1.2099876403808594, "p10": -0.4377408981323242, "median": 1.0853090286254883, "p90": 2.636215209960939, "max": 3.8166046142578125, "pos_frac": 0.796875, "sample": [1.2921104431152344, 0.8375358581542969, 1.9369792938232422, -0.041439056396484375, 0.17639541625976562, 0.22560691833496094, 3.468994140625, 0.20972061157226562, 1.84796142578125, 0.16913795471191406, 1.2004318237304688, 1.9137821197509766, 2.2732696533203125, -0.1554107666015625, 1.0298042297363281, 1.641611099243164, 1.9458427429199219, 0.01146697998046875, 1.089996337890625, 0.3284912109375, -0.68609619140625, -0.3139076232910156, 1.879852294921875, -1.2099876403808594, 3.8166046142578125, -0.9932403564453125, 0.9539279937744141, 0.8637542724609375, -0.4331226348876953, -0.43972015380859375, 1.0806217193603516, -1.1498489379882812, 2.7917633056640625, 1.5640716552734375, 0.37420654296875, -0.5885429382324219, 1.2051239013671875, 1.4846267700195312, -0.003662109375, 3.23577880859375, 0.34473228454589844, 1.0440559387207031, 1.8172378540039062, 0.6263427734375, 2.8300628662109375, 1.8106155395507812, 2.027637481689453, 2.9468841552734375, 1.8878097534179688, 0.7705307006835938, 1.2088127136230469, 1.234527587890625, 0.8909568786621094, 2.990997314453125, -0.9803371429443359, 1.9633865356445312, 1.651031494140625, 1.1087837219238281, -0.09319877624511719, 0.94488525390625, 2.1709136962890625, 2.250102996826172, 0.142425537109375, 1.607025146484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000151.npy"}
|
||||
{"epoch": 0.22826908541194255, "step": 152, "batch_size": 64, "mean": 0.9407035708427429, "std": 1.3522568941116333, "min": -1.9187469482421875, "p10": -0.7309497833251953, "median": 0.7930622100830078, "p90": 2.707399559020997, "max": 3.8947525024414062, "pos_frac": 0.75, "sample": [-0.46338462829589844, 0.710968017578125, 1.707071304321289, 0.33506011962890625, -1.1153945922851562, -0.8702163696289062, 1.6148529052734375, 1.409698486328125, 0.6348724365234375, -0.17779541015625, 0.7610816955566406, 0.053131103515625, 1.4730911254882812, 0.32038116455078125, 1.6373291015625, 2.4884185791015625, 0.3077507019042969, 1.082366943359375, 0.5995311737060547, -0.7169227600097656, -1.5028228759765625, 2.363292694091797, 1.04412841796875, 1.6926803588867188, -0.14609146118164062, 0.3266258239746094, 2.2078170776367188, 1.168304443359375, 0.3507671356201172, 0.3959503173828125, -0.1935100555419922, 2.801248550415039, 0.4818439483642578, -0.04369354248046875, -0.7369613647460938, 0.825042724609375, 1.3710403442382812, 0.1264190673828125, 2.2011642456054688, 2.015960693359375, -1.5351104736328125, 3.0690155029296875, 2.172374725341797, 0.5023880004882812, 2.9915237426757812, 3.1523818969726562, 2.0409164428710938, -0.5337257385253906, 1.7414169311523438, 1.191162109375, -1.9187469482421875, 2.2874298095703125, 0.3253765106201172, 3.8616256713867188, 1.5303802490234375, 0.9950294494628906, 3.8947525024414062, 2.37469482421875, -0.6652679443359375, 3.7632980346679688, 0.45790863037109375, -0.8224983215332031, -0.6106338500976562, 1.3982391357421875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000152.npy"}
|
||||
{"epoch": 0.22978080120937264, "step": 153, "batch_size": 64, "mean": 0.8861297369003296, "std": 1.1174407005310059, "min": -1.2828750610351562, "p10": -0.42410087585449213, "median": 0.8632888793945312, "p90": 2.370441818237305, "max": 3.1936683654785156, "pos_frac": 0.734375, "sample": [1.1066246032714844, 1.208892822265625, 1.1304779052734375, 2.10711669921875, 1.579721450805664, 2.26690673828125, 2.0153656005859375, -0.3649940490722656, 0.8292503356933594, 0.31003570556640625, 0.84600830078125, 2.4828224182128906, 0.40866851806640625, 2.8092727661132812, 1.00445556640625, -0.2553253173828125, 1.6926536560058594, -0.16338348388671875, -0.108489990234375, 2.6526031494140625, 3.1936683654785156, 1.32177734375, -0.9803237915039062, 0.8384780883789062, 2.8562049865722656, -0.8624992370605469, 2.414813995361328, 0.22819137573242188, 0.054180145263671875, 2.253093719482422, -1.2828750610351562, 0.7756690979003906, -1.1817340850830078, 0.8335189819335938, 0.516632080078125, -0.0810546875, 1.4173355102539062, 2.9704055786132812, 1.3876953125, 0.8805694580078125, 0.42425537109375, -0.3557929992675781, 1.73626708984375, 1.7000465393066406, 1.4957237243652344, -0.06374740600585938, 1.6495361328125, 0.9809188842773438, 1.3409271240234375, -0.22618865966796875, 0.084686279296875, 1.0763702392578125, -0.170196533203125, 0.17653656005859375, 1.6542434692382812, -0.8459091186523438, -0.1558074951171875, 1.997751235961914, -1.154815673828125, 2.1588134765625, 1.4290351867675781, 0.41905975341796875, -0.449432373046875, 0.6975936889648438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000153.npy"}
|
||||
{"epoch": 0.23129251700680273, "step": 154, "batch_size": 64, "mean": 0.7147954702377319, "std": 1.5440865755081177, "min": -3.4707183837890625, "p10": -1.1133758544921875, "median": 0.812800407409668, "p90": 2.5475135803222657, "max": 4.197853088378906, "pos_frac": 0.703125, "sample": [-1.2370338439941406, 1.39544677734375, -3.3382110595703125, -2.7061805725097656, 1.7032546997070312, 1.6101360321044922, -1.0145645141601562, 0.5755119323730469, 0.6855316162109375, -1.1537551879882812, 1.74346923828125, -0.10746383666992188, 1.7187519073486328, -0.5014801025390625, -1.0191574096679688, -1.3366584777832031, -0.5522499084472656, 0.8522243499755859, 0.12202072143554688, 1.0804977416992188, 2.5689620971679688, 3.3314342498779297, 2.4928951263427734, 1.139404296875, -1.6794281005859375, 0.7454738616943359, 4.197853088378906, 1.1829986572265625, 4.05999755859375, -3.4707183837890625, 0.6180248260498047, 1.279275894165039, 2.497467041015625, 1.1438255310058594, 1.0993900299072266, -0.049739837646484375, 0.6181392669677734, 1.005300521850586, 1.3656082153320312, 0.5484905242919922, -0.3002777099609375, -0.643402099609375, 1.2953948974609375, -0.5081405639648438, 0.5262451171875, 3.7730674743652344, 1.0258674621582031, 1.3889389038085938, 0.9687576293945312, -0.038059234619140625, 0.40557861328125, -0.0168609619140625, 0.77337646484375, -0.8800735473632812, 3.28680419921875, 1.4975204467773438, 2.6364402770996094, 0.11623001098632812, 2.066781997680664, 1.256134033203125, 1.64556884765625, 0.7369728088378906, 0.41001319885253906, 1.1092853546142578], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000154.npy"}
|
||||
{"epoch": 0.2328042328042328, "step": 155, "batch_size": 64, "mean": 0.7621853947639465, "std": 1.556504487991333, "min": -2.8606796264648438, "p10": -1.4220466613769531, "median": 0.9890327453613281, "p90": 2.4652120590209963, "max": 4.53399658203125, "pos_frac": 0.6875, "sample": [0.9813690185546875, 0.4285736083984375, 2.2303924560546875, 1.5026893615722656, 0.18949127197265625, 0.4052886962890625, 0.06046295166015625, 1.7446556091308594, -0.23577499389648438, 3.6406021118164062, 0.9966964721679688, 1.950155258178711, -0.735931396484375, -2.8606796264648438, 0.24403762817382812, -0.678497314453125, 1.790924072265625, 2.0800018310546875, 1.4309768676757812, 2.493967056274414, 2.64422607421875, -0.34178924560546875, 2.3981170654296875, -1.0458049774169922, 1.9175872802734375, 3.4140472412109375, 1.0672454833984375, 1.585418701171875, -0.6597824096679688, 1.5141105651855469, -2.130126953125, 0.5198497772216797, 4.53399658203125, 1.7646026611328125, 1.5463638305664062, 2.554840087890625, -0.00756072998046875, -1.4256515502929688, -2.227203369140625, 0.7214126586914062, -1.8485107421875, -0.004627227783203125, 1.668701171875, 1.2123031616210938, 1.8384933471679688, 1.3202781677246094, 0.7500686645507812, 1.980712890625, -1.0435943603515625, 1.2022323608398438, 1.45599365234375, 0.560577392578125, -0.15100860595703125, -0.97235107421875, 0.7300634384155273, -1.41363525390625, 1.2185325622558594, -1.4655704498291016, -1.9474449157714844, -0.153411865234375, 1.0687942504882812, 4.358634948730469, 1.7674083709716797, 0.6439285278320312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000155.npy"}
|
||||
{"epoch": 0.23431594860166288, "step": 156, "batch_size": 64, "mean": 1.0765368938446045, "std": 1.3473998308181763, "min": -1.2710227966308594, "p10": -0.5941703796386718, "median": 0.9700889587402344, "p90": 2.9090042114257817, "max": 4.637975692749023, "pos_frac": 0.796875, "sample": [0.9279327392578125, -0.38367652893066406, -0.2851905822753906, -1.2710227966308594, 1.424652099609375, 3.153045654296875, 4.637975692749023, 0.3533477783203125, 0.8764801025390625, 1.85968017578125, 0.7178726196289062, 2.1140518188476562, -0.06513214111328125, 1.810699462890625, 3.771728515625, 2.350666046142578, 2.6407852172851562, 1.6674537658691406, -0.8324432373046875, 1.5116310119628906, 0.27060699462890625, 0.2851448059082031, 1.1426544189453125, 0.9551467895507812, 0.04754638671875, 0.15529632568359375, -0.5452346801757812, 1.583740234375, 1.1698379516601562, 2.8150634765625, 1.2387542724609375, 1.280517578125, -0.6216888427734375, 2.4243392944335938, 0.12619781494140625, 4.243999481201172, -1.1824951171875, 2.1738433837890625, -0.6460838317871094, 2.9492645263671875, 1.543121337890625, 3.9010009765625, 0.6603584289550781, 1.0054550170898438, 0.260162353515625, -0.2893867492675781, -0.47888946533203125, 0.2295379638671875, 1.808013916015625, 1.19091796875, 0.7452640533447266, -0.62164306640625, 1.3826370239257812, 0.2066497802734375, 1.6955718994140625, 1.1120376586914062, 0.18054962158203125, 0.05237579345703125, 2.808013916015625, 0.1737213134765625, 0.9850311279296875, 3.26751708984375, 0.8484992980957031, -0.615142822265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000156.npy"}
|
||||
{"epoch": 0.23582766439909297, "step": 157, "batch_size": 64, "mean": 0.8422449827194214, "std": 1.3635458946228027, "min": -3.3473434448242188, "p10": -0.3633102416992187, "median": 0.6085376739501953, "p90": 2.4065328598022466, "max": 4.999755859375, "pos_frac": 0.78125, "sample": [1.8604965209960938, 3.2328529357910156, -0.04323577880859375, 1.4066314697265625, 2.24493408203125, -0.676300048828125, 1.4420318603515625, 0.259857177734375, 2.772552490234375, 0.3262481689453125, -0.338775634765625, 0.13803863525390625, 2.727142333984375, 0.6066703796386719, -3.3473434448242188, 0.12720870971679688, 0.9146881103515625, 0.5369873046875, 1.33612060546875, 2.7882919311523438, 1.2394866943359375, 1.9719276428222656, 4.341583251953125, 1.2825736999511719, 2.249908447265625, 0.3676605224609375, 0.35302734375, -0.2109966278076172, 0.6104049682617188, -0.2680988311767578, 1.5129146575927734, 0.09169769287109375, 1.5853271484375, 0.27584075927734375, 0.39388275146484375, 2.1010208129882812, -0.3738250732421875, 2.4736576080322266, 1.8915557861328125, 0.32422637939453125, -0.17000579833984375, 0.8085708618164062, 0.061279296875, 1.8332901000976562, 4.999755859375, 0.7100410461425781, -0.23274993896484375, 0.4292716979980469, 0.4422111511230469, -0.2242107391357422, 0.8809127807617188, -1.319305419921875, 0.7721443176269531, 0.3519172668457031, 1.1190834045410156, 1.5802421569824219, 0.5095443725585938, 1.354522705078125, -1.7554168701171875, 1.708761215209961, 0.36797332763671875, -0.6492156982421875, -1.89019775390625, 1.6863861083984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000157.npy"}
|
||||
{"epoch": 0.23733938019652306, "step": 158, "batch_size": 64, "mean": 0.8647523522377014, "std": 1.3257935047149658, "min": -2.19219970703125, "p10": -0.8112567901611328, "median": 0.8292245864868164, "p90": 2.412129402160645, "max": 4.7958984375, "pos_frac": 0.71875, "sample": [0.8122482299804688, 1.4625930786132812, 1.7886886596679688, 1.2739791870117188, -2.0948333740234375, 1.7420654296875, 2.7502593994140625, 1.0484542846679688, -0.251739501953125, -0.6484107971191406, 2.124725341796875, 1.7377853393554688, 1.61737060546875, 1.1986732482910156, -1.390523910522461, 1.6680126190185547, 2.046539306640625, 1.0525646209716797, 0.7960681915283203, 2.755279541015625, -0.15820693969726562, 1.6578216552734375, 1.5353736877441406, 0.7722892761230469, 2.2619991302490234, -0.16373443603515625, -0.9176788330078125, 3.911956787109375, 0.552001953125, 1.002532958984375, 0.1396636962890625, 0.7723159790039062, 1.8717041015625, 0.35076141357421875, 1.8671150207519531, 1.6683292388916016, 1.75750732421875, -0.9047088623046875, 1.283355712890625, -0.8202896118164062, 2.7355194091796875, 0.3053092956542969, 1.6947784423828125, 2.476470947265625, 1.5423965454101562, 0.8308506011962891, 0.551788330078125, -0.1287994384765625, 4.7958984375, 0.04067039489746094, 0.2514495849609375, -0.6704177856445312, -0.5946331024169922, 0.8275985717773438, 0.11235427856445312, -0.2838249206542969, -0.8461380004882812, 2.870716094970703, 1.26019287109375, 0.8066902160644531, -0.7901802062988281, -0.014240264892578125, -0.16800689697265625, -2.19219970703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000158.npy"}
|
||||
{"epoch": 0.23885109599395313, "step": 159, "batch_size": 64, "mean": 0.8647797703742981, "std": 1.3309510946273804, "min": -3.6060333251953125, "p10": -0.44886589050292963, "median": 0.8067512512207031, "p90": 2.618747901916504, "max": 3.5836181640625, "pos_frac": 0.796875, "sample": [0.7228202819824219, 1.8122119903564453, -0.06529998779296875, 2.1271133422851562, 1.4741935729980469, -0.46254730224609375, -2.099395751953125, 2.850811004638672, -0.13998794555664062, 3.5836181640625, 2.375274658203125, 0.2670745849609375, 2.8403244018554688, 0.1967315673828125, -0.10303878784179688, 0.4257392883300781, 1.5852203369140625, 3.1887893676757812, 1.3251762390136719, 0.08609771728515625, 0.12281417846679688, -1.76495361328125, 0.7655868530273438, 1.037139892578125, 0.5428543090820312, 1.1141605377197266, 0.267913818359375, -0.394561767578125, 2.8920745849609375, 3.5689659118652344, -1.0274505615234375, 1.3546905517578125, 0.4881134033203125, -0.4801063537597656, 0.21405792236328125, 0.95556640625, 2.434497833251953, 0.45804595947265625, 0.387054443359375, 0.3626861572265625, 1.1424427032470703, -0.856689453125, 2.6134033203125, 0.108367919921875, 1.0960006713867188, 2.6210384368896484, 0.16452407836914062, 0.9110279083251953, -0.11590576171875, 0.9075546264648438, -0.4169425964355469, 2.0286388397216797, 1.0493354797363281, 0.8479156494140625, 0.04560661315917969, -3.6060333251953125, 1.7580108642578125, 1.2218856811523438, 0.39969635009765625, 0.3338794708251953, 1.6005477905273438, 1.745086669921875, 2.4651565551757812, 1.9912796020507812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000159.npy"}
|
||||
{"epoch": 0.24036281179138322, "step": 160, "batch_size": 64, "mean": 0.806533694267273, "std": 1.4162830114364624, "min": -2.5529403686523438, "p10": -0.8550329208374021, "median": 0.6300315856933594, "p90": 2.6704490661621096, "max": 5.1932220458984375, "pos_frac": 0.75, "sample": [0.2169628143310547, 0.10503387451171875, 0.5651016235351562, 2.7007598876953125, 0.8630428314208984, 1.9811935424804688, 0.21929931640625, -0.009698867797851562, 1.2174530029296875, 1.71063232421875, 4.4650421142578125, 0.23578643798828125, 2.839426040649414, 1.0353240966796875, 1.121164321899414, 3.0638198852539062, 0.282684326171875, -1.3599395751953125, 1.2770576477050781, 2.012849807739258, -0.21715545654296875, -1.4318580627441406, 0.04085540771484375, -1.0634918212890625, 1.292562484741211, 1.1710777282714844, 1.0515289306640625, 0.014543533325195312, 1.3262138366699219, -0.21216201782226562, 0.5540142059326172, 1.0486183166503906, 5.1932220458984375, 3.3999061584472656, 0.5116176605224609, -0.6463699340820312, 2.1047592163085938, 0.4545135498046875, -0.030473709106445312, 2.3005752563476562, 0.16807937622070312, 1.2943077087402344, -0.3021659851074219, -0.62628173828125, 0.12405204772949219, 1.495574951171875, -2.5529403686523438, -0.9444599151611328, 1.2956962585449219, 1.1971893310546875, -0.6448593139648438, -0.1530017852783203, -0.9839401245117188, 1.5575294494628906, 1.9932403564453125, 0.6949615478515625, 0.13662338256835938, 2.5997238159179688, -2.0203094482421875, 1.177032470703125, 1.5989837646484375, 2.8231124877929688, 0.27947998046875, 0.005035400390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000160.npy"}
|
||||
{"epoch": 0.2418745275888133, "step": 161, "batch_size": 64, "mean": 0.9392315745353699, "std": 1.5447535514831543, "min": -2.3669891357421875, "p10": -0.7861844062805176, "median": 0.7337799072265625, "p90": 2.762579727172852, "max": 4.730247497558594, "pos_frac": 0.671875, "sample": [0.7074546813964844, -1.9287300109863281, 1.2343673706054688, 0.7848968505859375, 0.011383056640625, -0.8413009643554688, 2.5439453125, 3.6411972045898438, -0.4708995819091797, -0.7695798873901367, -1.1749343872070312, 1.3620681762695312, 0.5437068939208984, 2.2108917236328125, 2.722015380859375, 0.8039627075195312, 1.4305229187011719, -2.3669891357421875, 1.1775588989257812, -0.36759185791015625, -0.9428176879882812, -0.27350616455078125, 0.75909423828125, -0.4524383544921875, 0.9208831787109375, -0.7933006286621094, -0.1303558349609375, 2.0224475860595703, 1.8291244506835938, -0.21309661865234375, 2.6881580352783203, 2.0091552734375, 2.1871814727783203, 4.641632080078125, 3.25750732421875, -0.6116409301757812, -0.8863372802734375, 4.6356658935546875, 2.048431396484375, 1.6266136169433594, 0.5791110992431641, 3.7223777770996094, -0.035007476806640625, 0.708465576171875, 0.3962249755859375, -0.31610870361328125, 1.1524391174316406, 1.4924182891845703, -0.043914794921875, 0.4508056640625, 0.35979461669921875, 0.1343231201171875, 2.675945281982422, 0.18398666381835938, -0.43526458740234375, -0.11460494995117188, 1.4013671875, 1.2276248931884766, 2.7799644470214844, 0.94964599609375, 4.730247497558594, -0.2949028015136719, 0.2563896179199219, 2.5731430053710938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000161.npy"}
|
||||
{"epoch": 0.24338624338624337, "step": 162, "batch_size": 64, "mean": 1.2653292417526245, "std": 1.4792031049728394, "min": -2.4916839599609375, "p10": -0.6905817031860348, "median": 1.257248878479004, "p90": 3.3648902893066417, "max": 4.5673980712890625, "pos_frac": 0.8125, "sample": [2.4061279296875, 3.7028121948242188, 0.627899169921875, 0.9557647705078125, -0.0275421142578125, 1.8573722839355469, 1.4102020263671875, 0.36075592041015625, 3.0574493408203125, 2.1857357025146484, 1.4701957702636719, 1.5179824829101562, 1.6429252624511719, 4.5673980712890625, 1.7962875366210938, 1.0596141815185547, 0.8570556640625, 4.2281951904296875, 0.774444580078125, 3.8256187438964844, 1.2976150512695312, 1.5271987915039062, 0.213134765625, 0.1769866943359375, 0.7971572875976562, -1.5729904174804688, 3.5587158203125, 1.1916084289550781, -0.9426803588867188, 2.0518798828125, 0.09870719909667969, 1.9715728759765625, -0.10498809814453125, 2.3784542083740234, 4.1995697021484375, 1.369781494140625, 2.495004653930664, 1.765909194946289, -0.41441917419433594, 0.3910045623779297, -1.227508544921875, -0.976226806640625, 2.339038848876953, 2.9287757873535156, 1.7241783142089844, 1.388458251953125, -0.9780654907226562, 0.5682525634765625, -2.4916839599609375, -0.23641204833984375, 0.4849128723144531, -0.8089370727539062, 1.8055381774902344, 1.2168827056884766, 3.4966506958007812, 2.8707962036132812, 0.135009765625, -0.2109832763671875, 0.7470550537109375, 0.5558013916015625, 0.7915782928466797, 2.6806392669677734, 2.3944835662841797, 1.0573196411132812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000162.npy"}
|
||||
{"epoch": 0.24489795918367346, "step": 163, "batch_size": 64, "mean": 1.2088189125061035, "std": 1.6507235765457153, "min": -2.23114013671875, "p10": -0.8265457153320311, "median": 1.0605602264404297, "p90": 3.5327301025390625, "max": 5.2462310791015625, "pos_frac": 0.796875, "sample": [-0.04711437225341797, 2.2921524047851562, -2.23114013671875, 3.4647293090820312, 1.8368301391601562, 0.4663200378417969, -1.5647392272949219, 0.8287811279296875, 4.40576171875, -0.8692092895507812, 1.3680419921875, 2.4278717041015625, 4.008491516113281, 0.055755615234375, 1.0874099731445312, 2.338245391845703, 1.2059173583984375, 0.1737518310546875, 5.2462310791015625, 1.0137481689453125, 0.14626693725585938, 1.3414993286132812, 2.284320831298828, -0.6130332946777344, -1.2075366973876953, 3.48455810546875, 1.0337104797363281, 2.8891448974609375, 0.3508167266845703, 1.4491348266601562, 1.9131927490234375, 1.0200881958007812, 1.616851806640625, 0.073028564453125, 3.2374725341796875, 1.7929611206054688, -0.9987201690673828, 3.9629898071289062, -0.7269973754882812, 0.4566192626953125, 0.9425239562988281, 5.1788787841796875, 1.3103866577148438, 2.2334365844726562, -1.199920654296875, -0.3591880798339844, -0.4533882141113281, 0.2563591003417969, 0.5272235870361328, 0.044574737548828125, -1.6935348510742188, 2.007669448852539, 1.5397453308105469, 0.7680282592773438, 0.2626533508300781, -0.6034393310546875, 3.7958984375, 3.553375244140625, 2.031219482421875, 2.095844268798828, 1.5927658081054688, 1.1172676086425781, 0.7061595916748047, 0.6956672668457031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000163.npy"}
|
||||
{"epoch": 0.24640967498110355, "step": 164, "batch_size": 64, "mean": 1.226841926574707, "std": 1.6098395586013794, "min": -2.8231582641601562, "p10": -0.6941764831542968, "median": 0.9905366897583008, "p90": 3.219814491271973, "max": 5.162322998046875, "pos_frac": 0.796875, "sample": [-1.5497665405273438, 0.5673332214355469, 0.2931404113769531, 1.0408058166503906, 1.0378799438476562, 0.28924560546875, 5.162322998046875, 2.78375244140625, 0.06932830810546875, 0.4530792236328125, 3.1342315673828125, -0.654693603515625, 2.2803897857666016, -0.9477729797363281, 2.310901641845703, 2.9375457763671875, 2.9796981811523438, 1.10345458984375, 4.60040283203125, -0.3419685363769531, 0.4398612976074219, -1.1052703857421875, 1.2073822021484375, 0.9476318359375, -0.06346893310546875, 0.8712596893310547, 2.6103973388671875, 0.9458770751953125, 0.964324951171875, 0.14704132080078125, 1.631124496459961, 3.57684326171875, -0.5085659027099609, 1.3856010437011719, 3.599010467529297, -0.7110977172851562, 1.5276031494140625, 4.0061187744140625, 4.599269866943359, -0.8820571899414062, 0.5770072937011719, 1.6602325439453125, 0.3339805603027344, 1.8875999450683594, 1.1107139587402344, 0.8657627105712891, 2.2325210571289062, 0.3505439758300781, 3.047119140625, -0.23527908325195312, 0.693572998046875, 1.0563030242919922, 2.9599227905273438, -0.7186126708984375, -0.30251121520996094, -2.8231582641601562, 0.31179046630859375, 3.1413440704345703, 3.2534446716308594, 0.014629364013671875, 2.988433837890625, 0.5436859130859375, 1.0167484283447266, 1.8138885498046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000164.npy"}
|
||||
{"epoch": 0.24792139077853365, "step": 165, "batch_size": 64, "mean": 1.1873916387557983, "std": 1.2789064645767212, "min": -2.0301666259765625, "p10": -0.31261024475097643, "median": 1.209604263305664, "p90": 2.948144912719727, "max": 4.734703063964844, "pos_frac": 0.828125, "sample": [3.0807876586914062, 1.555795669555664, 1.9243278503417969, 1.5900154113769531, 0.5153522491455078, 0.0808258056640625, 0.5373306274414062, -0.0548248291015625, 1.1861915588378906, 1.364166259765625, 0.099395751953125, 1.773956298828125, 2.5244140625, 0.33849525451660156, -0.0217132568359375, 1.2523269653320312, 1.5747909545898438, 0.7993087768554688, -0.5104465484619141, 0.6771202087402344, -2.0301666259765625, -0.00152587890625, 4.734703063964844, -0.6718387603759766, 1.9552001953125, 3.514272689819336, 0.7955417633056641, 1.115377426147461, 3.396087646484375, 2.9902992248535156, 1.0762176513671875, 0.0906982421875, 4.276641845703125, 2.52044677734375, 1.6334877014160156, -1.045745849609375, 0.4202117919921875, 2.391172409057617, 1.5635414123535156, 1.34381103515625, -0.37490272521972656, 1.4597244262695312, 1.2330169677734375, -0.16726112365722656, 3.0549545288085938, 0.5795135498046875, 0.5207977294921875, 1.0488204956054688, 1.2721881866455078, 0.446533203125, 0.9574737548828125, 1.7785263061523438, -0.8618850708007812, 2.573282241821289, 1.1106739044189453, -0.9766044616699219, 0.6837844848632812, 0.3164844512939453, 1.3651390075683594, 1.7762451171875, 1.6977767944335938, 2.8497848510742188, 2.0554122924804688, 1.2375335693359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000165.npy"}
|
||||
{"epoch": 0.2494331065759637, "step": 166, "batch_size": 64, "mean": 0.8285011053085327, "std": 1.4874504804611206, "min": -2.0526123046875, "p10": -0.7884315490722655, "median": 0.6464195251464844, "p90": 2.689092636108399, "max": 5.014739990234375, "pos_frac": 0.65625, "sample": [2.3030548095703125, -0.20443344116210938, -1.53155517578125, 2.4337005615234375, 1.9405288696289062, -0.15832901000976562, -0.2850151062011719, 2.2899551391601562, 1.6762237548828125, 3.19342041015625, -0.208526611328125, 0.5311679840087891, 2.566326141357422, -0.9998626708984375, 2.5040740966796875, -0.7542572021484375, 0.13599395751953125, 2.983186721801758, 1.8052978515625, -0.8963165283203125, -0.20438575744628906, 1.862701416015625, -0.8030776977539062, 0.9699211120605469, 2.7417068481445312, 1.0885848999023438, -0.3856201171875, 1.34136962890625, 1.1905136108398438, 0.40743255615234375, 0.0187835693359375, -2.0526123046875, 0.7315425872802734, 3.7617721557617188, 0.37007904052734375, -0.6467781066894531, 1.8464508056640625, 1.0867462158203125, 0.9931793212890625, 0.9650650024414062, -0.5135650634765625, 5.014739990234375, 0.2853851318359375, -2.0098342895507812, 1.975738525390625, 0.03302001953125, 0.44775390625, -0.21955490112304688, 0.750823974609375, 4.449241638183594, 0.6694374084472656, -1.6192970275878906, -0.169586181640625, 0.9906082153320312, -0.10246467590332031, -0.10172653198242188, 1.7684402465820312, -0.7029247283935547, 2.9374923706054688, -0.36114501953125, 1.3863296508789062, 2.3834381103515625, 0.5003070831298828, 0.6234016418457031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000166.npy"}
{"epoch": 0.2509448223733938, "step": 167, "batch_size": 64, "mean": 1.208269476890564, "std": 1.4022676944732666, "min": -1.68328857421875, "p10": -0.4010808944702148, "median": 1.0709466934204102, "p90": 3.100810241699219, "max": 5.0442352294921875, "pos_frac": 0.8125, "sample": [-0.06009674072265625, 0.4691295623779297, 1.1095256805419922, 1.6410713195800781, -1.3235015869140625, 1.3415908813476562, 2.7752227783203125, 0.4881744384765625, 0.9993000030517578, 0.5465126037597656, 2.2096405029296875, 1.30859375, 0.18035125732421875, 0.9120712280273438, 0.012842178344726562, 0.5317344665527344, 1.7900314331054688, 0.994171142578125, 2.1108551025390625, -0.6518020629882812, -1.68328857421875, 0.9162063598632812, 1.3497562408447266, 0.9796066284179688, 0.5992240905761719, 0.9326019287109375, 1.2658462524414062, -0.41749000549316406, 1.1832389831542969, 2.1804733276367188, 0.279052734375, 3.545370101928711, -0.10033988952636719, 1.659515380859375, 1.7597808837890625, 2.4193878173828125, 3.1310043334960938, 3.9036865234375, 3.399627685546875, 1.5177116394042969, 0.19649887084960938, 1.152862548828125, -1.1250686645507812, 1.0323677062988281, 2.8548412322998047, 0.2364654541015625, -1.4577713012695312, 3.0303573608398438, 0.2874298095703125, 1.5818405151367188, 2.9134178161621094, 0.021038055419921875, -0.07681655883789062, 4.256626129150391, 2.008983612060547, 3.436065673828125, -0.36279296875, 5.0442352294921875, 1.4176597595214844, 2.6145401000976562, 1.8343658447265625, -0.008237838745117188, 0.6564483642578125, -0.4225006103515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000167.npy"}
{"epoch": 0.25245653817082386, "step": 168, "batch_size": 64, "mean": 0.651225209236145, "std": 1.3040194511413574, "min": -3.134613037109375, "p10": -0.8861072540283202, "median": 0.7350006103515625, "p90": 2.597627258300782, "max": 3.1746749877929688, "pos_frac": 0.640625, "sample": [1.3439254760742188, -0.48462677001953125, 1.4994697570800781, -0.8130455017089844, 0.5996189117431641, 1.5935440063476562, 2.8183135986328125, -1.176971435546875, -0.9197616577148438, -1.488250732421875, 1.7665634155273438, -0.2613639831542969, 1.1500015258789062, 2.513673782348633, 0.757904052734375, 0.9176788330078125, 0.6316947937011719, 0.8429813385009766, -0.2210235595703125, 1.09625244140625, 1.6834640502929688, -0.08424758911132812, 0.2927360534667969, 0.8572559356689453, -0.06453514099121094, 0.8721351623535156, 2.692485809326172, 0.9274101257324219, -1.1754226684570312, -0.11870574951171875, 0.3612060546875, -0.496185302734375, -3.134613037109375, 2.51513671875, 1.9852294921875, 0.5759735107421875, 1.772125244140625, 1.0414810180664062, -0.189727783203125, 0.8988037109375, -0.2519493103027344, 3.1746749877929688, -0.582122802734375, 2.6329803466796875, 2.9052810668945312, -0.6460151672363281, 1.6721343994140625, 0.71209716796875, 0.493011474609375, -0.7732963562011719, 1.165924072265625, 0.2643280029296875, 1.5994987487792969, 0.2771167755126953, -0.51483154296875, 2.7198047637939453, -0.91741943359375, -1.9291801452636719, -0.1320953369140625, -0.17719650268554688, 1.906301498413086, 0.7985458374023438, 3.1096649169921875, 0.792572021484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000168.npy"}
{"epoch": 0.25396825396825395, "step": 169, "batch_size": 64, "mean": 1.090137243270874, "std": 1.770739197731018, "min": -2.6228408813476562, "p10": -0.8683820724487304, "median": 0.8614292144775391, "p90": 3.1692867279052734, "max": 7.619140625, "pos_frac": 0.71875, "sample": [1.1552581787109375, 1.973602294921875, 0.6026229858398438, -1.4849472045898438, 1.6544551849365234, 0.9668960571289062, -2.6228408813476562, 0.26444053649902344, 2.195404052734375, 1.528900146484375, -0.0257415771484375, 0.0875244140625, 3.22674560546875, 3.7182159423828125, -0.8758544921875, -0.6504554748535156, 0.4080619812011719, -0.1374053955078125, 1.7102508544921875, -0.4000396728515625, 0.4131317138671875, 2.34088134765625, 3.133411407470703, 0.8003158569335938, -0.25026702880859375, 5.325355529785156, 2.0318450927734375, 0.4775848388671875, 0.3424873352050781, 2.101551055908203, 0.7410087585449219, 0.736785888671875, 2.8232383728027344, -0.8509464263916016, 2.0516738891601562, 0.6701622009277344, 2.5332088470458984, -0.10782623291015625, 1.3754711151123047, 1.0004539489746094, 0.22267913818359375, 2.0034408569335938, -0.12325096130371094, -0.9200668334960938, 1.3154468536376953, 0.38759613037109375, 1.2025299072265625, 0.0541534423828125, 3.184661865234375, 1.8232574462890625, -0.6569290161132812, 1.730804443359375, 0.9225425720214844, -1.00042724609375, -0.6073150634765625, 7.619140625, 2.950958251953125, 3.3543243408203125, -1.9546146392822266, 5.298198699951172, 1.440673828125, 2.1148223876953125, -1.101104736328125, -0.47736358642578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000169.npy"}
{"epoch": 0.25547996976568405, "step": 170, "batch_size": 64, "mean": 0.9384795427322388, "std": 1.568624496459961, "min": -2.830493927001953, "p10": -1.3163011550903319, "median": 1.0120162963867188, "p90": 2.9422706604003914, "max": 3.9177017211914062, "pos_frac": 0.734375, "sample": [2.5418167114257812, 0.8443832397460938, 0.6929416656494141, 1.1518020629882812, 0.6064510345458984, 3.9177017211914062, 1.0655708312988281, 1.4212532043457031, -0.13944244384765625, 3.443998336791992, -1.0923919677734375, 0.3205547332763672, 1.7947463989257812, 3.014617919921875, 2.072010040283203, -0.17381668090820312, 0.9989013671875, 0.7396659851074219, 1.9240951538085938, 0.14636993408203125, -1.3945770263671875, 3.7751922607421875, 1.1106739044189453, 0.8862495422363281, -1.4798355102539062, 1.8553314208984375, -2.096893310546875, -0.38077545166015625, -2.830493927001953, 0.8722381591796875, 2.3590431213378906, 1.0087738037109375, 1.7753448486328125, 1.633707046508789, -2.0989303588867188, 0.4112701416015625, -2.085418701171875, 2.7734603881835938, 3.6048812866210938, -0.2695465087890625, 2.267059326171875, -0.3721504211425781, 0.5792827606201172, 2.5869674682617188, 1.0657386779785156, 3.7286224365234375, 0.729339599609375, 2.2891464233398438, 0.9622001647949219, 1.1899871826171875, -1.1262588500976562, -1.133657455444336, 1.0152587890625, 3.0680465698242188, -0.40625, -0.003406524658203125, 1.9323501586914062, 0.42237091064453125, 1.0725212097167969, 1.143758773803711, 2.0806312561035156, 1.9743194580078125, 2.1884002685546875, -1.9125137329101562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000170.npy"}
{"epoch": 0.25699168556311414, "step": 171, "batch_size": 64, "mean": 1.1450397968292236, "std": 1.7402117252349854, "min": -3.7163314819335938, "p10": -0.6651679992675781, "median": 1.078165054321289, "p90": 3.374059295654297, "max": 4.775726318359375, "pos_frac": 0.75, "sample": [1.5526199340820312, 1.0814437866210938, 2.5632476806640625, 2.1311073303222656, 0.8960723876953125, 4.173656463623047, 1.0266189575195312, 2.4998779296875, -0.3062000274658203, 1.056936264038086, 0.27918434143066406, 1.0214385986328125, 3.3625869750976562, 1.4494056701660156, 3.500354766845703, 0.093841552734375, 1.2413406372070312, 0.5989704132080078, 2.215269088745117, -0.5580596923828125, -0.2694873809814453, 0.6849784851074219, 1.0783729553222656, -1.88812255859375, 2.237010955810547, -0.19105911254882812, -0.5134429931640625, 0.7711334228515625, 3.3675384521484375, -2.0676136016845703, 0.44696044921875, 1.2724609375, 3.0949134826660156, -0.38933372497558594, 1.1388435363769531, 3.0235862731933594, 0.9128189086914062, -1.0368576049804688, 3.0149459838867188, 4.775726318359375, -3.7163314819335938, 0.5206661224365234, 1.5082054138183594, 1.1216049194335938, 0.6326446533203125, 3.1689605712890625, 2.39422607421875, 0.07209014892578125, -0.32794952392578125, 3.3768539428710938, 1.0816497802734375, 4.151268005371094, 3.00531005859375, 2.045013427734375, -0.6154632568359375, 0.287841796875, 3.69525146484375, -0.3790016174316406, -0.7471694946289062, -2.92578125, 4.0764007568359375, 1.1216812133789062, 1.0779571533203125, -0.6864700317382812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000171.npy"}
{"epoch": 0.2585034013605442, "step": 172, "batch_size": 64, "mean": 1.3640680313110352, "std": 1.734261393547058, "min": -3.9118270874023438, "p10": -0.8053848266601562, "median": 1.273641586303711, "p90": 3.5357334136962897, "max": 5.674991607666016, "pos_frac": 0.828125, "sample": [-1.1314773559570312, 1.1681671142578125, 1.014984130859375, 1.2496719360351562, 0.08272552490234375, 0.7629871368408203, 0.04576301574707031, 1.3656349182128906, 0.9176254272460938, 3.582355499267578, 5.4417724609375, 3.1546630859375, 1.4931182861328125, -0.3069267272949219, 1.061981201171875, 5.674991607666016, 2.3360137939453125, -1.3012619018554688, -0.775787353515625, 0.6798820495605469, 3.4130706787109375, 0.8222618103027344, -1.6177616119384766, -1.0988922119140625, 2.2306442260742188, 1.555593490600586, 0.386505126953125, 2.1549224853515625, 1.8148040771484375, 1.1896820068359375, 1.3376235961914062, 1.0682830810546875, 1.514251708984375, 5.590324401855469, -0.2677001953125, -0.2817707061767578, 2.191631317138672, 1.3791351318359375, 4.227069854736328, 1.9764328002929688, 0.5788383483886719, 1.2976112365722656, 1.454599380493164, 2.0467395782470703, 2.4108142852783203, -0.8180694580078125, 1.9427375793457031, 0.7384681701660156, 2.2704544067382812, 1.7583389282226562, -0.8940048217773438, 3.849029541015625, 3.4269485473632812, -3.9118270874023438, 0.7512245178222656, 2.8651123046875, 0.7567977905273438, 0.6607513427734375, 0.4708251953125, 3.086669921875, 0.2557220458984375, 0.37230491638183594, 3.880645751953125, 1.946624755859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000172.npy"}
{"epoch": 0.2600151171579743, "step": 173, "batch_size": 64, "mean": 1.2412903308868408, "std": 2.0238771438598633, "min": -3.0085105895996094, "p10": -1.1010936737060546, "median": 1.0221290588378906, "p90": 3.957873916625977, "max": 6.405242919921875, "pos_frac": 0.6875, "sample": [-1.8010673522949219, 0.6071929931640625, 0.06121826171875, 3.0356216430664062, 0.6459884643554688, 2.4075241088867188, 1.3468074798583984, -0.2686500549316406, -0.26981544494628906, 1.4949951171875, -0.11623382568359375, 1.69232177734375, 4.139732360839844, -3.0085105895996094, -0.7918586730957031, 3.0836868286132812, 4.0214996337890625, -0.3908233642578125, 1.5950355529785156, 0.2032642364501953, 1.942596435546875, 3.1411895751953125, -0.4785499572753906, 6.405242919921875, -1.7026729583740234, -0.7501373291015625, 2.8866424560546875, 2.2088241577148438, 1.067169189453125, 0.3214874267578125, -0.586944580078125, 3.698760986328125, -0.22391128540039062, 4.10833740234375, 2.1350250244140625, 0.45911216735839844, 2.404735565185547, -1.1072196960449219, 1.9077682495117188, 5.1130828857421875, 1.012603759765625, -0.04395294189453125, 0.8703689575195312, 2.960296630859375, 2.226224899291992, -2.445932388305664, 0.5954055786132812, -0.14926910400390625, 5.1469573974609375, 3.293121337890625, 3.0105056762695312, -1.1124629974365234, 0.6032943725585938, 3.5403671264648438, 3.8094139099121094, -1.0867996215820312, -2.0509109497070312, -0.7215766906738281, 1.2446670532226562, 5.1219482421875, 0.33184814453125, 1.5149955749511719, 1.0316543579101562, 0.101348876953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000173.npy"}
{"epoch": 0.2615268329554044, "step": 174, "batch_size": 64, "mean": 1.3705698251724243, "std": 1.9510363340377808, "min": -3.4685134887695312, "p10": -1.0598018646240235, "median": 1.3873023986816406, "p90": 3.5347869873046878, "max": 6.38800048828125, "pos_frac": 0.75, "sample": [0.9779396057128906, 0.8986053466796875, 2.778583526611328, 1.500661849975586, 1.3098220825195312, -1.702545166015625, 0.34778594970703125, 3.0134429931640625, 2.4445571899414062, 1.96929931640625, 3.2093963623046875, 3.5401611328125, 0.38411712646484375, -0.7739295959472656, 3.354217529296875, -0.9962730407714844, 1.5966472625732422, 2.725393295288086, 1.1886978149414062, -2.669248580932617, 3.522247314453125, 1.631256103515625, -0.9691047668457031, 2.2306976318359375, 4.972431182861328, -0.8690452575683594, 1.46478271484375, -0.1612396240234375, 3.8863372802734375, 0.6040267944335938, 0.9118118286132812, 6.38800048828125, 4.419912338256836, -1.116302490234375, 2.2896461486816406, 3.484039306640625, 1.5834579467773438, 5.4126739501953125, -0.5825595855712891, 3.07635498046875, -0.357940673828125, 2.77117919921875, 1.6801071166992188, 0.16678237915039062, 0.2595634460449219, 2.214366912841797, 1.1072311401367188, -1.2434272766113281, 4.672248840332031, -0.18036937713623047, -3.4685134887695312, 2.8549118041992188, 0.4327125549316406, 0.7777900695800781, 2.8666610717773438, -1.8846206665039062, 1.2815399169921875, -0.01607513427734375, 2.1291770935058594, -1.0870285034179688, 1.5574016571044922, 1.951873779296875, 1.17242431640625, 0.7817153930664062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000174.npy"}
{"epoch": 0.26303854875283444, "step": 175, "batch_size": 64, "mean": 1.4862403869628906, "std": 2.1467983722686768, "min": -4.3768310546875, "p10": -1.2191162109374998, "median": 1.3631105422973633, "p90": 4.217867851257324, "max": 5.969621658325195, "pos_frac": 0.765625, "sample": [-1.4418449401855469, 1.2937335968017578, -1.3752288818359375, 1.084115982055664, 0.3641204833984375, 3.2379379272460938, 0.02569580078125, 5.620147705078125, 1.8485565185546875, -0.4886436462402344, 0.2635040283203125, 3.2160568237304688, 1.2193737030029297, 1.7743263244628906, -0.58233642578125, 4.232072830200195, 2.4697418212890625, 2.9302444458007812, 4.034004211425781, 0.6462631225585938, 1.4324874877929688, 1.5238761901855469, 0.2904052734375, 3.4284210205078125, 2.4374847412109375, 1.43426513671875, -0.3816413879394531, 0.17952728271484375, 4.074151992797852, -4.3768310546875, 3.3913345336914062, 1.9654998779296875, 0.869873046875, 2.6289596557617188, -1.0942535400390625, 0.2537422180175781, 0.6819877624511719, 5.969621658325195, -0.7787628173828125, -1.2726287841796875, 1.9594268798828125, -0.932586669921875, 0.055023193359375, 0.8591384887695312, -2.0917816162109375, -1.54583740234375, 2.20819091796875, -0.5667266845703125, 4.184722900390625, 3.564149856567383, 1.745737075805664, 2.9305553436279297, 0.6151237487792969, -1.379302978515625, 1.1815185546875, 3.3905868530273438, -0.8495445251464844, 4.469703674316406, 3.3054351806640625, 4.8538665771484375, 0.7253303527832031, 5.542449951171875, 2.4464645385742188, 5.418376922607422], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000175.npy"}
{"epoch": 0.26455026455026454, "step": 176, "batch_size": 64, "mean": 1.141808271408081, "std": 2.2317376136779785, "min": -5.6620635986328125, "p10": -1.1479331970214843, "median": 1.0551719665527344, "p90": 4.00535011291504, "max": 8.519180297851562, "pos_frac": 0.71875, "sample": [3.141845703125, -3.1887283325195312, 4.055938720703125, 2.3335342407226562, 0.09951019287109375, 5.213531494140625, 0.62054443359375, 1.3264999389648438, 1.214590072631836, -0.148773193359375, -2.40179443359375, -0.7975234985351562, -1.5845680236816406, 1.4733047485351562, -1.1746139526367188, 3.4813709259033203, -0.3230743408203125, 1.9687995910644531, -1.0856781005859375, 0.05509185791015625, 0.09222412109375, -0.3090629577636719, 1.5545978546142578, 4.935844421386719, 1.4883880615234375, 1.1609306335449219, 0.0269622802734375, 1.6168289184570312, -1.2151107788085938, 1.8200225830078125, 4.6670074462890625, -0.2610321044921875, -0.8625411987304688, 2.058380126953125, 0.5068740844726562, 3.750804901123047, 0.109832763671875, 1.405792236328125, 0.3860816955566406, -1.4098968505859375, 0.6049880981445312, 0.4356689453125, 1.6607551574707031, 4.792026519775391, 0.7776870727539062, 3.887310028076172, -0.115692138671875, -0.48242950439453125, 3.434295654296875, 4.7297210693359375, -0.197845458984375, 1.742218017578125, 0.19196319580078125, 2.4981460571289062, 0.9841156005859375, 1.5050125122070312, 8.519180297851562, 2.1987152099609375, 0.05667877197265625, 3.038909912109375, 2.5611419677734375, -5.6620635986328125, 1.1262283325195312, -1.01373291015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000176.npy"}
{"epoch": 0.2660619803476946, "step": 177, "batch_size": 64, "mean": 1.9227689504623413, "std": 2.0328118801116943, "min": -3.4075927734375, "p10": -0.1366762161254882, "median": 1.8370895385742188, "p90": 4.099871253967286, "max": 7.839263916015625, "pos_frac": 0.859375, "sample": [1.70855712890625, 1.0551490783691406, 2.7010631561279297, 7.839263916015625, 2.068115234375, 0.3672599792480469, 0.9912109375, 3.424407958984375, 3.6426239013671875, 2.299896240234375, 7.287483215332031, 2.6786537170410156, 4.153509140014648, -0.016414642333984375, 0.7826385498046875, -0.9251670837402344, 2.9973907470703125, 0.104461669921875, -0.9042110443115234, 2.4691162109375, 3.2190093994140625, 2.3774032592773438, 1.0730724334716797, 6.111850738525391, 2.5849380493164062, 1.7046966552734375, 1.4638023376464844, 2.917522430419922, 0.09781074523925781, 0.3134880065917969, 1.197723388671875, 0.09627532958984375, 3.9747161865234375, 4.6245574951171875, -0.4450492858886719, 3.9341983795166016, 2.771820068359375, 1.1639785766601562, 1.411773681640625, 1.1191349029541016, -0.049957275390625, 1.955831527709961, 3.591663360595703, -1.9015655517578125, 3.3852691650390625, -1.5827102661132812, 2.0994720458984375, 1.713134765625, 2.2674713134765625, -0.1738414764404297, 1.8954925537109375, 0.28556060791015625, -3.4075927734375, 5.494855880737305, 1.989715576171875, 3.0651016235351562, 2.303274154663086, 3.611572265625, 1.5563201904296875, 0.2884635925292969, 1.2297401428222656, 5.00108528137207, 0.222442626953125, 1.7786865234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000177.npy"}
{"epoch": 0.2675736961451247, "step": 178, "batch_size": 64, "mean": 1.1327934265136719, "std": 1.652286410331726, "min": -2.628936767578125, "p10": -0.7475475311279297, "median": 0.8491458892822266, "p90": 3.2448795318603523, "max": 5.257972717285156, "pos_frac": 0.796875, "sample": [0.2204113006591797, 2.458953857421875, 2.6734580993652344, 0.282012939453125, 2.7436180114746094, 0.7962703704833984, -0.7475128173828125, 3.1126976013183594, 1.0359878540039062, 0.7422904968261719, -0.7475624084472656, 2.8524246215820312, -0.7284622192382812, -1.72119140625, 1.8917388916015625, 1.3260345458984375, 3.567729949951172, 0.8436470031738281, -0.35870361328125, 0.5247955322265625, 3.7239112854003906, 0.2774791717529297, 0.46178436279296875, -0.01318359375, 2.2750892639160156, 1.800811767578125, 1.5291595458984375, 0.583221435546875, 1.4139022827148438, 0.7203559875488281, 0.6724853515625, 0.08603477478027344, 0.26862335205078125, 4.323207855224609, 0.3054962158203125, 2.719818115234375, 1.082183837890625, 1.3758773803710938, 5.257972717285156, 0.37657928466796875, -2.628936767578125, 2.5, 4.94390869140625, 0.9543075561523438, -2.32879638671875, 2.4326553344726562, 0.10372161865234375, -0.5576858520507812, 3.3015289306640625, 1.3223896026611328, 1.1252670288085938, 4.0787200927734375, 0.5387210845947266, 2.470703125, 0.8306121826171875, -0.1663055419921875, 2.3507843017578125, 0.854644775390625, -1.7258834838867188, -1.2735939025878906, 1.2383460998535156, 0.7956695556640625, 2.2515869140625, -0.9230308532714844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000178.npy"}
{"epoch": 0.2690854119425548, "step": 179, "batch_size": 64, "mean": 1.2913641929626465, "std": 2.1816763877868652, "min": -3.5417633056640625, "p10": -1.3843273162841796, "median": 1.2794857025146484, "p90": 4.050404357910156, "max": 6.4778900146484375, "pos_frac": 0.703125, "sample": [1.0838546752929688, -0.323883056640625, 1.5480804443359375, 3.7438507080078125, -0.4760627746582031, 1.8642501831054688, 1.2888374328613281, 2.264251708984375, 6.0676116943359375, 2.7737960815429688, 3.3445281982421875, 5.273983001708984, 1.9308929443359375, -1.036041259765625, 1.728677749633789, -1.4193153381347656, -3.5417633056640625, 0.3994712829589844, 0.4599151611328125, 0.6958465576171875, -0.2940101623535156, 2.5725021362304688, -1.7642669677734375, 1.1329269409179688, 4.603351593017578, -0.2512969970703125, -0.5671768188476562, 2.546459197998047, 0.3654327392578125, 0.37182044982910156, 2.2712459564208984, 4.420417785644531, 0.9809036254882812, -1.3026885986328125, 0.19628143310546875, -1.715057373046875, -1.4680328369140625, 6.3318939208984375, 1.9303817749023438, -0.7544975280761719, 1.8915729522705078, 2.396392822265625, 1.8504390716552734, 2.1616172790527344, 1.7320480346679688, 3.03997802734375, 4.059669494628906, 0.9641571044921875, 4.028785705566406, 6.4778900146484375, 3.9674644470214844, 1.2701339721679688, 3.4698944091796875, 0.6052627563476562, -0.9860305786132812, 1.974884033203125, 0.670684814453125, -0.1067962646484375, 1.3178043365478516, -1.7155036926269531, -3.3018245697021484, -0.7497482299804688, 1.5126190185546875, -1.161458969116211], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000179.npy"}
{"epoch": 0.2705971277399849, "step": 180, "batch_size": 64, "mean": 1.0894112586975098, "std": 2.0323712825775146, "min": -2.7741546630859375, "p10": -1.7415763854980468, "median": 1.1345462799072266, "p90": 3.7124591827392583, "max": 6.039344787597656, "pos_frac": 0.6875, "sample": [3.00164794921875, 0.18430328369140625, 2.2222328186035156, -0.36873626708984375, 1.123046875, 2.3195571899414062, -2.1414566040039062, 0.6583251953125, 0.3969879150390625, -0.5610675811767578, 3.1062240600585938, 3.2208938598632812, -2.3807525634765625, -0.5463924407958984, 1.7307891845703125, 0.290679931640625, -2.7741546630859375, 2.4928741455078125, 0.11807632446289062, -1.8594512939453125, 4.392303466796875, 0.7715396881103516, 2.5270137786865234, 1.241302490234375, 1.522695541381836, 3.186552047729492, -0.6114692687988281, 1.2407989501953125, 1.0699481964111328, 1.2087669372558594, 2.12628173828125, 3.588672637939453, -2.2162399291992188, -0.061614990234375, 2.354076385498047, 1.54827880859375, 0.4235095977783203, 4.724052429199219, -0.08498382568359375, -0.35584259033203125, 1.1460456848144531, 2.1000900268554688, 4.53826904296875, -0.3312358856201172, -1.2661571502685547, 0.5717353820800781, 4.1995391845703125, 0.2328338623046875, 6.039344787597656, -0.17305946350097656, 2.4088706970214844, -1.6950225830078125, 1.986358642578125, -1.334686279296875, -1.5095596313476562, 1.9562873840332031, 5.230400085449219, -1.7615280151367188, 3.1306114196777344, -2.2711944580078125, 2.26580810546875, 1.439910888671875, 3.7655105590820312, 0.22387313842773438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000180.npy"}
{"epoch": 0.272108843537415, "step": 181, "batch_size": 64, "mean": 1.7682645320892334, "std": 1.8725322484970093, "min": -2.3646392822265625, "p10": -0.4351537704467773, "median": 1.584798812866211, "p90": 4.12026596069336, "max": 7.3515625, "pos_frac": 0.8125, "sample": [0.6837692260742188, 2.257762908935547, 0.38082122802734375, -0.4836406707763672, 1.4407081604003906, 1.0205440521240234, 2.1046218872070312, -0.3889617919921875, 0.9073905944824219, 0.65765380859375, 2.77734375, 0.3141326904296875, 1.6413841247558594, 0.8393707275390625, 1.8731632232666016, 1.953826904296875, 2.5486793518066406, 3.4643402099609375, 2.4509658813476562, -2.3646392822265625, 1.52142333984375, 4.848503112792969, 3.196575164794922, 1.9122734069824219, 4.167610168457031, 4.4950408935546875, 0.8771896362304688, 7.3515625, 0.12522125244140625, -0.8516998291015625, 5.466278076171875, 2.3902740478515625, 4.009796142578125, -1.4057426452636719, 2.40191650390625, -0.0189666748046875, -0.18763160705566406, 4.676887512207031, 1.2910118103027344, -0.00205230712890625, 2.988727569580078, 0.5360050201416016, 2.840656280517578, 3.9123077392578125, 3.397258758544922, 0.2824249267578125, 1.2452526092529297, 1.809244155883789, 2.5839385986328125, -0.45495033264160156, 0.465423583984375, 3.0618629455566406, 6.657432556152344, 1.0646896362304688, 1.4566192626953125, 3.280557632446289, 1.5282135009765625, 1.448944091796875, 2.285554885864258, -0.8514251708984375, 1.8268203735351562, -0.05491447448730469, 2.551912307739258, -1.038330078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000181.npy"}
{"epoch": 0.273620559334845, "step": 182, "batch_size": 64, "mean": 1.6124354600906372, "std": 2.280653953552246, "min": -4.506439208984375, "p10": -1.0042085647583008, "median": 1.4527769088745117, "p90": 4.757452011108398, "max": 6.889434814453125, "pos_frac": 0.75, "sample": [6.889434814453125, -0.7339344024658203, -1.0149116516113281, 0.7724609375, 2.8358383178710938, 3.9567794799804688, 1.8840293884277344, -0.9792346954345703, -1.5777359008789062, 4.6893463134765625, 3.064054489135742, 1.4775772094726562, 6.09210205078125, -1.2915802001953125, 2.3688812255859375, 2.4564208984375, 0.8812255859375, 1.5326385498046875, 5.043903350830078, 0.9741439819335938, 1.4414196014404297, 0.06605148315429688, 0.5592041015625, 0.2560272216796875, 1.4641342163085938, -0.266571044921875, 4.792463302612305, 2.132814407348633, 3.495880126953125, 4.166177749633789, 5.15313720703125, 2.0302295684814453, 1.7409820556640625, 1.8279533386230469, 2.9304428100585938, -1.8031005859375, 0.7308502197265625, 1.226226806640625, 4.2391357421875, -0.022003173828125, 1.2601699829101562, -2.1070022583007812, 4.963142395019531, -0.7981033325195312, 3.767507553100586, 3.2851791381835938, 0.8986873626708984, 0.470855712890625, 2.6064910888671875, 1.1494369506835938, 1.8396148681640625, 4.373538970947266, -0.3580513000488281, 4.786640167236328, 3.9434585571289062, 0.8908214569091797, -0.729248046875, 1.3638973236083984, 3.8733291625976562, 0.4784660339355469, -0.03902435302734375, -3.0087738037109375, -0.6916236877441406, -4.506439208984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000182.npy"}
{"epoch": 0.2751322751322751, "step": 183, "batch_size": 64, "mean": 0.8008227348327637, "std": 2.0008888244628906, "min": -6.031558990478516, "p10": -1.5669239044189451, "median": 0.9080810546875, "p90": 3.3685508728027345, "max": 5.0240631103515625, "pos_frac": 0.640625, "sample": [0.7989425659179688, 3.5330963134765625, -0.6410369873046875, 0.09343338012695312, 1.0951404571533203, -0.927490234375, 0.1806182861328125, 3.6048126220703125, -0.9733352661132812, -0.04964447021484375, -6.031558990478516, 0.8793869018554688, 1.3473777770996094, -1.0209808349609375, 1.0646514892578125, 2.9024181365966797, -0.4620647430419922, 0.49431610107421875, 5.0240631103515625, 1.4922962188720703, 3.055755615234375, -2.0123138427734375, 3.13983154296875, -0.6754741668701172, 0.16590118408203125, -0.3118247985839844, 2.334442138671875, 1.5305099487304688, -1.7458953857421875, -1.7067413330078125, 4.1085357666015625, 2.91326904296875, 1.2465705871582031, 0.5912628173828125, -2.4650115966796875, 2.3679962158203125, 2.3177146911621094, 3.3701095581054688, -3.7691421508789062, 3.595081329345703, -1.3297996520996094, -0.4203758239746094, 3.0397186279296875, -0.09882545471191406, 0.013683319091796875, 1.3028411865234375, 2.1840248107910156, 1.0806312561035156, -0.8855743408203125, -0.5639553070068359, 1.239013671875, 1.8653488159179688, 0.0019683837890625, 0.9367752075195312, 3.6403045654296875, -0.09986495971679688, 1.6806793212890625, -1.668548583984375, -0.22832107543945312, -0.43670654296875, 2.35888671875, 3.3649139404296875, 2.493976593017578, 1.32684326171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000183.npy"}
{"epoch": 0.2766439909297052, "step": 184, "batch_size": 64, "mean": 1.3855886459350586, "std": 1.7812681198120117, "min": -1.5923480987548828, "p10": -0.33063144683837886, "median": 1.0087223052978516, "p90": 4.129558753967285, "max": 6.315683364868164, "pos_frac": 0.75, "sample": [1.1626434326171875, 4.612148284912109, 2.9305286407470703, 0.9455184936523438, 2.7623748779296875, 0.495819091796875, -0.5057563781738281, 0.24230194091796875, -0.06049346923828125, 1.777984619140625, 0.17392921447753906, 3.3694534301757812, 0.5425224304199219, 1.1416091918945312, 2.2884559631347656, -1.3171920776367188, 3.9087371826171875, 4.042423248291016, 0.12170791625976562, 5.238037109375, -0.0379638671875, 6.315683364868164, 1.3501853942871094, 5.013301849365234, -0.6045684814453125, 1.5824737548828125, 0.2600383758544922, 2.7917938232421875, 4.612186431884766, 1.0238990783691406, 0.9539337158203125, -0.22589874267578125, 0.44449615478515625, -0.16244125366210938, -0.04406929016113281, 1.2901992797851562, 4.166902542114258, 2.3161182403564453, 0.0458831787109375, 2.6405258178710938, -0.34069252014160156, 0.0507354736328125, 4.349037170410156, -0.0018558502197265625, 2.6736831665039062, 2.102571487426758, 1.4851226806640625, 0.9741058349609375, 0.38720703125, 3.8895797729492188, -0.798828125, 0.2027740478515625, 0.2653656005859375, 0.9935455322265625, 1.76287841796875, 2.772979736328125, -1.5923480987548828, 1.6832046508789062, -0.3071556091308594, 1.0730438232421875, -0.29924774169921875, -0.10476112365722656, -1.2373065948486328, 1.0886001586914062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000184.npy"}
{"epoch": 0.2781557067271353, "step": 185, "batch_size": 64, "mean": 1.4216983318328857, "std": 2.5244245529174805, "min": -4.268241882324219, "p10": -1.6740245819091795, "median": 1.4508028030395508, "p90": 5.164106178283691, "max": 7.706512451171875, "pos_frac": 0.703125, "sample": [1.000783920288086, 7.706512451171875, -0.25461578369140625, 2.404682159423828, 5.910064697265625, -3.4434471130371094, 1.60504150390625, 1.0330009460449219, 0.12042236328125, -0.45168304443359375, 5.17315673828125, 5.46435546875, 0.41357421875, -0.0347900390625, 6.813591003417969, 1.5244693756103516, 5.260406494140625, 1.37713623046875, 2.7541236877441406, 2.8072147369384766, -0.7591209411621094, 3.2772674560546875, 2.4573097229003906, 1.7292098999023438, 1.126434326171875, 3.0238990783691406, 1.531219482421875, 2.215452194213867, 1.9531936645507812, 2.447662353515625, -1.0468196868896484, -0.13610076904296875, 2.3479442596435547, 4.585357666015625, -2.7120742797851562, 2.9573974609375, 1.8003997802734375, 3.2909698486328125, 1.1451454162597656, -0.09631156921386719, -4.268241882324219, 1.0060958862304688, -1.5386276245117188, 2.767333984375, 1.9258193969726562, -3.5671539306640625, 2.967670440673828, 0.2390613555908203, 5.466064453125, -0.3073749542236328, 1.09442138671875, -3.306344985961914, 3.25469970703125, -1.348663330078125, -0.48281097412109375, 1.2305717468261719, -1.7320518493652344, -1.857339859008789, 0.8553142547607422, -0.749298095703125, 0.18610382080078125, 1.9590110778808594, 5.142988204956055, 3.72900390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000185.npy"}
{"epoch": 0.2796674225245654, "step": 186, "batch_size": 64, "mean": 1.3478801250457764, "std": 2.207749843597412, "min": -4.028053283691406, "p10": -1.4529699325561523, "median": 1.3617677688598633, "p90": 4.181637191772461, "max": 6.350830078125, "pos_frac": 0.71875, "sample": [3.725616455078125, 5.720970153808594, -2.7363624572753906, 1.2114696502685547, 2.0818252563476562, 0.6894989013671875, 2.5291290283203125, -1.8244476318359375, -1.6534500122070312, 2.6857147216796875, 0.3850250244140625, -0.6048126220703125, 1.2561187744140625, 4.077226638793945, 6.134429931640625, 3.877408981323242, 1.467416763305664, 1.2388381958007812, 0.0585784912109375, -1.52362060546875, 0.8023204803466797, 1.8253459930419922, 3.0887203216552734, 4.549110412597656, -4.028053283691406, -1.1798324584960938, -0.4557781219482422, -1.2881183624267578, 0.26629638671875, 0.9782028198242188, 1.2002029418945312, 4.139865875244141, -0.6216278076171875, -0.979248046875, -0.030376434326171875, -0.7045841217041016, 1.6891021728515625, 4.1995391845703125, 2.1425399780273438, 1.731781005859375, -0.20209884643554688, -2.2000579833984375, -0.23096084594726562, 0.907440185546875, 1.56463623046875, 1.1127548217773438, 2.897127151489258, 2.705108642578125, -1.6690826416015625, 1.8185043334960938, 4.5690460205078125, 1.7208576202392578, 2.097808837890625, 6.311767578125, 0.24120330810546875, 1.2285690307617188, 6.350830078125, 2.234926223754883, 2.0581398010253906, 2.5239620208740234, 1.9381256103515625, 1.59197998046875, -1.1763458251953125, 1.74810791015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000186.npy"}
{"epoch": 0.2811791383219955, "step": 187, "batch_size": 64, "mean": 1.6290010213851929, "std": 2.462125539779663, "min": -4.996337890625, "p10": -1.1345146179199217, "median": 1.6299018859863281, "p90": 4.197435760498047, "max": 8.553634643554688, "pos_frac": 0.78125, "sample": [2.2064666748046875, 0.2860870361328125, 5.557807922363281, 3.8430328369140625, 8.553634643554688, 3.01397705078125, 3.7515335083007812, 0.5002784729003906, 1.557342529296875, 2.308971405029297, -1.1727676391601562, 3.468170166015625, 0.8971138000488281, 4.0720977783203125, 3.2593841552734375, 1.1063785552978516, 0.3955402374267578, 4.251152038574219, 1.7024612426757812, 2.269073486328125, -4.996337890625, -4.436866760253906, 4.820152282714844, 0.2488269805908203, 1.5019950866699219, -0.5495834350585938, 0.3522186279296875, 0.2709503173828125, 1.9239349365234375, 3.8429412841796875, 3.2549095153808594, 3.9782638549804688, -2.1110992431640625, 0.13321304321289062, 2.5716552734375, 4.745338439941406, 0.6816520690917969, -0.101165771484375, 0.2865753173828125, 1.8369808197021484, -1.045257568359375, -1.2449874877929688, 3.7176856994628906, -0.04009437561035156, 0.38863372802734375, 2.0849533081054688, 3.4818344116210938, -0.9961624145507812, 4.654571533203125, 3.5822906494140625, 3.8874359130859375, 1.2369308471679688, 2.5029373168945312, 6.8412933349609375, 1.1851749420166016, -0.6813812255859375, 2.654296875, 2.6549835205078125, 0.46379852294921875, 1.4895477294921875, -0.07424163818359375, -1.8583316802978516, -3.552001953125, 2.8398666381835938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000187.npy"}
{"epoch": 0.28269085411942557, "step": 188, "batch_size": 64, "mean": 1.4481827020645142, "std": 2.449981927871704, "min": -8.966583251953125, "p10": -0.6797798156738281, "median": 1.1198787689208984, "p90": 4.4887733459472665, "max": 5.9129791259765625, "pos_frac": 0.78125, "sample": [1.2312393188476562, -0.13465309143066406, 4.580326080322266, -1.2363910675048828, 0.41770172119140625, 3.309326171875, 3.6105804443359375, 0.9165935516357422, -0.107086181640625, -0.0622100830078125, 1.0708465576171875, 4.09210205078125, 2.997100830078125, -0.49897003173828125, 0.07003593444824219, 0.3592720031738281, 4.2459869384765625, 4.115104675292969, 0.4516181945800781, -0.6137542724609375, 1.0480289459228516, 0.0100555419921875, 0.9646759033203125, -0.169891357421875, 2.2824172973632812, 1.231597900390625, -0.8905010223388672, 2.9037017822265625, 0.4512062072753906, 4.983585357666016, 4.057929992675781, 0.35089874267578125, 2.8021240234375, 5.593986511230469, 3.5669631958007812, 0.10141181945800781, 4.123527526855469, 1.2452239990234375, 0.06391143798828125, 1.0495986938476562, 5.881542205810547, 0.390655517578125, 1.4868736267089844, 1.9680023193359375, 1.1651535034179688, -1.5525360107421875, 1.6019878387451172, 4.7613983154296875, 1.9518890380859375, 0.6012153625488281, 5.9129791259765625, -4.1951751708984375, -0.7080764770507812, 1.5511703491210938, 0.931732177734375, 1.99053955078125, 1.0746040344238281, -1.5781822204589844, 4.275150299072266, 5.069562911987305, 2.8187389373779297, 2.1891632080078125, -0.5233345031738281, -8.966583251953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000188.npy"}
{"epoch": 0.2842025699168556, "step": 189, "batch_size": 64, "mean": 1.7021600008010864, "std": 2.140864133834839, "min": -2.379993438720703, "p10": -1.4259422302246094, "median": 1.5241641998291016, "p90": 4.810741233825684, "max": 6.3876800537109375, "pos_frac": 0.828125, "sample": [0.43497467041015625, 2.0493221282958984, 1.8490123748779297, 0.414825439453125, 1.8212738037109375, 3.9606170654296875, 5.225914001464844, 3.0619449615478516, 0.5040454864501953, 2.2050018310546875, -1.6400604248046875, 3.4810829162597656, -1.8935089111328125, 0.5294189453125, 0.0998992919921875, -0.30538177490234375, 6.3876800537109375, 2.883800506591797, -2.379993438720703, 1.4676246643066406, 0.8106307983398438, 4.105894088745117, 0.9872589111328125, 2.8065185546875, 1.05487060546875, 1.024057388305664, 1.924835205078125, 0.9274997711181641, 1.6199989318847656, -0.13532257080078125, 4.128089904785156, -0.5687713623046875, 1.8644180297851562, 6.071561813354492, 5.368507385253906, 2.0277938842773438, 0.4440460205078125, 0.6470260620117188, 0.6466503143310547, 2.8496665954589844, 4.680168151855469, 0.94842529296875, 1.4428462982177734, 0.7443275451660156, 2.211700439453125, 2.651409149169922, -1.4470748901367188, -2.373472213745117, 1.024169921875, 1.9580764770507812, 0.6065673828125, 4.866701126098633, -1.3766326904296875, 5.237831115722656, 3.47900390625, 0.620147705078125, -1.8232460021972656, -2.038837432861328, 2.0193443298339844, 3.8153076171875, 1.5807037353515625, 1.1045608520507812, 5.666679382324219, 4.576810836791992], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000189.npy"}
{"epoch": 0.2857142857142857, "step": 190, "batch_size": 64, "mean": 1.8837926387786865, "std": 2.3850462436676025, "min": -3.6837005615234375, "p10": -0.7400171279907224, "median": 1.4941177368164062, "p90": 5.390975952148438, "max": 6.3357696533203125, "pos_frac": 0.78125, "sample": [1.7220439910888672, 3.344696044921875, 5.475616455078125, -3.5863494873046875, 2.6917991638183594, 3.9576263427734375, 0.10218048095703125, 1.58624267578125, -2.4957046508789062, 6.046722412109375, 1.1894874572753906, 2.6402206420898438, 2.078899383544922, -0.21985626220703125, 1.1142501831054688, -0.1334991455078125, -0.054073333740234375, 4.57635498046875, 1.152353286743164, 2.425172805786133, 2.7478981018066406, 4.504634857177734, 0.72406005859375, 0.7876548767089844, -1.5348758697509766, 1.3845062255859375, -3.6837005615234375, 0.7653465270996094, 1.1575202941894531, 1.5418281555175781, 3.5115203857421875, -0.21672439575195312, 1.4469451904296875, 2.333526611328125, 2.5066375732421875, 0.05015754699707031, 2.9416046142578125, -0.0548553466796875, -0.5344409942626953, 1.3115501403808594, 0.47126007080078125, 6.19287109375, 4.624835968017578, 5.7313232421875, 0.6885814666748047, -0.03211212158203125, -1.3929615020751953, 4.86907958984375, 0.034423828125, -1.3985462188720703, 0.8159637451171875, 2.184732437133789, 1.1012153625488281, 4.39117431640625, 3.3741912841796875, 6.3143157958984375, 5.5207977294921875, 1.541290283203125, 4.753732681274414, 6.3357696533203125, -0.8281211853027344, 5.1934814453125, 1.1457672119140625, 3.624683380126953], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000190.npy"}
{"epoch": 0.2872260015117158, "step": 191, "batch_size": 64, "mean": 1.7088083028793335, "std": 2.3775970935821533, "min": -4.086517333984375, "p10": -1.1470474243164062, "median": 1.7449169158935547, "p90": 4.752265930175781, "max": 6.712493896484375, "pos_frac": 0.765625, "sample": [4.053291320800781, -1.0727157592773438, -1.1572418212890625, 2.0591583251953125, 6.712493896484375, 2.1025772094726562, 2.783050537109375, 4.7974700927734375, 1.7645034790039062, 0.6689605712890625, 1.9732818603515625, 5.874755859375, 2.2218246459960938, 1.7253303527832031, 2.3995361328125, 1.2075481414794922, -1.9302749633789062, -0.673828125, 6.57220458984375, 1.4470596313476562, 1.511077880859375, 5.85894775390625, 2.6060237884521484, 3.9208145141601562, -2.3784408569335938, 3.585845947265625, -0.0455780029296875, 4.266849517822266, -1.123260498046875, 0.5695590972900391, 2.131246566772461, -0.25591278076171875, 0.6020050048828125, 1.5832862854003906, 6.5550384521484375, -4.086517333984375, 0.3474159240722656, -2.029693603515625, 3.7134437561035156, 1.7957077026367188, 2.159679412841797, 2.2784080505371094, 0.381011962890625, 2.8152694702148438, 2.79266357421875, 0.6543769836425781, 4.1002197265625, 3.2354507446289062, -0.15157127380371094, -0.49596405029296875, -0.21303176879882812, 1.2428741455078125, 4.09149169921875, 0.8280181884765625, -1.3792152404785156, 3.1689453125, 5.8284149169921875, -2.9285888671875, 1.8079566955566406, 0.01122283935546875, 0.50115966796875, 4.64678955078125, 0.5772171020507812, 0.75408935546875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000191.npy"}
{"epoch": 0.2887377173091459, "step": 192, "batch_size": 64, "mean": 1.31644868850708, "std": 2.1602187156677246, "min": -4.7291717529296875, "p10": -0.9456287384033203, "median": 1.2517061233520508, "p90": 4.0892383575439455, "max": 7.724992752075195, "pos_frac": 0.765625, "sample": [-4.7291717529296875, -1.7467155456542969, 0.6274452209472656, 3.7809715270996094, 1.964731216430664, 0.11869239807128906, -1.7624149322509766, 3.9722747802734375, 4.01837158203125, 0.7245864868164062, -0.9384002685546875, -0.33014678955078125, -0.09653472900390625, 0.056735992431640625, 1.3078536987304688, 2.3858184814453125, 4.119609832763672, 0.8061370849609375, 1.2816295623779297, 0.0482177734375, 2.675802230834961, 5.182701110839844, -0.04490470886230469, 0.4103565216064453, 3.095458984375, 0.12767601013183594, -0.5829925537109375, 0.6171684265136719, 2.3133697509765625, 0.4447364807128906, 2.8076400756835938, 2.0438919067382812, 1.5580081939697266, 1.9848060607910156, -0.5487480163574219, 0.05309867858886719, 2.0538177490234375, 0.30743980407714844, -0.28461456298828125, 2.4854812622070312, 1.2217826843261719, 0.4438896179199219, -3.9864540100097656, 7.724992752075195, 4.863311767578125, 2.054513931274414, -1.0174407958984375, 1.6919384002685547, 1.4578399658203125, -1.5247268676757812, 3.4118499755859375, 4.159366607666016, 0.27642059326171875, 5.055625915527344, 0.14062881469726562, 2.8671951293945312, 4.005413055419922, -0.6258087158203125, 2.3392562866210938, 4.1891937255859375, 1.7704334259033203, -0.9487266540527344, 0.7369308471679688, 1.635406494140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000192.npy"}
{"epoch": 0.29024943310657597, "step": 193, "batch_size": 64, "mean": 1.7677130699157715, "std": 2.3235886096954346, "min": -3.5113868713378906, "p10": -0.7048698425292967, "median": 1.2260818481445312, "p90": 4.90563659667969, "max": 9.599456787109375, "pos_frac": 0.828125, "sample": [1.0863113403320312, 0.4895896911621094, 2.086648941040039, 1.7862625122070312, 0.9667797088623047, -0.7684783935546875, -0.5564498901367188, 4.167503356933594, 0.9400787353515625, 0.1612567901611328, 3.5632781982421875, 0.15244483947753906, 3.6983489990234375, 1.221405029296875, 2.5943603515625, -1.0233535766601562, 3.3797378540039062, 0.5696830749511719, 1.622152328491211, 3.284008026123047, -1.3117828369140625, 5.459497451782227, 9.599456787109375, 0.3897285461425781, 3.34039306640625, 3.0593719482421875, 0.7882652282714844, 2.2598724365234375, 0.00681304931640625, 1.0648536682128906, 5.791175842285156, 2.37689208984375, 0.5386428833007812, 2.929485321044922, 2.1751022338867188, -1.6973419189453125, -1.0169143676757812, 0.7822341918945312, 2.5124053955078125, 2.274188995361328, 0.9111328125, 0.4988059997558594, 2.794811248779297, 8.7650146484375, -0.8849143981933594, -0.2468414306640625, 1.2391414642333984, 0.008571624755859375, 0.16435623168945312, 4.239891052246094, 3.7291793823242188, -0.3399314880371094, 5.243621826171875, 5.818830490112305, 5.190956115722656, 1.3215789794921875, 1.2307586669921875, -3.5113868713378906, 1.7898330688476562, 2.667409896850586, 0.6627998352050781, 0.8834114074707031, -0.027660369873046875, 0.24036216735839844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000193.npy"}
{"epoch": 0.29176114890400606, "step": 194, "batch_size": 64, "mean": 1.5379945039749146, "std": 2.5628421306610107, "min": -3.7601146697998047, "p10": -1.6641046524047851, "median": 1.7540006637573242, "p90": 4.625209999084473, "max": 7.697883605957031, "pos_frac": 0.703125, "sample": [3.1932296752929688, 1.72979736328125, 2.1785964965820312, -1.6512565612792969, 2.9917068481445312, 0.9893264770507812, -1.6696109771728516, 0.8526382446289062, -0.34172821044921875, 2.950469970703125, 2.017658233642578, 2.8124237060546875, 1.0484085083007812, 2.435028076171875, -3.5125045776367188, -2.036956787109375, 5.673713684082031, -0.6701202392578125, 2.045612335205078, -1.2217063903808594, 0.78692626953125, 0.9912071228027344, 3.9447097778320312, 3.50421142578125, -0.8205966949462891, 2.5859451293945312, -0.6978683471679688, 1.69915771484375, 0.097900390625, 1.7934188842773438, -0.19536590576171875, 3.0894813537597656, -2.2308273315429688, -1.1602516174316406, -3.60589599609375, -0.4636554718017578, 4.532560348510742, 4.6649169921875, 3.366809844970703, 5.492238998413086, -3.7601146697998047, 2.1691207885742188, 1.7782039642333984, 1.4180774688720703, 6.613945007324219, 6.790727615356445, 3.4621410369873047, -2.998321533203125, 6.468025207519531, 3.051422119140625, 3.340932846069336, -0.653594970703125, 1.828399658203125, 3.805694580078125, 2.0462646484375, 7.697883605957031, 2.052806854248047, -0.1775531768798828, -0.2827796936035156, 3.4326095581054688, 1.1437702178955078, 0.5394020080566406, 0.021694183349609375, 1.4531440734863281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000194.npy"}
{"epoch": 0.29327286470143615, "step": 195, "batch_size": 64, "mean": 1.657346487045288, "std": 2.3989763259887695, "min": -4.8082275390625, "p10": -0.6831600189208984, "median": 1.573380470275879, "p90": 4.510957908630371, "max": 7.698436737060547, "pos_frac": 0.78125, "sample": [1.147613525390625, 2.2489356994628906, -1.30657958984375, 0.745452880859375, 5.760686874389648, 2.7679977416992188, 1.5666217803955078, 1.58013916015625, 2.1746978759765625, 3.6608734130859375, 4.53668212890625, 3.6171302795410156, -3.3546524047851562, -0.65130615234375, -2.163665771484375, -4.8082275390625, -0.390472412109375, 1.368072509765625, 0.5816211700439453, 4.659555435180664, -0.1379241943359375, 4.106414794921875, 3.1690425872802734, 4.012176513671875, 0.6147079467773438, 0.3331451416015625, 2.927949905395508, 5.726478576660156, 0.0914764404296875, 2.7327423095703125, 0.43060874938964844, 0.7546844482421875, 7.698436737060547, 1.9149169921875, 3.1605072021484375, -0.6968116760253906, -0.06539726257324219, 3.470439910888672, 3.2137298583984375, 3.9657745361328125, 0.5638523101806641, -0.4014129638671875, 0.07424163818359375, 0.39984130859375, 0.15979957580566406, 1.961111068725586, -0.641387939453125, 3.1746063232421875, -2.8206558227539062, 0.05853462219238281, -0.5593795776367188, 0.71722412109375, 6.617774963378906, 4.34576416015625, 2.1920318603515625, 3.2073516845703125, 4.477508544921875, 1.7795562744140625, -2.4007415771484375, 3.6852493286132812, 1.2473602294921875, 0.8668746948242188, 1.675506591796875, 4.525293350219727], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000195.npy"}
{"epoch": 0.2947845804988662, "step": 196, "batch_size": 64, "mean": 1.6336736679077148, "std": 2.375190496444702, "min": -6.924800872802734, "p10": -1.0958093643188476, "median": 1.326446533203125, "p90": 4.7383563995361335, "max": 6.0707855224609375, "pos_frac": 0.84375, "sample": [0.5705413818359375, -6.924800872802734, -1.1464195251464844, 3.1497650146484375, 0.6340885162353516, -1.0326290130615234, 1.1821060180664062, 1.3434295654296875, 1.7216110229492188, -1.5070037841796875, 0.8090057373046875, 4.259552001953125, 2.8308639526367188, 1.217376708984375, 2.1240386962890625, 0.7425689697265625, -4.352695465087891, 4.340095520019531, -2.3285064697265625, 5.651126861572266, -0.4746074676513672, 0.767547607421875, 4.185951232910156, 0.12473106384277344, 0.003124237060546875, 2.016998291015625, 2.417816162109375, 2.7325897216796875, 3.3805999755859375, 5.6815643310546875, 0.6783809661865234, 3.751798629760742, 1.3094635009765625, -0.21623802185058594, 0.490631103515625, 1.2479248046875, 0.44815635681152344, 1.9879512786865234, 1.119293212890625, 5.285102844238281, 1.200469970703125, 2.6439056396484375, 1.4448204040527344, 2.449432373046875, 3.0629844665527344, -1.1228866577148438, 0.4065570831298828, 2.568878173828125, 2.181976318359375, 4.7742919921875, 2.8577728271484375, 4.654506683349609, 0.10579681396484375, 0.17617034912109375, 2.2294750213623047, 6.0707855224609375, 5.648534774780273, 5.814994812011719, 0.6428680419921875, 0.49323272705078125, 1.9470596313476562, -1.1556816101074219, 1.0878219604492188, 4.1484527587890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000196.npy"}
{"epoch": 0.2962962962962963, "step": 197, "batch_size": 64, "mean": 1.9544826745986938, "std": 3.3157589435577393, "min": -8.440231323242188, "p10": -1.6856782913208008, "median": 1.5859498977661133, "p90": 5.751386642456055, "max": 9.543655395507812, "pos_frac": 0.765625, "sample": [-0.9875030517578125, 7.992767333984375, 1.5662784576416016, -0.3933258056640625, 0.761566162109375, 2.0981216430664062, -2.646503448486328, -8.440231323242188, 1.4309577941894531, -0.8593368530273438, 5.202728271484375, 4.711088180541992, -2.7428741455078125, -2.2228126525878906, 0.2326641082763672, 6.991497039794922, 0.9586868286132812, 1.3597869873046875, 7.5839691162109375, 6.532417297363281, 0.9240646362304688, 1.1791610717773438, 5.751575469970703, 1.8365898132324219, 2.5517730712890625, 1.2101058959960938, 0.9512176513671875, -1.1899185180664062, 3.640390396118164, 3.6349945068359375, 4.370185852050781, 5.6492767333984375, -4.3668212890625, 5.199974060058594, -1.6861572265625, -5.541568756103516, 2.3494338989257812, 0.1104278564453125, 3.2528839111328125, 3.9467697143554688, -0.39813232421875, 3.6446151733398438, 5.525321960449219, 0.5236968994140625, -0.38777923583984375, 6.172698974609375, 1.605621337890625, 0.611328125, 1.2736568450927734, 5.280168533325195, -1.684560775756836, 0.00815582275390625, 3.0914268493652344, 5.178070068359375, 1.1810054779052734, -1.5883331298828125, 9.543655395507812, 2.7315444946289062, 0.36809539794921875, 1.9289817810058594, 4.639595031738281, 5.750946044921875, 2.514280319213867, 4.66853141784668], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000197.npy"}
{"epoch": 0.29780801209372637, "step": 198, "batch_size": 64, "mean": 2.153754234313965, "std": 2.516483783721924, "min": -3.62921142578125, "p10": -0.6550941467285155, "median": 2.1478805541992188, "p90": 4.9725919723510765, "max": 10.241116523742676, "pos_frac": 0.828125, "sample": [-0.690399169921875, 2.6496505737304688, 1.166269302368164, 1.8053436279296875, 1.0567169189453125, 3.283935546875, -1.4832706451416016, 2.4228897094726562, 0.7505092620849609, 1.6551666259765625, 3.9517822265625, 2.1054229736328125, -0.120330810546875, -1.8674144744873047, 8.46513557434082, 6.113029479980469, 3.6614532470703125, -0.990234375, 3.78363037109375, 0.08868789672851562, 2.0797157287597656, 3.548797607421875, 4.394857406616211, -2.2318782806396484, 6.541961669921875, -3.1494598388671875, -0.5727157592773438, 0.1418743133544922, 3.9693679809570312, 1.6402416229248047, 2.7087020874023438, 3.1475677490234375, 0.45684814453125, -3.62921142578125, 0.26992034912109375, 0.40825653076171875, 0.9595565795898438, 1.1779766082763672, -0.0809783935546875, 2.0012893676757812, 2.4984588623046875, 3.2604141235351562, 6.340766906738281, 2.40777587890625, 3.1257057189941406, 3.1626052856445312, 0.6491889953613281, 5.198389053344727, 10.241116523742676, 2.190338134765625, 2.36175537109375, 1.4022903442382812, 2.3588714599609375, 4.445732116699219, 2.4800262451171875, 3.0326690673828125, 3.2040557861328125, 1.938436508178711, -0.25595664978027344, 1.468658447265625, 2.0131378173828125, 3.3748912811279297, 7.143226623535156, 2.207040786743164], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000198.npy"}
{"epoch": 0.29931972789115646, "step": 199, "batch_size": 64, "mean": 1.5892664194107056, "std": 2.696166753768921, "min": -4.866367340087891, "p10": -2.063491058349609, "median": 1.6730775833129883, "p90": 4.845686340332032, "max": 5.911434173583984, "pos_frac": 0.765625, "sample": [2.145610809326172, 3.1525840759277344, 0.5427703857421875, 2.039031982421875, -1.0887012481689453, -2.1318511962890625, 5.475517272949219, 4.761318206787109, 2.110076904296875, 3.759490966796875, 3.5120086669921875, -0.557830810546875, 5.911434173583984, 3.839630126953125, 4.812812805175781, 4.858489990234375, 4.857563018798828, 3.0730972290039062, 3.6638031005859375, 4.361865997314453, -0.453125, -4.086326599121094, 2.4555397033691406, 3.7543792724609375, 0.31420135498046875, 1.0810165405273438, 1.6501636505126953, -1.2639389038085938, 2.2722625732421875, 1.2413711547851562, 1.8824234008789062, 0.6573295593261719, 5.847949981689453, 0.59234619140625, -0.6232995986938477, 0.4833984375, 1.16168212890625, -3.7938995361328125, -1.9039840698242188, 1.6959915161132812, -4.866367340087891, 3.4324951171875, 1.1075096130371094, 3.6388626098632812, -4.16607666015625, 4.228843688964844, 3.103090286254883, 1.5010452270507812, -0.13608169555664062, 0.0578460693359375, 0.09041595458984375, -3.7998428344726562, 0.26007080078125, 4.870174407958984, -3.1841964721679688, 0.5499038696289062, 4.817974090576172, 0.312103271484375, 4.21533203125, 2.3202762603759766, 4.574840545654297, -0.2048187255859375, 5.51885986328125, 1.4065876007080078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000199.npy"}
{"epoch": 0.30083144368858655, "step": 200, "batch_size": 64, "mean": 2.40299654006958, "std": 3.2138113975524902, "min": -6.6238555908203125, "p10": -0.9179630279541015, "median": 2.2614707946777344, "p90": 7.372082901000977, "max": 9.442886352539062, "pos_frac": 0.78125, "sample": [2.1063270568847656, 7.362285614013672, 8.753326416015625, 1.3912734985351562, 2.4979705810546875, 1.0874214172363281, 3.839324951171875, 0.44411468505859375, 3.147918701171875, -0.29931640625, -0.436920166015625, 3.1892242431640625, -1.2521934509277344, 9.271636962890625, 0.9042816162109375, -0.20425033569335938, 2.23065185546875, 0.29194068908691406, 5.739753723144531, -2.738800048828125, -0.470123291015625, 0.9125518798828125, 3.1446762084960938, 7.981136322021484, 0.5143013000488281, 3.4403820037841797, 1.5382022857666016, 1.5796966552734375, 2.4156150817871094, 0.4124412536621094, 5.914131164550781, 2.430267333984375, 2.4331741333007812, 7.37628173828125, 5.0689544677734375, -0.1913013458251953, -6.6238555908203125, 4.133975982666016, 8.166213989257812, 4.651744842529297, 6.5596771240234375, 2.2922897338867188, 9.442886352539062, 1.6486663818359375, 2.5100250244140625, -3.3200836181640625, -0.082855224609375, 1.88623046875, 2.3448944091796875, 8.469551086425781, 0.1704559326171875, 3.0921249389648438, 2.7787628173828125, -1.9620857238769531, 2.6486053466796875, 2.9072418212890625, -0.9486618041992188, -0.8463325500488281, 1.4831123352050781, 1.4546241760253906, 0.3313007354736328, -1.8003616333007812, 6.469970703125, 4.1072998046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000200.npy"}
{"epoch": 0.30234315948601664, "step": 201, "batch_size": 64, "mean": 1.9166680574417114, "std": 2.6434133052825928, "min": -2.9624786376953125, "p10": -1.2261070251464843, "median": 1.5383796691894531, "p90": 5.944896697998049, "max": 9.121299743652344, "pos_frac": 0.78125, "sample": [0.9502410888671875, 3.586090087890625, 1.4775924682617188, 0.6888580322265625, 0.2733039855957031, 3.08245849609375, 7.646053314208984, 0.4740028381347656, 1.8081741333007812, 0.22906494140625, 0.03781890869140625, 4.041421890258789, 1.814035415649414, 5.47222900390625, 0.6333522796630859, -1.17852783203125, 6.147468566894531, 3.5125045776367188, -0.4678802490234375, 9.121299743652344, -2.1491966247558594, 0.9206619262695312, 0.8600387573242188, -1.7702484130859375, 1.6084785461425781, 6.149158477783203, 1.7578277587890625, 2.0080490112304688, -1.968902587890625, 2.7619552612304688, 0.7927665710449219, 4.000701904296875, 1.5826034545898438, -0.047210693359375, 3.5306758880615234, 6.691135406494141, -1.2857666015625, -2.5765533447265625, 0.2479248046875, 1.4941558837890625, 3.8202552795410156, 3.5514488220214844, -0.919891357421875, 2.95947265625, -0.7360553741455078, 4.558280944824219, 1.3821334838867188, 1.313018798828125, 1.586151123046875, 5.10736083984375, -0.864227294921875, 2.502227783203125, 1.7987709045410156, 4.898342132568359, 1.7360343933105469, 6.4029083251953125, 4.522434234619141, -1.2464981079101562, 0.6103897094726562, -0.492279052734375, 7.220935821533203, 0.9538726806640625, 1.0063323974609375, -2.9624786376953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000201.npy"}
{"epoch": 0.30385487528344673, "step": 202, "batch_size": 64, "mean": 2.268734931945801, "std": 2.9935598373413086, "min": -4.681282043457031, "p10": -1.3789260864257809, "median": 2.388493537902832, "p90": 6.52961750030518, "max": 10.849437713623047, "pos_frac": 0.765625, "sample": [-0.9303951263427734, 1.3620452880859375, 7.4270477294921875, 7.41845703125, 3.97491455078125, -0.109527587890625, 2.9048538208007812, 3.1151962280273438, 2.0214996337890625, 2.7531967163085938, 0.4441375732421875, 2.8087425231933594, 0.4924049377441406, 1.8447303771972656, -0.49988555908203125, 3.350189208984375, 4.7797393798828125, -0.08271026611328125, 2.6235122680664062, -1.9744377136230469, 0.5865859985351562, 2.0596160888671875, 4.66912841796875, 2.4758758544921875, -1.5711536407470703, 4.445953369140625, 1.763397216796875, 10.849437713623047, 0.25643157958984375, 7.06764030456543, -0.21802902221679688, 3.7535839080810547, 1.661020278930664, 2.838531494140625, 1.9622573852539062, -4.681282043457031, 2.983489990234375, 2.9939308166503906, 2.4103851318359375, -2.9338150024414062, 4.09783935546875, 2.7232513427734375, 1.9104862213134766, 3.060455322265625, 3.0651092529296875, 2.6853256225585938, 5.27423095703125, 1.4395275115966797, 4.5186920166015625, -0.5178947448730469, -0.2070636749267578, -3.356304168701172, -0.48264312744140625, -2.0765857696533203, 1.6529693603515625, 2.424560546875, 8.284317016601562, 0.27570343017578125, 2.3666019439697266, 4.181064605712891, -2.546539306640625, 9.288589477539062, 2.2676830291748047, 7.772956848144531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000202.npy"}
{"epoch": 0.30536659108087677, "step": 203, "batch_size": 64, "mean": 1.7101508378982544, "std": 2.774703025817871, "min": -5.00970458984375, "p10": -1.0689348220825194, "median": 1.6705188751220703, "p90": 4.57501449584961, "max": 10.168018341064453, "pos_frac": 0.78125, "sample": [2.3214492797851562, 1.2137222290039062, -4.0179595947265625, -1.013885498046875, 0.29364013671875, 0.4886207580566406, -0.08419036865234375, 3.332763671875, -0.9942169189453125, 0.8143997192382812, 2.95361328125, 1.3075828552246094, 1.1796016693115234, 7.1956634521484375, 0.48116493225097656, 7.740234375, 4.4316253662109375, 0.18851852416992188, 4.285224914550781, 1.902618408203125, -5.00970458984375, -1.1501998901367188, 4.636466979980469, -0.65570068359375, -1.0925273895263672, -0.708770751953125, 0.9392127990722656, 7.6824188232421875, 1.894500732421875, 4.382118225097656, 3.5158920288085938, 2.5678558349609375, 4.167350769042969, 1.607858657836914, 0.161468505859375, 1.7496414184570312, 3.6348495483398438, 4.1053466796875, 5.6609649658203125, 0.7792263031005859, 3.1106643676757812, -2.6541900634765625, 1.8909378051757812, 1.9459915161132812, 2.5902328491210938, 0.30753326416015625, 3.7627716064453125, 4.274505615234375, 0.0549163818359375, 10.168018341064453, 1.7331790924072266, 0.5693893432617188, 0.39641571044921875, 2.2509384155273438, 1.8025360107421875, -3.569446563720703, -0.8867416381835938, 0.8628768920898438, 5.6514739990234375, 2.713724136352539, 1.9440383911132812, -1.7646026611328125, -0.9196739196777344, 0.3257026672363281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000203.npy"}
{"epoch": 0.30687830687830686, "step": 204, "batch_size": 64, "mean": 1.995117425918579, "std": 3.381747007369995, "min": -5.986236572265625, "p10": -2.070740699768066, "median": 1.9349822998046875, "p90": 6.187918090820313, "max": 12.20465087890625, "pos_frac": 0.6875, "sample": [12.20465087890625, 3.9265975952148438, -1.7455196380615234, 2.1533641815185547, 1.1942214965820312, 10.616363525390625, 1.9312210083007812, 1.9829120635986328, -2.379486083984375, -0.8602294921875, 5.1911468505859375, -3.1802444458007812, -1.3250198364257812, 4.630950927734375, 0.34464263916015625, 4.833137512207031, 3.9019699096679688, -0.212310791015625, -3.5366077423095703, -2.2101211547851562, 1.9270477294921875, 0.6738357543945312, 1.0436859130859375, 6.215553283691406, -0.2253398895263672, -0.64398193359375, 2.34771728515625, 3.5628890991210938, 0.715301513671875, 2.9031219482421875, 6.123435974121094, -0.808746337890625, 2.9858970642089844, 5.506683349609375, 3.348949432373047, -0.32285118103027344, 1.7629165649414062, 9.02714729309082, -2.3892478942871094, 7.096458435058594, -0.003185272216796875, 1.2602272033691406, 3.8227615356445312, 3.9170455932617188, -2.96343994140625, 2.4999847412109375, 1.9387435913085938, -0.4786224365234375, 2.799560546875, 0.09520339965820312, 6.8217315673828125, 6.823734283447266, 0.6194171905517578, 2.7703704833984375, 2.4685287475585938, 6.0324249267578125, 3.0587234497070312, 0.4663658142089844, -0.3460884094238281, 2.2069244384765625, -5.986236572265625, -1.4528579711914062, 3.7450428009033203, -0.7409515380859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000204.npy"}
{"epoch": 0.30839002267573695, "step": 205, "batch_size": 64, "mean": 1.7957442998886108, "std": 3.4763219356536865, "min": -7.947509765625, "p10": -2.207217216491699, "median": 1.2386856079101562, "p90": 6.444609832763672, "max": 9.80572509765625, "pos_frac": 0.71875, "sample": [9.80572509765625, 0.3042640686035156, -0.4953193664550781, 2.1327552795410156, 1.3362503051757812, 8.97610092163086, 1.7983245849609375, 1.977264404296875, -7.947509765625, 2.2190704345703125, 1.026123046875, 0.7512798309326172, 9.051300048828125, 2.577608108520508, 3.5265846252441406, -0.6888275146484375, -0.87835693359375, -0.6653900146484375, -0.392486572265625, 0.7230453491210938, 1.0519180297851562, 4.894004821777344, 0.2156505584716797, 4.37835693359375, 5.9617767333984375, -0.0270843505859375, -2.852752685546875, -1.1448822021484375, -0.5315971374511719, 6.343849182128906, 0.8215408325195312, -2.132375717163086, 3.768310546875, 1.6761245727539062, -1.6909561157226562, 3.854278564453125, 2.4698009490966797, 1.1411209106445312, 0.18596649169921875, 5.4504852294921875, 0.757171630859375, 6.48779296875, 3.9635543823242188, 1.87091064453125, 9.045166015625, 2.9876327514648438, 0.668426513671875, -0.2751312255859375, 6.9481658935546875, -4.156089782714844, 0.41829681396484375, 5.4280548095703125, 4.446651458740234, 2.0922775268554688, -2.465301513671875, 4.560295104980469, 3.4155406951904297, -2.439393997192383, 6.731441497802734, -2.2392921447753906, 0.72735595703125, -6.013677597045898, 0.8534698486328125, 2.142972946166992], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000205.npy"}
{"epoch": 0.30990173847316704, "step": 206, "batch_size": 64, "mean": 1.5694081783294678, "std": 3.145664691925049, "min": -5.938749313354492, "p10": -1.9438224792480465, "median": 1.671147346496582, "p90": 5.819749450683595, "max": 8.19384765625, "pos_frac": 0.671875, "sample": [-1.0554027557373047, -2.0923004150390625, 0.7770004272460938, -3.2202835083007812, -3.0257129669189453, -0.1466064453125, 1.7155380249023438, 1.7032127380371094, 3.3729324340820312, 3.282398223876953, 1.6390819549560547, 8.19384765625, 4.9720458984375, -0.7403621673583984, -0.9698257446289062, 5.960479736328125, -1.1394195556640625, 2.0460433959960938, -0.469268798828125, 1.7423820495605469, 6.1787567138671875, 1.0166358947753906, 0.7084121704101562, 0.8606109619140625, -1.4468975067138672, 5.395965576171875, 2.7173824310302734, -5.938749313354492, 2.7733421325683594, 4.803016662597656, 5.4913787841796875, 5.388017654418945, 0.9396247863769531, 3.272083282470703, 4.394355773925781, 0.3770904541015625, -5.922966003417969, -2.8716583251953125, -1.2598190307617188, 3.2679061889648438, 4.407218933105469, 1.1870689392089844, 6.250335693359375, -1.5973739624023438, -0.9253158569335938, -1.460601806640625, 7.206331253051758, 3.296234130859375, 0.6452178955078125, 2.052001953125, 6.9028778076171875, 4.38079833984375, 3.571044921875, -0.5836029052734375, 2.8789901733398438, 3.1728897094726562, -0.24637985229492188, 0.3527793884277344, 1.724884033203125, 6.2364501953125, -4.459493637084961, -1.3858642578125, 0.958770751953125, 3.1865921020507812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000206.npy"}
|
||||
{"epoch": 0.31141345427059713, "step": 207, "batch_size": 64, "mean": 2.1917171478271484, "std": 3.2744922637939453, "min": -4.094085693359375, "p10": -1.744609260559082, "median": 2.1471996307373047, "p90": 6.15732421875, "max": 13.235809326171875, "pos_frac": 0.71875, "sample": [13.235809326171875, -0.5885353088378906, -1.2362079620361328, 1.0520591735839844, 2.0151519775390625, 1.9904251098632812, 1.621856689453125, 1.0593719482421875, -1.5005035400390625, 3.77703857421875, 2.622987747192383, 1.3739242553710938, -0.7326831817626953, 6.313758850097656, -1.0186080932617188, -0.4696502685546875, 2.1597137451171875, 5.20330810546875, 0.6170005798339844, 5.407680511474609, 5.820404052734375, 3.044403076171875, 2.615203857421875, 2.57989501953125, -1.2187919616699219, -3.2747955322265625, -2.047677993774414, 5.24896240234375, -0.6756134033203125, 1.3596878051757812, 3.6574783325195312, 0.378662109375, 7.451934814453125, 2.6434097290039062, 2.8213653564453125, -1.8121757507324219, -1.586954116821289, 3.9666748046875, 5.26617431640625, 0.9128036499023438, 6.6848602294921875, 6.1436309814453125, 1.1769828796386719, 2.1754913330078125, -0.25922393798828125, 3.2334060668945312, 2.3901939392089844, -2.1961936950683594, 3.846965789794922, 6.1631927490234375, -4.094085693359375, 5.130395889282227, -2.845062255859375, 3.3004894256591797, 2.27679443359375, 8.741958618164062, 5.63531494140625, 1.13311767578125, 1.7248077392578125, 2.134685516357422, -3.2173690795898438, -0.34979248046875, 2.673675537109375, 8.610702514648438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000207.npy"}
|
||||
{"epoch": 0.3129251700680272, "step": 208, "batch_size": 64, "mean": 1.493491291999817, "std": 3.0418295860290527, "min": -4.0017547607421875, "p10": -2.342447280883789, "median": 1.2216205596923828, "p90": 5.757626914978029, "max": 8.61138916015625, "pos_frac": 0.6875, "sample": [0.5996265411376953, 0.47968292236328125, 5.875690460205078, -2.3608551025390625, 8.61138916015625, 6.029045104980469, 0.5434913635253906, -1.6403350830078125, 0.982147216796875, -4.0017547607421875, -0.532470703125, -2.11151123046875, 4.116352081298828, 2.289794921875, 5.072811126708984, 2.175689697265625, 2.2873382568359375, 3.651458740234375, -0.8970546722412109, -2.1016921997070312, 4.2011260986328125, -2.6984329223632812, 0.8014717102050781, 0.9128341674804688, 0.7984848022460938, 5.323183059692383, 3.750934600830078, -0.787200927734375, 3.414508819580078, 5.482145309448242, -1.2582950592041016, 1.1544113159179688, -0.3031158447265625, 2.1853179931640625, -3.63336181640625, 2.8067398071289062, 6.134784698486328, 1.326263427734375, 0.0015869140625, 1.2157630920410156, 7.3781585693359375, 1.3851318359375, 1.1069717407226562, -0.9029769897460938, 4.875202178955078, 1.2978363037109375, 4.6773834228515625, -0.8921585083007812, 2.5092239379882812, 1.4744377136230469, -3.5828399658203125, 2.8159713745117188, -0.03289794921875, 4.459022521972656, -3.192554473876953, 6.598258972167969, 1.22747802734375, -2.1355438232421875, -2.2994956970214844, 0.5774993896484375, -3.7463150024414062, 3.014141082763672, 1.732818603515625, 7.3406982421875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000208.npy"}
|
||||
{"epoch": 0.3144368858654573, "step": 209, "batch_size": 64, "mean": 2.363738536834717, "std": 3.5442471504211426, "min": -7.347785949707031, "p10": -1.694983673095703, "median": 2.1056671142578125, "p90": 6.938783264160157, "max": 10.680633544921875, "pos_frac": 0.75, "sample": [4.594291687011719, 6.407201766967773, 2.3440780639648438, 0.28766632080078125, -1.6970596313476562, 7.381591796875, 3.0256690979003906, 3.9677658081054688, 0.5792808532714844, 10.680633544921875, -1.5685501098632812, 2.2872161865234375, -0.429779052734375, 4.067657470703125, 0.745635986328125, 0.0876312255859375, -1.6901397705078125, 0.196929931640625, -1.5686531066894531, 8.845535278320312, 0.27207183837890625, -1.8920364379882812, 0.08733367919921875, 3.1881103515625, 1.3321533203125, 3.7502288818359375, 6.9815521240234375, 5.1959991455078125, 6.331085205078125, 1.4375762939453125, 3.392261505126953, 6.8389892578125, 1.9241180419921875, 5.784614562988281, 3.3559722900390625, -0.23214340209960938, 5.814929962158203, -0.9296112060546875, -7.347785949707031, 3.4995269775390625, -0.5998458862304688, 1.6499919891357422, -2.4430999755859375, -0.6132278442382812, 5.71795654296875, 3.2060985565185547, 9.933937072753906, 5.417934417724609, 2.5911407470703125, 4.8724822998046875, 1.573678970336914, -2.5330848693847656, 0.13293838500976562, -0.081634521484375, -1.724151611328125, 1.6975231170654297, 8.374622344970703, -5.2989044189453125, 4.515411376953125, 3.0753040313720703, 4.721088409423828, 7.369054794311523, 0.9500541687011719, 1.4444503784179688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000209.npy"}
|
||||
{"epoch": 0.31594860166288735, "step": 210, "batch_size": 64, "mean": 2.548922061920166, "std": 3.558483600616455, "min": -4.309635162353516, "p10": -2.413394737243652, "median": 2.139659881591797, "p90": 7.438258361816406, "max": 10.370109558105469, "pos_frac": 0.75, "sample": [8.263778686523438, 0.5781097412109375, 1.98785400390625, 4.0176239013671875, -0.7123203277587891, 4.2187347412109375, 10.370109558105469, 3.9013519287109375, 10.025856018066406, 7.172821044921875, 2.3543739318847656, 6.5156097412109375, 5.447967529296875, -2.7311859130859375, 8.471900939941406, 0.3198051452636719, -3.2047863006591797, -2.517637252807617, 6.255638122558594, -3.0053176879882812, 7.3470306396484375, 2.1210556030273438, 1.3523197174072266, -0.192108154296875, 5.83697509765625, 0.39803314208984375, -0.33402061462402344, 1.6568412780761719, 7.47735595703125, 0.0299224853515625, 9.174385070800781, -0.06275558471679688, 4.010894775390625, 1.965057373046875, 2.6492347717285156, -0.04048919677734375, 0.7081794738769531, -0.00135040283203125, -0.22244644165039062, 7.241289138793945, -2.1701622009277344, 5.0177001953125, 2.256011962890625, 2.4227066040039062, 4.5613555908203125, 0.36218833923339844, 5.05351448059082, -0.16494369506835938, 4.258697509765625, 1.2336044311523438, 1.1005516052246094, 3.4550037384033203, 4.061847686767578, -4.298700332641602, 1.3057937622070312, 7.63421630859375, 4.985687255859375, 2.15826416015625, 4.387462615966797, -4.309635162353516, 0.6633129119873047, -3.6216049194335938, 1.106414794921875, 2.826028823852539], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000210.npy"}
|
||||
{"epoch": 0.31746031746031744, "step": 211, "batch_size": 64, "mean": 2.1951539516448975, "std": 3.033905506134033, "min": -4.715545654296875, "p10": -1.3344444274902338, "median": 1.8983449935913086, "p90": 5.792264556884766, "max": 13.344696044921875, "pos_frac": 0.8125, "sample": [0.7254886627197266, -4.715545654296875, 3.9763336181640625, 1.060037612915039, -0.02780914306640625, 1.396749496459961, -1.51397705078125, 2.03192138671875, 1.456390380859375, 1.2715930938720703, 4.6317138671875, 0.3466300964355469, 1.09197998046875, 5.814033508300781, 1.6037254333496094, 4.8726348876953125, 0.6334247589111328, -0.2736778259277344, 4.4766387939453125, 0.6257667541503906, 2.0554122924804688, -1.8585205078125, 2.3598098754882812, -2.2838287353515625, 2.4047393798828125, -2.2036914825439453, 0.9452972412109375, 2.7135162353515625, 8.41500473022461, 1.952260971069336, 3.1425819396972656, 1.4278411865234375, 2.0608062744140625, 0.3045196533203125, 5.33282470703125, 2.4066734313964844, 3.7004737854003906, 0.8208160400390625, -0.8780250549316406, 3.8724594116210938, 2.5230560302734375, 13.344696044921875, 8.702667236328125, 0.2510509490966797, -0.9139938354492188, 3.7649269104003906, 5.689083099365234, 2.531158447265625, 2.1247100830078125, 5.658538818359375, -2.7555313110351562, 1.8444290161132812, 6.7584991455078125, 5.847204208374023, 5.7414703369140625, 5.821846008300781, 1.3578376770019531, 4.239435195922852, 0.08692741394042969, 4.0170135498046875, -0.9155349731445312, -2.0066070556640625, 0.2970008850097656, 0.30493927001953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000211.npy"}
|
||||
{"epoch": 0.31897203325774753, "step": 212, "batch_size": 64, "mean": 2.333421230316162, "std": 3.571045398712158, "min": -7.7138824462890625, "p10": -2.206197357177734, "median": 2.2100677490234375, "p90": 6.578324890136719, "max": 12.315887451171875, "pos_frac": 0.75, "sample": [6.818088531494141, 4.466396331787109, 5.69293212890625, 6.389863967895508, 0.15988540649414062, 1.988800048828125, 9.096332550048828, 1.9532413482666016, -2.7637481689453125, 3.9965057373046875, -1.0672111511230469, -7.7138824462890625, 6.659093856811523, 12.315887451171875, 2.3452835083007812, 5.059173583984375, 5.224689483642578, 2.1632213592529297, -3.2762374877929688, 3.716318130493164, 2.0177688598632812, 0.5911598205566406, -0.10641860961914062, 2.083433151245117, -3.4955596923828125, 0.2264404296875, 4.529998779296875, -3.3382511138916016, 7.154874801635742, 7.405914306640625, 2.2665939331054688, 3.8811073303222656, 4.323265075683594, 3.1966323852539062, 5.5940399169921875, 0.7934188842773438, -1.0521507263183594, 1.9985198974609375, 0.2910747528076172, 2.2569141387939453, -0.22588348388671875, -0.4530487060546875, 2.7166099548339844, -2.0712356567382812, 2.7307491302490234, 3.053009033203125, 3.0430908203125, 10.415596008300781, -2.9984169006347656, -1.3177337646484375, -0.32231712341308594, 1.5510711669921875, 3.9905624389648438, -1.1689872741699219, 2.5518646240234375, 5.773778915405273, 4.531791687011719, 6.047782897949219, 5.617069244384766, 0.9957427978515625, 0.912261962890625, 1.9569549560546875, -2.2640380859375, 0.42926788330078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000212.npy"}
|
||||
{"epoch": 0.3204837490551776, "step": 213, "batch_size": 64, "mean": 2.6240556240081787, "std": 3.8336892127990723, "min": -7.680259704589844, "p10": -2.5943632125854488, "median": 2.1614913940429688, "p90": 8.18192939758301, "max": 10.934722900390625, "pos_frac": 0.765625, "sample": [4.763893127441406, -1.24755859375, 1.7180938720703125, 5.554683685302734, 1.5387954711914062, 1.9546928405761719, -2.9476699829101562, -0.5537681579589844, -3.095947265625, 6.188621520996094, 5.58758544921875, 1.3103828430175781, 0.16736602783203125, -2.7661991119384766, -7.680259704589844, -3.6697845458984375, 4.350584030151367, 10.084381103515625, -0.8574409484863281, 2.4378433227539062, 6.801990509033203, -3.9081268310546875, 8.402641296386719, 9.003028869628906, 7.053802490234375, 2.1949539184570312, -0.6913890838623047, -2.1934127807617188, 2.0798873901367188, -0.13892173767089844, -3.1441478729248047, 0.598175048828125, 3.845916748046875, 3.2253971099853516, 2.4932861328125, 2.994903564453125, 1.4658699035644531, 4.870033264160156, 6.105098724365234, 2.1279449462890625, 1.4166145324707031, 4.253166198730469, 9.676395416259766, 0.603668212890625, 5.613502502441406, 3.1683349609375, 1.9474716186523438, 5.139102935791016, 9.451095581054688, -0.16936492919921875, -1.4444580078125, 3.1713790893554688, 0.629180908203125, 2.4425811767578125, 1.8465118408203125, 2.1280288696289062, 0.80718994140625, 10.934722900390625, 2.6878738403320312, 0.8912200927734375, 3.7745361328125, 9.135612487792969, 6.143043518066406, 7.666934967041016], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000213.npy"}
|
||||
{"epoch": 0.3219954648526077, "step": 214, "batch_size": 64, "mean": 2.552309036254883, "std": 3.59794545173645, "min": -3.993438720703125, "p10": -1.4879940032958985, "median": 2.2590951919555664, "p90": 8.023617553710938, "max": 14.121963500976562, "pos_frac": 0.75, "sample": [3.9820003509521484, 0.8028182983398438, -2.0024871826171875, 2.8505592346191406, 1.4154052734375, 4.385734558105469, 1.0554008483886719, 0.5989494323730469, 3.27703857421875, 4.869178771972656, 7.901622772216797, 3.4338722229003906, 2.1591567993164062, 2.6276702880859375, 2.5230941772460938, 8.399459838867188, 2.5976295471191406, -1.4518356323242188, 4.915504455566406, 2.3590335845947266, -0.6370315551757812, 4.5709381103515625, 0.8299026489257812, 0.6523590087890625, -0.7244758605957031, 5.37141227722168, -2.0326080322265625, 2.367208480834961, 8.07590103149414, 4.612220764160156, 2.9565048217773438, -0.3491477966308594, -3.452432632446289, 1.1474533081054688, 4.693962097167969, 6.031978607177734, -3.993438720703125, 0.03530120849609375, 8.327754974365234, 14.121963500976562, 0.5407791137695312, -1.7964763641357422, 6.631786346435547, 2.7095184326171875, 9.104717254638672, 10.454254150390625, 1.0928764343261719, 0.41477203369140625, 1.6090927124023438, -1.5034904479980469, -0.79681396484375, 8.495681762695312, 2.049213409423828, -0.9234676361083984, 2.034738540649414, -0.5497531890869141, 6.830902099609375, 2.0501556396484375, -3.8072738647460938, 2.5514678955078125, 3.82806396484375, 4.6511383056640625, -0.9092063903808594, -0.7204303741455078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000214.npy"}
|
||||
{"epoch": 0.3235071806500378, "step": 215, "batch_size": 64, "mean": 2.9051353931427, "std": 3.5608503818511963, "min": -3.97100830078125, "p10": -1.4038661956787108, "median": 2.70285701751709, "p90": 7.695291137695312, "max": 11.229595184326172, "pos_frac": 0.765625, "sample": [1.1440620422363281, 0.034854888916015625, -3.8625564575195312, 2.612579345703125, 5.490394592285156, -0.841796875, 2.77703857421875, -2.236083984375, 8.006658554077148, 7.574028015136719, 1.3490676879882812, -1.4757156372070312, -2.2962799072265625, 1.3455085754394531, 3.7860107421875, 8.981521606445312, 0.8900375366210938, 4.593875885009766, 0.15996551513671875, 1.434295654296875, 4.0953216552734375, 4.641044616699219, 1.4322090148925781, 3.117950439453125, 6.281940460205078, 0.6544914245605469, -1.0915756225585938, 3.14453125, 2.76513671875, -2.3732948303222656, -1.2362174987792969, 7.699470520019531, 0.9620933532714844, 1.4397430419921875, -1.1237869262695312, -3.97100830078125, 3.7436485290527344, 3.0416488647460938, 5.1567535400390625, 8.18634033203125, 2.6405773162841797, 2.1756973266601562, -0.899871826171875, -0.7822341918945312, 6.113800048828125, 8.82476806640625, 7.385093688964844, 11.229595184326172, 6.979691505432129, 7.685455322265625, 5.334659576416016, 7.685539245605469, 4.969581604003906, -1.6463775634765625, 1.807424545288086, 0.9080123901367188, 3.86322021484375, 8.883834838867188, 5.288215637207031, -0.010540008544921875, 7.002838134765625, 1.4669818878173828, 3.5117950439453125, -0.5229949951171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000215.npy"}
|
||||
{"epoch": 0.3250188964474679, "step": 216, "batch_size": 64, "mean": 3.094120979309082, "std": 5.127089500427246, "min": -4.1276092529296875, "p10": -1.4908164978027343, "median": 2.1812610626220703, "p90": 7.579996871948243, "max": 30.877349853515625, "pos_frac": 0.765625, "sample": [1.6351318359375, 6.3650054931640625, 6.572235107421875, 8.999801635742188, 4.498815536499023, 1.2382354736328125, 2.7273902893066406, 0.839996337890625, 4.815773010253906, 7.332359313964844, 3.5747756958007812, 3.3116683959960938, 0.4978485107421875, 0.13599395751953125, 2.9056777954101562, -2.3018875122070312, 1.2269725799560547, 3.483978271484375, -1.234954833984375, 5.420707702636719, 4.653778076171875, 6.9479827880859375, 0.5818099975585938, 5.044578552246094, 3.9606704711914062, 1.5592288970947266, 7.646305084228516, 1.472055435180664, 1.022684097290039, 9.506881713867188, 3.9828338623046875, -3.672271728515625, 1.0597476959228516, 3.364826202392578, -1.0982818603515625, 4.276336669921875, -4.1276092529296875, 0.41904640197753906, 5.746891021728516, -1.5950698852539062, -0.8654861450195312, 30.877349853515625, 12.722923278808594, -1.5058975219726562, 0.1336212158203125, -3.4589385986328125, 4.144290924072266, 0.5901889801025391, -1.3929977416992188, 7.4252777099609375, 10.778331756591797, -3.888195037841797, -1.45562744140625, 0.18124008178710938, 7.316383361816406, -1.284555435180664, 1.5921039581298828, 3.0946426391601562, -0.9585647583007812, 8.510406494140625, -0.9815654754638672, 6.475608825683594, 1.2469406127929688, 5.9282989501953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000216.npy"}
|
||||
{"epoch": 0.32653061224489793, "step": 217, "batch_size": 64, "mean": 3.4880096912384033, "std": 3.9669716358184814, "min": -4.6231231689453125, "p10": -0.7229171752929686, "median": 2.750293731689453, "p90": 9.4463586807251, "max": 16.07085418701172, "pos_frac": 0.828125, "sample": [4.6061553955078125, 9.779090881347656, -0.783355712890625, 1.8910293579101562, 0.40668296813964844, -0.8739814758300781, 0.7929821014404297, 16.07085418701172, -0.12677001953125, 0.810302734375, 9.699996948242188, 10.05133056640625, 2.813720703125, 1.8357658386230469, 3.0361099243164062, 2.7467498779296875, 3.3581924438476562, 1.4499855041503906, 1.5632171630859375, 4.3190460205078125, 0.9154415130615234, 4.486391067504883, 2.1365814208984375, 5.769598007202148, 4.585456848144531, 8.090703964233398, -2.3266029357910156, 1.3315505981445312, 0.9551162719726562, 5.102931976318359, 8.854536056518555, 3.658355712890625, 2.970245361328125, -2.20721435546875, 3.6002120971679688, -1.8687095642089844, 1.530914306640625, 6.939535140991211, -0.35572052001953125, -0.493194580078125, -4.6231231689453125, 8.057825088500977, 3.5880355834960938, -1.512847900390625, 0.8827133178710938, 1.8388595581054688, 2.7538375854492188, 2.180694580078125, 2.2917098999023438, 13.887283325195312, 1.6318988800048828, 6.843540191650391, 2.3730525970458984, 0.7184371948242188, 10.720748901367188, 5.339263916015625, 11.113716125488281, 6.4553680419921875, 2.2695388793945312, 7.21363639831543, 3.3377456665039062, 5.6215972900390625, 3.7077369689941406, -0.5818939208984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000217.npy"}
|
||||
{"epoch": 0.328042328042328, "step": 218, "batch_size": 64, "mean": 1.7742631435394287, "std": 4.150019645690918, "min": -10.344764709472656, "p10": -2.9621599197387694, "median": 1.7916374206542969, "p90": 5.750675964355469, "max": 13.652938842773438, "pos_frac": 0.734375, "sample": [1.7302474975585938, 0.7161960601806641, -0.9728622436523438, 2.4090805053710938, 1.85302734375, 0.4614105224609375, 0.6307449340820312, 0.2529754638671875, 0.4996795654296875, 5.376518249511719, 2.1343765258789062, 1.424224853515625, 5.139789581298828, -6.0933685302734375, -2.9644699096679688, -2.9567699432373047, 1.0441513061523438, 6.023017883300781, -6.9075927734375, -0.4915618896484375, 3.5650634765625, 2.97369384765625, 2.7550048828125, 1.4624767303466797, 4.4682159423828125, 13.652938842773438, 4.363300323486328, 8.634429931640625, 5.576940536499023, 5.370147705078125, -2.5100746154785156, 5.75579833984375, 2.4147796630859375, -4.605152130126953, -0.8465080261230469, -0.459808349609375, 2.3950881958007812, 2.651063919067383, 0.3973846435546875, 5.7387237548828125, 8.774616241455078, 4.7171630859375, 0.020952224731445312, -0.47376251220703125, 1.0645751953125, -2.072132110595703, 4.802448272705078, 0.5541954040527344, -4.13909912109375, -1.7855072021484375, 3.547880172729492, -6.538177490234375, 5.244911193847656, 0.12544631958007812, 4.860557556152344, 5.542144775390625, 3.155853271484375, 8.766868591308594, -10.344764709472656, -1.7977294921875, 2.403972625732422, 1.120086669921875, 4.3281097412109375, 8.611907958984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000218.npy"}
|
||||
{"epoch": 0.3295540438397581, "step": 219, "batch_size": 64, "mean": 1.7954490184783936, "std": 3.468604803085327, "min": -5.3824462890625, "p10": -2.9252685546875, "median": 1.4351749420166016, "p90": 6.2366985321044925, "max": 10.317527770996094, "pos_frac": 0.671875, "sample": [5.282928466796875, 6.151081085205078, 4.5714263916015625, 0.7103919982910156, 1.2919921875, -0.60205078125, -0.7648353576660156, 0.6540775299072266, 3.7658309936523438, -4.11639404296875, -0.11906814575195312, 9.634269714355469, 3.722625732421875, -1.3117237091064453, -2.817474365234375, 3.6949005126953125, 1.4228515625, 1.6819133758544922, 1.7662925720214844, -0.1607379913330078, -0.04826927185058594, -3.2862548828125, -0.9356613159179688, 1.1020965576171875, -0.6189727783203125, -1.788238525390625, 1.30426025390625, -0.6802577972412109, 3.3674774169921875, 3.7526779174804688, 3.8518104553222656, 7.6282501220703125, 7.7699127197265625, 7.162679672241211, 3.9111785888671875, 5.986198425292969, 0.40514373779296875, 3.670499801635742, -3.5101547241210938, 1.4474983215332031, 0.4496326446533203, 0.230743408203125, -3.7566680908203125, 1.675140380859375, 0.7290878295898438, -2.971466064453125, 4.195945739746094, -1.3730926513671875, 2.85858154296875, 10.317527770996094, 3.4197998046875, 4.184837341308594, 4.298969268798828, -0.2568511962890625, -4.510211944580078, 0.5656890869140625, 3.593414306640625, 3.8722381591796875, 4.7047882080078125, -2.3938980102539062, 6.2733917236328125, -5.3824462890625, 7.190761566162109, 2.0426483154296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000219.npy"}
|
||||
{"epoch": 0.3310657596371882, "step": 220, "batch_size": 64, "mean": 2.509706497192383, "std": 3.947611093521118, "min": -6.307807922363281, "p10": -2.6290647506713865, "median": 2.364408493041992, "p90": 8.029058647155766, "max": 11.397018432617188, "pos_frac": 0.734375, "sample": [-0.6865520477294922, -2.2137908935546875, 2.2532520294189453, 2.3480186462402344, 0.6481208801269531, 6.06231689453125, 10.369842529296875, 1.7899322509765625, 3.2819080352783203, 3.5916519165039062, 0.292633056640625, 11.397018432617188, 0.9028892517089844, -3.4383544921875, -2.1055736541748047, 1.312591552734375, 8.8602294921875, -0.826263427734375, -1.4244270324707031, 0.7483634948730469, 5.648448944091797, 3.7943458557128906, -1.1487178802490234, -3.5149097442626953, 0.9474563598632812, -2.807039260864258, -0.4001026153564453, -3.0240325927734375, 4.383819580078125, 5.3590240478515625, 3.8986587524414062, 1.6402053833007812, 3.3248748779296875, 5.374540328979492, 9.13182258605957, 1.1463699340820312, 5.903900146484375, 5.126773834228516, 8.557863235473633, -1.2863807678222656, 1.2781600952148438, 3.2557010650634766, 1.0945968627929688, 2.5795650482177734, 10.280815124511719, -0.40948486328125, 6.3737640380859375, 0.30121803283691406, 2.473978042602539, 4.692005157470703, 4.619474411010742, -5.338232040405273, 6.187381744384766, 0.4625835418701172, 3.241241455078125, 2.38079833984375, -6.307807922363281, 6.7951812744140625, 4.966228485107422, -0.3407878875732422, 10.762245178222656, -2.979978561401367, 5.581394195556641, 3.4504547119140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000220.npy"}
|
||||
{"epoch": 0.3325774754346183, "step": 221, "batch_size": 64, "mean": 2.737396717071533, "std": 4.246990203857422, "min": -6.54271125793457, "p10": -1.7252260208129881, "median": 2.393967628479004, "p90": 8.45634231567383, "max": 14.23611068725586, "pos_frac": 0.71875, "sample": [-0.8294849395751953, -1.757223129272461, 1.1204147338867188, 2.3949432373046875, -0.7840766906738281, 8.615119934082031, 1.0124626159667969, -0.02435302734375, 3.8978118896484375, -1.6505661010742188, -0.3405799865722656, 3.3811264038085938, 3.493206024169922, 3.3605194091796875, -1.6007308959960938, 1.1104106903076172, -0.95062255859375, 0.07837486267089844, 2.3929920196533203, 3.7975311279296875, 4.965147018432617, 5.692169189453125, 5.823982238769531, 6.14825439453125, 9.7984619140625, 3.3618698120117188, 5.369863510131836, 14.23611068725586, -1.8644943237304688, 0.2454357147216797, 5.7588348388671875, 8.085861206054688, 0.6526966094970703, 3.504047393798828, 13.475517272949219, 2.7562332153320312, 5.22564697265625, -3.5297393798828125, 3.2802200317382812, 4.8369598388671875, 4.072864532470703, -2.422454833984375, 1.8400535583496094, -0.4114704132080078, -0.19841766357421875, -1.5716629028320312, 1.5528488159179688, 10.870208740234375, 0.5606002807617188, -0.012058258056640625, -4.187835693359375, -6.54271125793457, 0.5980987548828125, 11.676376342773438, 3.600250244140625, -1.9043331146240234, 2.498809814453125, 1.8347320556640625, 13.482376098632812, 3.4508209228515625, 0.0384521484375, 2.008575439453125, 6.558937072753906, 3.259979248046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000221.npy"}
|
||||
{"epoch": 0.3340891912320484, "step": 222, "batch_size": 64, "mean": 3.366135358810425, "std": 3.9861903190612793, "min": -5.7934722900390625, "p10": -1.8990640640258785, "median": 3.1808433532714844, "p90": 8.12291202545166, "max": 12.77490234375, "pos_frac": 0.8125, "sample": [-1.484588623046875, -5.7934722900390625, 4.633754730224609, 4.11114501953125, 5.795970916748047, 8.154417037963867, 5.756793975830078, 0.10061264038085938, 6.9465179443359375, 11.397407531738281, 3.84393310546875, 1.9835433959960938, -3.5770187377929688, 0.8629074096679688, 4.583829879760742, 0.04076385498046875, 2.380035400390625, 2.760845184326172, -0.07747650146484375, 6.294597625732422, 1.9670867919921875, 0.09247398376464844, 6.09619140625, 5.306983947753906, -2.5044631958007812, 5.430408477783203, 1.7050247192382812, 12.77490234375, 0.18121910095214844, 4.755216598510742, 2.9461441040039062, 2.671600341796875, 5.895477294921875, 6.6584625244140625, 5.8126068115234375, 6.2256317138671875, 4.258876800537109, -1.2330780029296875, -2.0766963958740234, -0.603546142578125, -3.3237075805664062, 8.049400329589844, 1.0907058715820312, 5.091133117675781, 1.5060157775878906, 10.69635009765625, 1.0151290893554688, -2.8593063354492188, -0.8402023315429688, 6.310161590576172, 3.8365707397460938, 0.22737503051757812, 7.71533203125, 10.112676620483398, 2.2121429443359375, 1.9831275939941406, 6.190559387207031, 3.4155426025390625, 2.575836181640625, 9.295196533203125, 5.664337158203125, 2.785125732421875, 10.816192626953125, -3.2080841064453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000222.npy"}
|
||||
{"epoch": 0.3356009070294785, "step": 223, "batch_size": 64, "mean": 3.4662692546844482, "std": 3.781604290008545, "min": -2.5215797424316406, "p10": -1.280682373046875, "median": 3.960468292236328, "p90": 8.42883758544922, "max": 15.288253784179688, "pos_frac": 0.765625, "sample": [5.330738067626953, 4.521604537963867, 4.530342102050781, 3.9377212524414062, -1.8712310791015625, 5.396144866943359, -0.603424072265625, 1.9090728759765625, 2.7136497497558594, 4.619443893432617, 1.019287109375, 10.526504516601562, 4.668727874755859, 5.738624572753906, -2.5215797424316406, -1.7704048156738281, 0.046398162841796875, 5.477142333984375, 4.379852294921875, -1.0281600952148438, 7.478656768798828, 4.097404479980469, -1.2254981994628906, 3.098316192626953, -0.20394134521484375, 3.681060791015625, 0.4730510711669922, 6.3662872314453125, -1.504119873046875, -2.2550430297851562, 4.1978912353515625, 4.567516326904297, -0.7499847412109375, 2.9088401794433594, 3.16552734375, 10.782480239868164, 7.9647674560546875, 7.463741302490234, -0.2762908935546875, -1.3043327331542969, -0.5097579956054688, 0.9014301300048828, 4.1345977783203125, 2.522003173828125, 8.976776123046875, 1.5036430358886719, 8.640533447265625, 8.55589485168457, 3.98321533203125, 7.474250793457031, 15.288253784179688, -2.0260772705078125, 8.249458312988281, 8.505714416503906, 5.472023010253906, 4.002464294433594, 4.243934631347656, 5.2265472412109375, 1.8545112609863281, -1.1125049591064453, 0.18036651611328125, 7.946868896484375, 1.7183418273925781, 0.361968994140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000223.npy"}
|
||||
{"epoch": 0.3371126228269085, "step": 224, "batch_size": 64, "mean": 2.1558704376220703, "std": 3.4956471920013428, "min": -4.421781539916992, "p10": -1.8790872573852537, "median": 1.7228431701660156, "p90": 6.236273956298828, "max": 14.995410919189453, "pos_frac": 0.71875, "sample": [2.385162353515625, 10.491363525390625, 5.236616134643555, 2.8606109619140625, 0.578094482421875, 7.736217498779297, 4.914173126220703, 0.8999290466308594, -0.06151390075683594, 1.7163047790527344, 6.265712738037109, 2.0491371154785156, 6.167583465576172, 7.911418914794922, -1.749795913696289, 0.7514820098876953, 1.626321792602539, 3.5587806701660156, 6.078056335449219, 0.8914337158203125, 0.40810394287109375, 5.484016418457031, 2.5884552001953125, -4.421781539916992, -1.1523895263671875, -1.1756210327148438, -0.12236785888671875, -2.6645431518554688, 2.72381591796875, 0.7862091064453125, 3.790660858154297, -3.0605125427246094, -1.9344978332519531, -0.4256591796875, -3.2724990844726562, 0.23009490966796875, -0.3177013397216797, -2.5271682739257812, 2.364704132080078, 3.826812744140625, 0.01641845703125, 2.8968772888183594, 3.393352508544922, -0.08395004272460938, -1.1251068115234375, 0.9820327758789062, 4.805776596069336, 2.615581512451172, -1.0771865844726562, 0.3481903076171875, -0.9875450134277344, 2.8609275817871094, 4.971551895141602, 4.337375640869141, 14.995410919189453, 2.7309799194335938, 1.6961746215820312, 6.1500244140625, 0.3888111114501953, 8.063831329345703, -1.9982147216796875, 6.6031494140625, 2.2266387939453125, 1.7293815612792969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000224.npy"}
|
||||
{"epoch": 0.3386243386243386, "step": 225, "batch_size": 64, "mean": 3.392397403717041, "std": 4.3610663414001465, "min": -6.6950225830078125, "p10": -2.288543701171875, "median": 2.5553436279296875, "p90": 9.27353515625, "max": 14.985061645507812, "pos_frac": 0.765625, "sample": [0.4386005401611328, 2.50152587890625, -0.045291900634765625, 2.960418701171875, 1.76641845703125, 7.424835205078125, -0.17701339721679688, -0.111846923828125, 2.7881336212158203, 5.09185791015625, 2.5004348754882812, 8.039093017578125, 1.251495361328125, 7.063629150390625, -2.3680381774902344, 1.9058265686035156, 2.28533935546875, 2.3274612426757812, 4.252628326416016, 9.39508056640625, 11.335914611816406, 6.409345626831055, 11.592292785644531, 3.9875640869140625, 14.985061645507812, 2.131908416748047, 2.609161376953125, 3.7544898986816406, 10.2392578125, -2.4374771118164062, 1.5164222717285156, -6.6950225830078125, 3.0156936645507812, -2.981029510498047, 8.927536010742188, 6.378631591796875, 1.2557296752929688, 7.4857177734375, 2.831756591796875, 0.89825439453125, -0.11969757080078125, 2.1794281005859375, 10.453926086425781, -0.2066974639892578, 8.98992919921875, -2.336151123046875, -2.881011962890625, -0.7454833984375, 8.797782897949219, 1.2229461669921875, 5.282600402832031, 0.3331165313720703, 4.530143737792969, 8.039169311523438, 8.858707427978516, -0.7464046478271484, 1.06072998046875, 3.4046401977539062, 10.54482650756836, 2.0404300689697266, -4.095855712890625, 3.016876220703125, 5.1351318359375, -2.177459716796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000225.npy"}
|
||||
{"epoch": 0.3401360544217687, "step": 226, "batch_size": 64, "mean": 3.0040855407714844, "std": 4.363777160644531, "min": -6.805572509765625, "p10": -1.1295841217041016, "median": 2.3725128173828125, "p90": 8.853445434570313, "max": 12.002388000488281, "pos_frac": 0.765625, "sample": [3.2455902099609375, -0.2094879150390625, 2.1374359130859375, 6.303659439086914, 3.362091064453125, 1.9926223754882812, 8.168128967285156, 2.585613250732422, 1.1304340362548828, -2.5446853637695312, 5.271659851074219, 0.5860843658447266, 3.0268096923828125, 8.221429824829102, -1.0427703857421875, -0.25170135498046875, 3.1841869354248047, -1.0773239135742188, 10.351432800292969, 6.7151947021484375, 9.439334869384766, 0.2462921142578125, 5.3362579345703125, 0.6738128662109375, 3.6275253295898438, 0.6674976348876953, 0.9888839721679688, 8.97174072265625, 8.15411376953125, 6.897918701171875, -0.9900875091552734, 7.5749053955078125, 2.9803123474121094, 1.7372817993164062, 10.425315856933594, 2.035919189453125, 4.056739807128906, -6.3997344970703125, -1.1519813537597656, 0.42659950256347656, 1.1391143798828125, 5.1202239990234375, -0.9831886291503906, -6.550899505615234, 10.646133422851562, 11.676719665527344, -6.805572509765625, 4.7652587890625, -5.808055877685547, 3.9196624755859375, 1.2917518615722656, 2.159412384033203, 1.3253822326660156, 4.847833633422852, 1.9661712646484375, 12.002388000488281, 6.5955810546875, 6.6177825927734375, 0.8072052001953125, -0.3748321533203125, 8.577423095703125, -1.3168163299560547, -1.009622573852539, 4.79736328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000226.npy"}
{"epoch": 0.3416477702191988, "step": 227, "batch_size": 64, "mean": 2.857992172241211, "std": 4.672081470489502, "min": -6.860801696777344, "p10": -2.205860900878906, "median": 2.1080894470214844, "p90": 8.643426895141602, "max": 12.90639877319336, "pos_frac": 0.6875, "sample": [8.516387939453125, 2.4206008911132812, -0.475860595703125, 8.013725280761719, 1.1655120849609375, -2.2362518310546875, 5.922080993652344, 12.551841735839844, -6.860801696777344, -2.851869583129883, 10.149543762207031, -1.2663040161132812, -0.9659442901611328, 8.775943756103516, 7.6645660400390625, 0.6583023071289062, 0.4105682373046875, 0.064971923828125, 8.697872161865234, 0.142059326171875, 3.083587646484375, 5.8828125, 3.2191085815429688, 2.783021926879883, 12.90639877319336, 3.1445884704589844, -1.45733642578125, 7.379554748535156, -1.3101940155029297, -0.23587417602539062, 5.3235015869140625, 11.385759353637695, 6.5608673095703125, -0.7402496337890625, 1.504364013671875, 6.238628387451172, 11.837566375732422, -0.193817138671875, 4.185096740722656, 8.38995361328125, 2.1238784790039062, 2.0923004150390625, -1.8567962646484375, 1.560943603515625, 3.3150177001953125, -0.32978057861328125, 0.8740234375, 2.4009170532226562, -4.0884246826171875, -0.4132518768310547, -5.3405303955078125, 5.534191131591797, 6.08294677734375, 7.3820037841796875, 1.0280170440673828, -2.13494873046875, 1.983551025390625, 0.6511917114257812, 7.061029434204102, -5.43658447265625, -0.59307861328125, -5.00518798828125, 7.405862808227539, 8.229911804199219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000227.npy"}
{"epoch": 0.3431594860166289, "step": 228, "batch_size": 64, "mean": 2.9821906089782715, "std": 4.681351184844971, "min": -7.081630706787109, "p10": -2.955514144897461, "median": 2.7143497467041016, "p90": 9.803631591796876, "max": 15.9058837890625, "pos_frac": 0.765625, "sample": [0.09976005554199219, 9.713857650756836, 2.0043716430664062, 10.074670791625977, 4.302837371826172, 3.4615325927734375, 1.9352474212646484, 10.685792922973633, 1.34185791015625, -5.237861633300781, 3.597951889038086, -1.1628494262695312, 3.813751220703125, -7.081630706787109, 1.877685546875, 1.3601303100585938, 5.572654724121094, -1.6115036010742188, 3.5259971618652344, 3.659975051879883, 0.1684741973876953, 9.814910888671875, -2.277069091796875, 4.6747894287109375, -3.08856201171875, 4.9430999755859375, 5.839632034301758, 15.9058837890625, 1.3763923645019531, 5.476146697998047, -0.15781211853027344, -2.9667739868164062, 1.4948234558105469, 0.124481201171875, 5.793216705322266, -2.929241180419922, 3.056835174560547, -0.8707675933837891, -5.5848388671875, -2.8675575256347656, 3.4941558837890625, 4.833076477050781, 11.54437255859375, -3.6172428131103516, 2.73193359375, 1.0659675598144531, 9.668693542480469, 10.076629638671875, -4.371150970458984, 2.470123291015625, 1.0897445678710938, 2.696765899658203, 6.4552764892578125, 5.9174041748046875, -1.7599105834960938, 6.4638824462890625, 9.777313232421875, 11.633743286132812, 6.2288055419921875, 2.9021530151367188, 0.7193679809570312, 2.29095458984375, 6.443212509155273, 2.2446365356445312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000228.npy"}
{"epoch": 0.34467120181405897, "step": 229, "batch_size": 64, "mean": 2.7022314071655273, "std": 4.495541095733643, "min": -9.800434112548828, "p10": -3.0508491516113283, "median": 3.026836395263672, "p90": 8.91033229827881, "max": 10.674957275390625, "pos_frac": 0.765625, "sample": [3.2983779907226562, 7.332159042358398, 5.589973449707031, 3.1677627563476562, 3.2011642456054688, 5.455223083496094, 3.3201465606689453, 3.2706432342529297, 2.404644012451172, -2.338470458984375, 2.166778564453125, 2.419586181640625, 4.799774169921875, -3.2211532592773438, 2.7855224609375, 2.074596405029297, -7.3912353515625, -2.2658729553222656, 8.729621887207031, 6.917392730712891, 2.8701095581054688, 1.113943099975586, 0.3988323211669922, 4.948219299316406, -5.128021240234375, 5.67059326171875, -3.0613021850585938, -0.018341064453125, 1.1221256256103516, -3.026458740234375, 5.8372650146484375, 10.550079345703125, -9.800434112548828, 8.98777961730957, -2.2213973999023438, 1.2924327850341797, 5.9643096923828125, -3.832447052001953, 3.3885421752929688, 4.8683624267578125, 9.955329895019531, 6.528812408447266, 1.3571605682373047, -0.4038734436035156, 5.870441436767578, -1.2557334899902344, 9.198896408081055, 5.796480178833008, 6.418609619140625, 3.4539527893066406, 2.8290176391601562, -0.7670574188232422, 3.2021026611328125, 0.824615478515625, 9.586341857910156, 4.410125732421875, 0.2337188720703125, 1.5717754364013672, 9.493640899658203, 10.674957275390625, 7.084556579589844, 2.8859100341796875, 0.5934486389160156, -8.24124526977539], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000229.npy"}
{"epoch": 0.34618291761148906, "step": 230, "batch_size": 64, "mean": 3.0884273052215576, "std": 4.515745162963867, "min": -8.403190612792969, "p10": -1.4101341247558592, "median": 2.7371139526367188, "p90": 8.549152374267578, "max": 15.556304931640625, "pos_frac": 0.78125, "sample": [-2.1583290100097656, 3.6955718994140625, 4.749399185180664, 1.3093299865722656, 4.668933868408203, -5.654937744140625, -0.3132286071777344, 0.3555412292480469, 2.7181854248046875, 6.367073059082031, 5.90997314453125, 10.519790649414062, 5.726352691650391, 6.682792663574219, 2.315338134765625, 4.808544158935547, 7.582282066345215, 7.513919830322266, -5.222572326660156, 2.3227767944335938, -8.403190612792969, 2.0770416259765625, -0.209991455078125, 5.374988555908203, 4.229881286621094, -1.4704742431640625, 6.7949676513671875, 0.4991912841796875, 8.574180603027344, 7.2403717041015625, 5.892730712890625, 1.9285316467285156, -0.9816913604736328, 2.371835708618164, -0.48650360107421875, -0.44800567626953125, 1.6538848876953125, 1.3130741119384766, 15.556304931640625, 3.0877685546875, 2.1739330291748047, 2.75604248046875, 0.06322097778320312, 4.138496398925781, 12.036605834960938, 8.490753173828125, -5.793085098266602, 11.299636840820312, 0.6598892211914062, 0.6643409729003906, 11.2357177734375, -6.26727294921875, 4.707300186157227, 9.01569938659668, 3.3369197845458984, 3.8046226501464844, 1.8802757263183594, -0.5069561004638672, 1.8040351867675781, 1.9976425170898438, 4.550585746765137, 5.513458251953125, -1.2693405151367188, 2.875194549560547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000230.npy"}
{"epoch": 0.3476946334089191, "step": 231, "batch_size": 64, "mean": 4.558291912078857, "std": 4.61186408996582, "min": -7.6941070556640625, "p10": -1.397232627868652, "median": 3.974595069885254, "p90": 10.890254211425782, "max": 14.471332550048828, "pos_frac": 0.84375, "sample": [6.4571685791015625, 0.88897705078125, 7.418331146240234, 6.1946258544921875, 1.064605712890625, 5.993736267089844, 2.156444549560547, 10.959152221679688, 0.725433349609375, 1.3676433563232422, 4.003122329711914, 2.385955810546875, 1.2964859008789062, 1.7841835021972656, -2.2784576416015625, 4.34759521484375, 5.527557373046875, 9.566425323486328, 4.550323486328125, -0.9810638427734375, 8.445053100585938, -2.501476287841797, 2.8328590393066406, 6.930866241455078, 1.5057182312011719, 11.51971435546875, -3.6079864501953125, -2.5899219512939453, 11.484428405761719, 6.88946533203125, 10.7294921875, -2.081012725830078, 6.9863128662109375, 13.357086181640625, 7.893596649169922, 3.876007080078125, 2.0483322143554688, 2.8447799682617188, 5.264923095703125, -0.9663314819335938, 14.059539794921875, 2.1947097778320312, 3.5062026977539062, 2.9823989868164062, 8.786338806152344, 2.6232528686523438, 8.445655822753906, -1.0018444061279297, 2.3632545471191406, -1.5666847229003906, 7.468753814697266, 9.16314697265625, 3.9460678100585938, 6.7857666015625, 7.030292510986328, 12.467506408691406, 5.7498016357421875, 3.914823532104492, 3.6365814208984375, 7.3274688720703125, -7.6941070556640625, 1.682474136352539, 9.097780227661133, 14.471332550048828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000231.npy"}
{"epoch": 0.3492063492063492, "step": 232, "batch_size": 64, "mean": 3.2991318702697754, "std": 5.409703731536865, "min": -11.639495849609375, "p10": -3.013184356689453, "median": 2.0971126556396484, "p90": 11.180451202392579, "max": 17.765380859375, "pos_frac": 0.796875, "sample": [0.6289710998535156, 10.921577453613281, 6.119533538818359, 14.300468444824219, 2.841541290283203, 11.563644409179688, 2.5808944702148438, 9.080821990966797, 0.5069313049316406, -3.0680770874023438, -3.7761878967285156, 1.7884998321533203, 0.3328819274902344, 2.0154781341552734, 0.8185691833496094, 5.717433929443359, 1.3178272247314453, 0.9786128997802734, 3.3849029541015625, 1.2384757995605469, 0.35718536376953125, 7.4972686767578125, 5.8644561767578125, 5.740875244140625, 5.67156982421875, 4.242551803588867, -2.591257095336914, 12.655975341796875, -3.3284339904785156, 6.1976318359375, 1.1978778839111328, 8.776031494140625, -4.886953353881836, -0.32135772705078125, 3.74676513671875, 0.28060150146484375, 1.696420669555664, -4.893955230712891, 12.08355712890625, 1.84259033203125, 1.3830413818359375, 9.40130615234375, 0.07404327392578125, 16.6279296875, 11.291397094726562, 17.765380859375, -2.885101318359375, -1.2484207153320312, -1.7947769165039062, 2.14385986328125, 2.7505645751953125, 7.267753601074219, 8.291717529296875, 4.340023040771484, 2.050365447998047, -4.5897979736328125, -0.0061016082763671875, 3.4702529907226562, 3.4556198120117188, 1.6880035400390625, 1.2247238159179688, -11.639495849609375, 6.5828857421875, 2.3770580291748047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000232.npy"}
{"epoch": 0.3507180650037793, "step": 233, "batch_size": 64, "mean": 2.787163257598877, "std": 4.887899875640869, "min": -7.723045349121094, "p10": -3.056764602661133, "median": 1.9586181640625, "p90": 9.060537719726563, "max": 15.310615539550781, "pos_frac": 0.703125, "sample": [15.310615539550781, 2.7999801635742188, 10.986549377441406, -1.352630615234375, 3.4626617431640625, 7.041538238525391, 5.2441253662109375, 5.506130218505859, 0.4555511474609375, 0.49224090576171875, 3.8297576904296875, 12.942825317382812, 0.08741378784179688, -1.3604202270507812, -3.818634033203125, 6.1876373291015625, 6.9391021728515625, 1.3940677642822266, -1.0749435424804688, 0.2824687957763672, 7.5321197509765625, 2.1857833862304688, 5.742321014404297, 5.8887786865234375, -3.209991455078125, 4.7826080322265625, -3.3004074096679688, 5.236316680908203, -4.345268249511719, -4.3445587158203125, 7.463371276855469, 1.7308464050292969, 7.5242462158203125, 3.427562713623047, 9.093780517578125, -7.723045349121094, 6.427894592285156, 0.7214851379394531, -1.3074722290039062, 0.5332412719726562, 3.1967010498046875, -3.1346359252929688, 10.263099670410156, -0.48790740966796875, 3.2005081176757812, -2.0282821655273438, -1.7630462646484375, 1.66796875, 5.664772033691406, 5.490617752075195, -2.461883544921875, 0.3557758331298828, 1.2382011413574219, 12.813697814941406, 8.98297119140625, 13.37054443359375, 1.7314529418945312, -2.8750648498535156, 5.4120635986328125, -0.789581298828125, 0.2952308654785156, -2.457763671875, -2.018373489379883, 3.2957229614257812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000233.npy"}
{"epoch": 0.35222978080120937, "step": 234, "batch_size": 64, "mean": 2.423401355743408, "std": 4.816473007202148, "min": -9.259712219238281, "p10": -3.150988960266113, "median": 1.8011054992675781, "p90": 9.239066314697268, "max": 13.723136901855469, "pos_frac": 0.65625, "sample": [2.18487548828125, -3.3335113525390625, 1.069580078125, -2.339111328125, 2.3567466735839844, -1.0871963500976562, 1.2517967224121094, -1.3646774291992188, 5.9324798583984375, -3.2734527587890625, 4.8195037841796875, 3.424327850341797, 6.6767120361328125, 5.970237731933594, 10.817794799804688, 0.7502555847167969, -2.6213226318359375, -3.9147605895996094, 8.670257568359375, -1.4799518585205078, 2.6262741088867188, 2.4641647338867188, 10.523780822753906, 13.723136901855469, -0.4330883026123047, -4.1483001708984375, 7.94706916809082, -0.5807437896728516, -9.259712219238281, 0.4246253967285156, 7.255638122558594, 9.568950653076172, 10.295291900634766, 10.206146240234375, -0.1495208740234375, -6.756807327270508, 0.5899810791015625, -0.353759765625, 2.8553504943847656, -1.1007118225097656, 5.20294189453125, 7.9879608154296875, 1.8953132629394531, 6.7708587646484375, 9.482841491699219, 1.3875408172607422, 1.2400856018066406, 6.432136535644531, 6.848930358886719, -2.8652400970458984, -1.0152435302734375, 0.5276603698730469, -6.253822326660156, 6.989677429199219, -2.2350311279296875, 1.7068977355957031, 5.866912841796875, 2.6201553344726562, -2.3926239013671875, 2.32598876953125, 1.2135238647460938, 7.344125747680664, 4.94268798828125, -1.13494873046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000234.npy"}
{"epoch": 0.35374149659863946, "step": 235, "batch_size": 64, "mean": 4.328647613525391, "std": 4.084649085998535, "min": -4.833381652832031, "p10": -0.3690017700195309, "median": 3.8923473358154297, "p90": 9.194815063476563, "max": 13.486404418945312, "pos_frac": 0.875, "sample": [1.3465385437011719, 6.23101806640625, 3.468994140625, 8.547908782958984, 8.821395874023438, 1.0078048706054688, 1.9318466186523438, 13.141670227050781, 6.4525909423828125, 6.599609375, 1.2904052734375, 5.507434844970703, 1.2934036254882812, 4.515115737915039, 3.9097442626953125, 6.902214050292969, 6.596559524536133, 0.7445259094238281, 5.873472213745117, 3.874950408935547, 2.209869384765625, 3.2480125427246094, -0.52581787109375, 5.7585906982421875, 8.605918884277344, 6.3128662109375, -1.4101638793945312, -1.4466094970703125, 12.268558502197266, -1.8748645782470703, 7.427070617675781, 1.7742424011230469, 5.26805305480957, 7.568450927734375, 9.244251251220703, 1.3592967987060547, 7.151279449462891, 1.3758773803710938, 9.201278686523438, -4.833381652832031, 1.2728538513183594, 8.710870742797852, 0.8024749755859375, 4.187889099121094, 7.433998107910156, 2.3700599670410156, 0.41289520263671875, 7.803714752197266, 13.486404418945312, 2.8827857971191406, 2.6133499145507812, -0.0030975341796875, 12.097091674804688, 2.43267822265625, 3.569568634033203, -3.2390594482421875, 10.159156799316406, 5.160942077636719, 8.409347534179688, 0.902313232421875, 9.179733276367188, 0.3004112243652344, 0.5450668334960938, -1.1979942321777344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000235.npy"}
{"epoch": 0.35525321239606955, "step": 236, "batch_size": 64, "mean": 3.550055980682373, "std": 5.707407474517822, "min": -10.722030639648438, "p10": -2.088578414916992, "median": 2.923290252685547, "p90": 11.709595298767095, "max": 21.462890625, "pos_frac": 0.71875, "sample": [6.719463348388672, -1.580474853515625, -0.01880645751953125, 7.816738128662109, 4.892433166503906, 0.60614013671875, 4.907402038574219, 12.181869506835938, -0.4009819030761719, 0.2061138153076172, 6.994102478027344, 2.94757080078125, -1.0587329864501953, 4.0747833251953125, -1.4374561309814453, 1.5466690063476562, 3.8487625122070312, 8.60296630859375, 6.6066131591796875, 10.607622146606445, 1.0623760223388672, 0.279022216796875, 1.8984375, -0.4911308288574219, 5.159339904785156, 3.8492507934570312, 6.668670654296875, 9.898193359375, 12.498638153076172, 6.08642578125, 12.321710586547852, -1.0794677734375, -1.290008544921875, 3.7749061584472656, 13.753982543945312, -3.0409393310546875, -2.22393798828125, 21.462890625, 0.5245895385742188, 2.8990097045898438, -4.468027114868164, 2.4412002563476562, 6.847036361694336, 7.540992736816406, 1.4941558837890625, 3.208169937133789, 0.9589080810546875, -3.0906143188476562, 10.602073669433594, 1.9058551788330078, -1.4132881164550781, -10.63458251953125, -10.722030639648438, 12.251338958740234, 12.743797302246094, 8.761085510253906, 6.1186676025390625, -2.9600467681884766, -0.6044464111328125, -1.7727394104003906, 8.855106353759766, 0.7521820068359375, 3.605653762817383, 2.7083816528320312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000236.npy"}
{"epoch": 0.35676492819349964, "step": 237, "batch_size": 64, "mean": 3.510627508163452, "std": 5.539363861083984, "min": -8.922210693359375, "p10": -3.075682067871093, "median": 3.5043888092041016, "p90": 9.544725036621095, "max": 20.056304931640625, "pos_frac": 0.78125, "sample": [5.701759338378906, 8.625381469726562, 2.069732666015625, 12.458915710449219, -0.42511749267578125, -2.48675537109375, 4.217857360839844, 6.926124572753906, 2.4894447326660156, 7.2293243408203125, 3.9659576416015625, 4.3740997314453125, -6.5579071044921875, -0.36132049560546875, 6.025794982910156, 7.823646545410156, -0.1569080352783203, 9.327011108398438, -5.6394195556640625, 7.9573516845703125, -8.922210693359375, 5.45501708984375, 5.8535003662109375, 6.273159027099609, 13.304931640625, -1.4900150299072266, 3.75531005859375, 7.553462982177734, -1.876007080078125, 2.2999744415283203, 4.378267288208008, 2.5058059692382812, 4.096031188964844, -7.9640350341796875, 1.5448665618896484, 2.387523651123047, 1.9429130554199219, -7.996345520019531, 13.864837646484375, 0.7401103973388672, 3.5088424682617188, 3.2424392700195312, 1.8020744323730469, 0.8612823486328125, 2.4895858764648438, 0.6370372772216797, 7.472143173217773, 3.5565948486328125, 9.638031005859375, 11.946159362792969, 6.254951477050781, 1.857635498046875, 11.322731018066406, 20.056304931640625, -8.212554931640625, 3.4999351501464844, -0.2745399475097656, 8.74934196472168, 3.5646095275878906, 3.235382080078125, 0.6292457580566406, 2.2404098510742188, -3.3280792236328125, 8.65853500366211], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000237.npy"}
{"epoch": 0.35827664399092973, "step": 238, "batch_size": 64, "mean": 3.7690067291259766, "std": 5.459894180297852, "min": -6.949184417724609, "p10": -3.0679868698120116, "median": 2.733610153198242, "p90": 11.095456504821781, "max": 14.171920776367188, "pos_frac": 0.75, "sample": [0.037303924560546875, 1.82928466796875, 11.493059158325195, -4.533210754394531, 9.030567169189453, 13.716856002807617, 1.5224571228027344, 0.5244979858398438, 12.051406860351562, -0.20304107666015625, 1.8216323852539062, -1.60015869140625, 9.913284301757812, 2.9734954833984375, 4.687116622924805, 6.520946502685547, -0.8717136383056641, -3.1089115142822266, 5.750450134277344, 8.506961822509766, 7.135108947753906, 14.171920776367188, 7.220024108886719, 5.358957290649414, 2.527057647705078, 5.790971755981445, 6.9474639892578125, 2.4161453247070312, -5.896808624267578, 9.327690124511719, 8.64906120300293, 8.628068923950195, 0.7927532196044922, 4.5986480712890625, -1.2808914184570312, 1.8682575225830078, 9.47100830078125, -1.1560897827148438, 9.565967559814453, 12.314582824707031, 2.4773521423339844, -2.8845062255859375, 0.6429672241210938, -0.1199493408203125, 9.517478942871094, 2.9401626586914062, 14.050537109375, 8.357341766357422, 2.453824996948242, -0.162384033203125, 1.85205078125, 7.122474670410156, 10.167716979980469, -6.523769378662109, -4.953359603881836, 0.3321571350097656, -5.9368896484375, 4.7594757080078125, 11.946853637695312, 1.5504913330078125, 4.29876708984375, -2.9724960327148438, -6.949184417724609, 0.7371234893798828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000238.npy"}
{"epoch": 0.35978835978835977, "step": 239, "batch_size": 64, "mean": 3.9706602096557617, "std": 5.817076683044434, "min": -9.137189865112305, "p10": -4.383658599853515, "median": 3.67647647857666, "p90": 11.138089752197267, "max": 17.8321533203125, "pos_frac": 0.796875, "sample": [4.90155029296875, 5.235603332519531, 4.635162353515625, 1.7333869934082031, 11.505538940429688, 5.869834899902344, -4.585411071777344, 2.280242919921875, -0.8293914794921875, 3.1495819091796875, -3.91290283203125, 2.50518798828125, 0.2895774841308594, 10.834152221679688, 8.233856201171875, 0.367279052734375, -2.432403564453125, -7.1052703857421875, 6.4739990234375, 7.8302001953125, 1.2724895477294922, 5.479362487792969, 8.44569206237793, 11.268348693847656, 15.335887908935547, 2.1492366790771484, 5.850767135620117, 6.876499176025391, 3.6845474243164062, 3.6129302978515625, 8.984024047851562, 3.668405532836914, -5.926666259765625, -3.197713851928711, 3.257966995239258, 14.676101684570312, -0.06533432006835938, 2.2750511169433594, -5.80615234375, 6.204376220703125, -1.8979549407958984, 4.927574157714844, 3.8011322021484375, 3.1604461669921875, -5.8373260498046875, 6.1728057861328125, 15.088119506835938, 17.8321533203125, 16.909713745117188, -4.599056243896484, 0.8289947509765625, 2.6794967651367188, -9.137189865112305, 10.825431823730469, 2.4384765625, 2.4855804443359375, 10.147266387939453, 0.5225791931152344, 3.8365478515625, 8.189987182617188, 2.3909873962402344, 6.7761383056640625, 7.560661315917969, 3.9941062927246094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000239.npy"}
{"epoch": 0.36130007558578986, "step": 240, "batch_size": 64, "mean": 4.721988677978516, "std": 5.44388484954834, "min": -12.11871337890625, "p10": -2.1138790130615233, "median": 4.60467529296875, "p90": 11.768147659301759, "max": 15.273551940917969, "pos_frac": 0.8125, "sample": [5.721881866455078, -3.325958251953125, 8.19451904296875, 1.2249603271484375, 6.2800750732421875, 12.968704223632812, 4.176025390625, 10.76617431640625, -2.9574813842773438, -0.8895950317382812, 3.709259033203125, -0.5000286102294922, 5.821172714233398, 7.936460494995117, 11.806598663330078, 15.273551940917969, 8.592874526977539, -5.9106292724609375, 10.06134033203125, 8.389785766601562, -2.1666603088378906, 7.1024627685546875, 4.709175109863281, -4.4886627197265625, 6.693153381347656, 14.605697631835938, 3.796142578125, 7.8976898193359375, 14.258964538574219, 6.716468811035156, 9.992111206054688, 11.429306030273438, 2.76702880859375, 2.82110595703125, 3.638286590576172, 4.500175476074219, 6.776153564453125, 3.834747314453125, 9.344078063964844, 0.6731662750244141, 7.509880065917969, 2.3965606689453125, -2.504688262939453, 6.9371490478515625, 12.421035766601562, -12.11871337890625, 0.6821575164794922, 2.0817031860351562, 2.4939002990722656, 0.4508056640625, 4.051414489746094, 9.478069305419922, -0.5580253601074219, 5.949737548828125, -0.9430274963378906, 2.08245849609375, 11.678428649902344, 12.817558288574219, 1.343902587890625, 0.5673179626464844, 8.556732177734375, 1.0235366821289062, -1.99072265625, 5.559844970703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000240.npy"}
{"epoch": 0.36281179138321995, "step": 241, "batch_size": 64, "mean": 2.3397579193115234, "std": 4.839980125427246, "min": -6.1579437255859375, "p10": -3.3977588653564452, "median": 1.4868717193603516, "p90": 8.87075138092041, "max": 16.153762817382812, "pos_frac": 0.65625, "sample": [-3.739227294921875, -2.2709121704101562, 7.538824081420898, -3.8414478302001953, 4.073005676269531, -3.0897979736328125, 0.27367401123046875, 9.543235778808594, 5.905778884887695, 10.018905639648438, -2.1650390625, 8.444190979003906, -3.4190940856933594, 1.1596050262451172, 0.9135513305664062, 8.339248657226562, 0.691131591796875, 2.382486343383789, -2.0900115966796875, 2.365692138671875, 6.177558898925781, 4.1244964599609375, -2.9799957275390625, 3.2310409545898438, 3.4108200073242188, -1.1922683715820312, 8.915718078613281, 0.36865806579589844, 0.4132537841796875, -3.473297119140625, 8.765829086303711, 11.428909301757812, 16.153762817382812, 0.950592041015625, -2.287059783935547, -6.1579437255859375, -2.507720947265625, 5.832305908203125, -0.33608245849609375, 1.6630287170410156, -3.3479766845703125, 0.14068222045898438, -0.23667335510253906, 7.330513000488281, 1.7269420623779297, 4.130851745605469, -5.1207275390625, 13.765579223632812, 4.550895690917969, -4.3510894775390625, 3.752685546875, 3.0321884155273438, 5.298820495605469, 1.3107147216796875, 9.836700439453125, -1.0177116394042969, -3.2822113037109375, 6.429483413696289, 4.479972839355469, 3.993101119995117, 1.1040916442871094, 3.7659683227539062, -0.9547710418701172, -0.12892913818359375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000241.npy"}
{"epoch": 0.36432350718065004, "step": 242, "batch_size": 64, "mean": 4.812246799468994, "std": 6.4810028076171875, "min": -10.281547546386719, "p10": -1.504492187499999, "median": 4.577797889709473, "p90": 14.40465240478516, "max": 21.9298095703125, "pos_frac": 0.8125, "sample": [-9.354219436645508, 5.052406311035156, 4.593162536621094, 3.4559173583984375, 8.978439331054688, -5.0851287841796875, 11.088447570800781, 6.4869384765625, 6.300392150878906, 11.091819763183594, 3.7389984130859375, 5.500099182128906, 0.09923362731933594, 17.855995178222656, 14.94937515258789, -0.5377197265625, 13.13363265991211, -1.9188232421875, 3.182220458984375, 2.9285850524902344, 2.0795135498046875, 7.34735107421875, 16.104190826416016, 2.3399200439453125, 0.514404296875, 7.2472076416015625, 0.2425537109375, 0.8973770141601562, 9.434547424316406, 7.9613494873046875, 7.7666168212890625, 0.14959144592285156, 19.819046020507812, 4.21540641784668, 7.809093475341797, 5.91497802734375, 0.47315025329589844, -0.2120513916015625, 21.9298095703125, 2.1370201110839844, -5.1097412109375, 7.319252014160156, 4.142494201660156, 1.6005020141601562, 6.559181213378906, -0.015766143798828125, 6.61912727355957, 4.998497009277344, 8.205520629882812, 10.911432266235352, 16.027202606201172, 5.641948699951172, 16.41625213623047, -2.8258838653564453, -0.1821441650390625, 4.709678649902344, 5.250511169433594, 2.2197265625, 2.644287109375, -10.281547546386719, 1.3930130004882812, -0.017032623291015625, -8.5159912109375, 4.562433242797852], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000242.npy"}
{"epoch": 0.36583522297808013, "step": 243, "batch_size": 64, "mean": 3.8267946243286133, "std": 5.58229398727417, "min": -7.2810516357421875, "p10": -3.0188671112060543, "median": 3.1351547241210938, "p90": 11.366184234619142, "max": 14.787849426269531, "pos_frac": 0.75, "sample": [0.2766227722167969, 10.553627014160156, 6.483848571777344, -6.62445068359375, 12.27625846862793, 2.5795326232910156, -7.2810516357421875, 0.5188713073730469, 6.741691589355469, -2.2823333740234375, 10.234222412109375, 2.3178482055664062, 2.059906005859375, 2.2345504760742188, 1.6326942443847656, 13.1380615234375, -1.8732681274414062, 1.2074966430664062, 9.189502716064453, 0.6088104248046875, 0.6768035888671875, 14.332901000976562, 3.7957534790039062, -2.5526561737060547, -0.567291259765625, 10.01579475402832, -0.30062103271484375, 5.384429931640625, 9.867683410644531, 5.540458679199219, -1.9273834228515625, -5.852443695068359, 7.22698974609375, -2.7491493225097656, 7.618896484375, 8.512540817260742, 8.688583374023438, 10.032669067382812, -2.61993408203125, 3.154052734375, 14.787849426269531, 3.1162567138671875, 6.332725524902344, 13.336708068847656, 3.571044921875, -0.02679443359375, 0.22790145874023438, 3.5773372650146484, 12.375232696533203, 10.490394592285156, 2.6367950439453125, 1.828948974609375, 2.313678741455078, 10.761337280273438, -3.5924720764160156, 6.1438446044921875, 4.373760223388672, 1.6611518859863281, -3.4761810302734375, -3.13446044921875, -6.110038757324219, 11.625404357910156, 4.245353698730469, 5.5785675048828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000243.npy"}
{"epoch": 0.3673469387755102, "step": 244, "batch_size": 64, "mean": 5.0466718673706055, "std": 4.738553047180176, "min": -4.59869384765625, "p10": 0.09822502136230502, "median": 4.238964080810547, "p90": 12.02840919494629, "max": 15.761283874511719, "pos_frac": 0.890625, "sample": [5.329490661621094, 2.3398590087890625, 5.0518798828125, 1.28033447265625, 1.6212615966796875, 2.391031265258789, 0.4221229553222656, 12.206409454345703, 3.789487838745117, -2.496265411376953, 0.5300769805908203, 7.207225799560547, -1.288330078125, 15.096633911132812, 5.0540924072265625, 5.555267333984375, 4.3992462158203125, 13.38922119140625, -4.59869384765625, 8.218568801879883, 7.4101409912109375, 3.3594627380371094, -3.8511199951171875, 8.557483673095703, 2.2993011474609375, 6.420890808105469, 4.9815521240234375, 11.613075256347656, 13.668144226074219, 2.2071170806884766, 14.489631652832031, 3.1656875610351562, 1.2166919708251953, 1.0454330444335938, 1.8169364929199219, 1.8326301574707031, -0.6400718688964844, 7.347053527832031, 10.611785888671875, 15.761283874511719, 1.0088977813720703, 8.217315673828125, 1.7587051391601562, 3.2117385864257812, 2.0600738525390625, 12.733692169189453, 8.56240463256836, 10.66996955871582, -0.04058837890625, 3.0754737854003906, 1.2050914764404297, 4.078681945800781, 10.258729934692383, 5.7604827880859375, 10.290756225585938, 5.074371337890625, -0.753387451171875, 6.569366455078125, 2.4669342041015625, 1.6574287414550781, 5.826316833496094, 1.91290283203125, 11.512569427490234, 7.057060241699219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000244.npy"}
{"epoch": 0.3688586545729403, "step": 245, "batch_size": 64, "mean": 5.767277717590332, "std": 6.031308174133301, "min": -9.137702941894531, "p10": -0.278771591186523, "median": 5.624944686889648, "p90": 15.229051017761236, "max": 20.664710998535156, "pos_frac": 0.890625, "sample": [5.840721130371094, 3.36279296875, 5.318683624267578, 15.762924194335938, 3.883260726928711, 0.3315906524658203, -0.4668769836425781, 10.669931411743164, 5.618072509765625, 5.7830810546875, 1.7796249389648438, 3.8772125244140625, 9.701299667358398, 6.529529571533203, 8.148719787597656, 15.799861907958984, 0.2275524139404297, 7.154689788818359, -5.0487213134765625, 20.664710998535156, 15.887611389160156, 2.049102783203125, 1.898681640625, 9.530742645263672, 8.35678482055664, 8.483261108398438, 2.599090576171875, 0.26287841796875, 7.366828918457031, 0.27837371826171875, 11.535850524902344, -6.476484298706055, 1.1693992614746094, 8.012622833251953, 2.12530517578125, 5.945686340332031, -2.7603302001953125, 3.1364612579345703, 8.86709213256836, 16.30913543701172, 4.709026336669922, 2.3642730712890625, 17.519126892089844, 6.287742614746094, -1.0672016143798828, 5.631816864013672, 15.965703964233398, 13.983346939086914, 0.1601409912109375, 2.3613433837890625, 8.479278564453125, 8.684494018554688, -6.987884521484375, 9.670272827148438, 13.398582458496094, 4.894855499267578, -9.137702941894531, 12.859468460083008, 4.336761474609375, 5.0038604736328125, 3.5488815307617188, 7.4452056884765625, 2.8432350158691406, 6.634365081787109], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000245.npy"}
{"epoch": 0.37037037037037035, "step": 246, "batch_size": 64, "mean": 4.994536399841309, "std": 6.915278434753418, "min": -10.231685638427734, "p10": -3.1308998107910155, "median": 4.920036315917969, "p90": 13.647237777709963, "max": 21.44440460205078, "pos_frac": 0.734375, "sample": [9.156124114990234, 11.211872100830078, 15.931303024291992, 16.12971305847168, 4.61090087890625, 1.33056640625, -10.231685638427734, 14.264266967773438, 4.77215576171875, -2.509950637817383, 8.543548583984375, 13.02392578125, 5.564697265625, -2.0481395721435547, 11.080451965332031, 5.517822265625, 3.3560333251953125, -3.451702117919922, -5.163116455078125, -0.4944610595703125, -0.943084716796875, 9.434349060058594, 1.8729019165039062, 13.914371490478516, 19.759567260742188, -2.5955047607421875, 4.0830535888671875, 20.27435302734375, -1.1149063110351562, 6.8911285400390625, 1.5479717254638672, 11.123924255371094, 5.716148376464844, 2.221822738647461, 3.79217529296875, 5.0679168701171875, -7.028656005859375, 8.414749145507812, 11.487510681152344, -0.2927703857421875, 7.241203308105469, -6.20806884765625, -3.0726394653320312, 12.162651062011719, 6.2671661376953125, -3.1558685302734375, 6.723777770996094, 10.633140563964844, 8.667831420898438, -7.871826171875, 2.5676116943359375, 2.444124221801758, 5.941278457641602, 21.44440460205078, 4.159130096435547, 2.3471298217773438, 9.263530731201172, 13.016990661621094, -1.609100341796875, 1.67303466796875, 6.5025787353515625, 0.7317733764648438, 6.846771240234375, -1.2876205444335938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000246.npy"}
{"epoch": 0.37188208616780044, "step": 247, "batch_size": 64, "mean": 2.68523907661438, "std": 6.304272651672363, "min": -17.67261505126953, "p10": -4.640261459350586, "median": 2.7258567810058594, "p90": 9.637388038635255, "max": 19.69866943359375, "pos_frac": 0.71875, "sample": [-4.7928924560546875, 0.8004493713378906, 1.7760086059570312, 2.9340591430664062, 4.106548309326172, -5.399871826171875, 0.9751434326171875, 11.24853515625, 13.003158569335938, -0.6929168701171875, 6.6738433837890625, 6.521568298339844, 9.671552658081055, 6.816911697387695, 2.2765655517578125, 4.01055908203125, 5.82487678527832, 8.570026397705078, 0.09662437438964844, -2.9217681884765625, 19.69866943359375, 9.557670593261719, -13.09573745727539, -4.284122467041016, 2.82244873046875, 0.493011474609375, 8.818593978881836, -0.8531990051269531, -5.275474548339844, 5.159095764160156, 10.97918701171875, 2.6518478393554688, 2.79986572265625, 4.093742370605469, 1.817901611328125, 2.130289077758789, 6.1767120361328125, -17.67261505126953, -2.5062942504882812, -1.4933452606201172, -1.0369319915771484, 3.004434585571289, -1.4071426391601562, 12.609359741210938, 15.42352294921875, 2.645000457763672, 1.6718559265136719, -1.4073600769042969, 1.5648956298828125, 4.718780517578125, 1.1966400146484375, 5.7691802978515625, -3.5797653198242188, 3.0009193420410156, -8.666311264038086, 6.0936737060546875, 5.162435531616211, 9.526567459106445, 1.9979476928710938, -6.1244049072265625, -3.952117919921875, 8.449172973632812, 7.3647613525390625, 4.312963485717773], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000247.npy"}
{"epoch": 0.37339380196523053, "step": 248, "batch_size": 64, "mean": 4.874991416931152, "std": 7.31894063949585, "min": -8.564163208007812, "p10": -4.27126293182373, "median": 3.298309326171875, "p90": 15.027507019042968, "max": 23.11682891845703, "pos_frac": 0.765625, "sample": [6.114385604858398, 7.5589447021484375, -6.362640380859375, -6.749725341796875, 18.652297973632812, 9.447586059570312, 12.911636352539062, 8.390199661254883, 0.8677883148193359, -3.719087600708008, -0.5235519409179688, 12.836315155029297, 5.768882751464844, -4.33439826965332, 15.004905700683594, 4.00225830078125, 14.7933349609375, -1.0509414672851562, -7.483987808227539, 3.5531578063964844, -4.1239471435546875, 1.0627632141113281, 1.58538818359375, 3.668947219848633, 12.335464477539062, 5.125598907470703, 7.715822219848633, 2.8777732849121094, 4.049137115478516, 14.267181396484375, -3.6100692749023438, 15.037193298339844, 0.7657566070556641, -4.337440490722656, 16.906246185302734, 8.947969436645508, 2.45867919921875, 1.8638916015625, 2.708984375, 5.685401916503906, 7.915534973144531, 14.09991455078125, 2.3158035278320312, 19.83063507080078, 15.728233337402344, 3.0434608459472656, 2.0166053771972656, 0.6382942199707031, 3.5665054321289062, 6.823638916015625, 23.11682891845703, 1.8297500610351562, 7.299774169921875, 1.6356201171875, -8.564163208007812, 2.5267467498779297, 1.248016357421875, 16.24634552001953, 14.340873718261719, -1.8224258422851562, -0.5178165435791016, 1.167123794555664, -4.892364501953125, -2.2615737915039062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000248.npy"}
{"epoch": 0.3749055177626606, "step": 249, "batch_size": 64, "mean": 2.0840537548065186, "std": 6.682317733764648, "min": -15.734161376953125, "p10": -7.913768577575683, "median": 3.3790283203125, "p90": 9.667552185058595, "max": 14.725395202636719, "pos_frac": 0.6875, "sample": [6.148189544677734, -7.663446426391602, 3.9958038330078125, 3.0047454833984375, 0.6127853393554688, -4.0959625244140625, -8.86483383178711, 5.711936950683594, -0.288818359375, 5.935127258300781, 4.911020278930664, 7.983238220214844, 1.5039291381835938, 6.839874267578125, -10.26068115234375, -5.166351318359375, 3.766845703125, -12.363380432128906, -8.021049499511719, -4.430732727050781, 0.8075542449951172, -0.992462158203125, -1.6452560424804688, 4.479421615600586, 3.323516845703125, 4.20037841796875, 0.031398773193359375, -1.600677490234375, 14.426979064941406, -3.1442031860351562, -6.582389831542969, 7.2177581787109375, 1.4414119720458984, 2.9518394470214844, 10.469280242919922, -8.356903076171875, 5.497535705566406, 0.202056884765625, 6.307924270629883, -9.042190551757812, 3.9254417419433594, 12.916624069213867, 9.80950927734375, 14.725395202636719, 7.86029052734375, 2.821796417236328, -7.01055908203125, 8.96267318725586, 9.965118408203125, 4.9355316162109375, 4.808618545532227, 6.003887176513672, 9.336318969726562, 1.9996795654296875, 7.484785079956055, 3.434539794921875, -1.4464893341064453, 7.871442794799805, 12.50067138671875, -15.734161376953125, 7.5842742919921875, -5.036418914794922, 0.766387939453125, 5.6428680419921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000249.npy"}
{"epoch": 0.3764172335600907, "step": 250, "batch_size": 64, "mean": 5.341045379638672, "std": 7.1526408195495605, "min": -19.05620574951172, "p10": -2.039551544189453, "median": 4.917440414428711, "p90": 14.516229629516607, "max": 23.848953247070312, "pos_frac": 0.796875, "sample": [8.219850540161133, 3.1728992462158203, -2.062389373779297, 6.032447814941406, 2.51409912109375, 10.8756103515625, 6.578371047973633, 16.214599609375, 15.412425994873047, 2.8701171875, 9.347454071044922, -1.2572708129882812, 12.48651123046875, 2.7233734130859375, 8.220802307128906, 0.34806251525878906, 2.089263916015625, -19.05620574951172, 12.232988357543945, 1.1368637084960938, 4.470855712890625, -5.891195297241211, 5.364025115966797, -6.50001335144043, 6.724544525146484, -2.08782958984375, 15.314815521240234, 8.178230285644531, 1.3585739135742188, 15.098052978515625, 1.2007102966308594, 7.6710205078125, 12.549570083618164, 0.1689167022705078, 7.420200347900391, 1.0711097717285156, 0.9354763031005859, 9.714241027832031, 23.848953247070312, 4.234375, -4.83819580078125, -5.754375457763672, 3.8615989685058594, -0.9366989135742188, 17.592239379882812, 5.894594192504883, 12.3118896484375, 13.158641815185547, 12.256004333496094, 12.100290298461914, 0.324371337890625, -1.5169143676757812, 9.72662353515625, 1.5973796844482422, -0.03253173828125, 3.8823089599609375, 17.283355712890625, 11.550918579101562, 5.637706756591797, -0.03350639343261719, -1.9862632751464844, 11.723045349121094, 2.9343814849853516, 6.145515441894531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000250.npy"}
{"epoch": 0.3779289493575208, "step": 251, "batch_size": 64, "mean": 3.675110340118408, "std": 6.603434085845947, "min": -11.38348388671875, "p10": -4.007131004333496, "median": 3.3902587890625, "p90": 11.130619049072266, "max": 22.068527221679688, "pos_frac": 0.734375, "sample": [1.4897880554199219, 17.826553344726562, 18.657638549804688, 0.05128669738769531, 4.427360534667969, 3.3922882080078125, -11.38348388671875, 4.839649200439453, 3.2417144775390625, 22.068527221679688, -0.34293365478515625, -7.7691192626953125, 3.747039794921875, -1.6904449462890625, 1.5183773040771484, -4.362733840942383, 2.614370346069336, 16.12274169921875, 2.423849105834961, -7.9120330810546875, 0.9016761779785156, 17.13787841796875, 2.1937732696533203, -7.60577392578125, 1.5305023193359375, -1.7643051147460938, 10.971122741699219, 8.56878662109375, -2.8791046142578125, 2.425506591796875, 6.7812042236328125, 8.195480346679688, 0.7682380676269531, 4.242303848266602, 8.141792297363281, 4.825050354003906, 6.841300964355469, 3.6898345947265625, -3.0531768798828125, -1.0436038970947266, 9.149633407592773, 4.181640625, 7.160560607910156, 8.833843231201172, 2.238250732421875, 3.3882293701171875, 4.9482421875, 4.8055877685546875, 6.7299041748046875, 8.487777709960938, 11.198974609375, 4.137939453125, 12.636165618896484, -3.1773910522460938, -1.1778488159179688, -1.4670391082763672, 7.521709442138672, -7.500587463378906, 3.1592769622802734, -5.733848571777344, -1.7705917358398438, 5.263214111328125, 9.178962707519531, 3.1855335235595703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000251.npy"}
{"epoch": 0.3794406651549509, "step": 252, "batch_size": 64, "mean": 3.6078789234161377, "std": 7.235051155090332, "min": -19.831527709960938, "p10": -3.4785692214965818, "median": 3.422618865966797, "p90": 13.304355049133303, "max": 19.6927490234375, "pos_frac": 0.75, "sample": [3.5347747802734375, -14.396255493164062, -2.608041763305664, -12.322296142578125, 10.18569564819336, -1.5678482055664062, 1.7849159240722656, 9.13983154296875, 1.2458000183105469, 13.527366638183594, 15.861518859863281, 6.566493988037109, 0.9495773315429688, 7.753334045410156, 17.519485473632812, 1.2581710815429688, 14.887908935546875, -4.247592926025391, 5.0200653076171875, 1.1182708740234375, -5.5629730224609375, 2.209444046020508, 12.214126586914062, 4.008901596069336, 7.9024200439453125, 3.3104629516601562, 0.2848968505859375, -6.212043762207031, 12.783994674682617, -0.1865234375, 5.620447158813477, 15.348434448242188, 10.343591690063477, 5.624595642089844, 2.78204345703125, 0.7026271820068359, 4.953277587890625, 6.393857955932617, 5.830188751220703, -2.1738624572753906, 4.178142547607422, -1.3862380981445312, -0.9167060852050781, 5.1799163818359375, 3.7740325927734375, 19.6927490234375, -3.264739990234375, 2.7739334106445312, -19.831527709960938, -3.568927764892578, -1.7154712677001953, 1.067138671875, 3.7018375396728516, 1.7386035919189453, 4.743278503417969, 5.161582946777344, 3.15447998046875, 13.843681335449219, 2.6684646606445312, 8.856689453125, 12.662841796875, 10.0260009765625, -3.267732620239258, 0.243133544921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000252.npy"}
{"epoch": 0.38095238095238093, "step": 253, "batch_size": 64, "mean": 3.6810965538024902, "std": 6.574976444244385, "min": -7.993297576904297, "p10": -4.860061073303222, "median": 3.342034339904785, "p90": 12.480897521972658, "max": 18.997554779052734, "pos_frac": 0.671875, "sample": [5.369743347167969, -4.478324890136719, 18.05536651611328, -0.6553421020507812, -0.6548633575439453, 6.501895904541016, 3.8944473266601562, 11.952651977539062, 5.8597412109375, 2.523874282836914, 8.017385482788086, -2.2068939208984375, 12.011001586914062, 1.9900989532470703, 0.6473789215087891, -7.993297576904297, -4.750017166137695, 5.901805877685547, -5.216850280761719, 6.0594329833984375, 15.05535888671875, 8.405454635620117, -4.907222747802734, 8.232650756835938, 3.679372787475586, -0.8478546142578125, 11.959310531616211, 6.437736511230469, -1.2181015014648438, 5.510955810546875, 2.369049072265625, 2.0024032592773438, 7.417980194091797, 0.5803909301757812, 0.8073883056640625, 14.428863525390625, 4.3460540771484375, -7.1082763671875, 5.335304260253906, 6.101045608520508, 3.3366622924804688, -2.8732986450195312, 0.8819732666015625, -6.439287185668945, 12.682281494140625, -2.7707672119140625, -0.17233848571777344, 7.062797546386719, 18.997554779052734, 1.9031925201416016, 6.8104705810546875, 14.059104919433594, 18.790237426757812, -0.7844085693359375, 10.781583786010742, -2.8427276611328125, 2.361083984375, -2.2631797790527344, 9.941360473632812, -6.377960205078125, -5.1245574951171875, -2.247526168823242, 3.3474063873291016, 5.113414764404297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000253.npy"}
{"epoch": 0.382464096749811, "step": 254, "batch_size": 64, "mean": 4.567934036254883, "std": 6.820773124694824, "min": -7.69122314453125, "p10": -4.4192621231079094, "median": 3.5817642211914062, "p90": 14.03078842163086, "max": 20.91921615600586, "pos_frac": 0.734375, "sample": [-2.511627197265625, 8.650192260742188, 16.276092529296875, 3.6893157958984375, 1.028839111328125, 4.882316589355469, 14.82913589477539, 10.758926391601562, 1.387277603149414, 0.15367507934570312, -4.072547912597656, 5.848014831542969, 9.078102111816406, 6.400144577026367, 11.18924331665039, 10.581504821777344, 14.688095092773438, 8.7041015625, -4.751335144042969, 6.635414123535156, 2.9854812622070312, 20.91921615600586, -5.9350128173828125, 13.848072052001953, 1.9300918579101562, -7.69122314453125, 3.0451202392578125, 12.132146835327148, 8.84602165222168, -0.7138175964355469, 19.290603637695312, -3.3069305419921875, 3.474212646484375, -0.49910736083984375, -2.242046356201172, 6.89813232421875, -1.0776329040527344, 8.068473815917969, -0.1647796630859375, 6.1859588623046875, 4.352386474609375, 0.46608734130859375, 5.54237174987793, -4.567853927612305, 1.2424201965332031, 15.561180114746094, -1.3036422729492188, 13.889717102050781, 6.545379638671875, 2.0055313110351562, 2.5115509033203125, -3.1842880249023438, -5.466651916503906, 8.883956909179688, 14.09124755859375, 0.983551025390625, 2.9036407470703125, 2.8057289123535156, 12.585136413574219, 10.633949279785156, 5.721294403076172, -6.5183563232421875, 0.5167083740234375, -7.2911376953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000254.npy"}
{"epoch": 0.3839758125472411, "step": 255, "batch_size": 64, "mean": 5.382227897644043, "std": 7.407783508300781, "min": -12.459400177001953, "p10": -3.4528667449951165, "median": 4.686327934265137, "p90": 15.90327835083008, "max": 21.22174072265625, "pos_frac": 0.75, "sample": [16.78154754638672, -1.34613037109375, 21.22174072265625, 5.9266357421875, 9.737945556640625, 1.7826690673828125, -0.869964599609375, 0.3347587585449219, 1.8305320739746094, 9.93560791015625, -5.176231384277344, -0.4073200225830078, 4.459142684936523, 10.533821105957031, -3.7248287200927734, 3.602558135986328, 6.500160217285156, 4.91351318359375, 1.8965682983398438, -12.459400177001953, 13.13339614868164, -1.9635696411132812, 4.269386291503906, 3.0629501342773438, 15.251068115234375, 8.885078430175781, -4.555213928222656, 2.884815216064453, 15.423774719238281, 0.7838287353515625, 16.108779907226562, 11.767906188964844, -2.720947265625, 7.182651519775391, 7.773284912109375, 5.946208953857422, 6.322050094604492, 14.797706604003906, 19.55860137939453, 5.32725715637207, 1.4276790618896484, 7.158195495605469, 9.476974487304688, -4.85638427734375, 2.5156192779541016, -2.818288803100586, 2.6834182739257812, 20.535064697265625, -5.350761413574219, 6.676902770996094, 1.4290313720703125, 12.606475830078125, 4.0628204345703125, 20.49713897705078, 5.383811950683594, 6.518468856811523, 2.1554908752441406, 11.070636749267578, 20.534408569335938, -4.103239059448242, -2.45379638671875, -1.9546890258789062, 7.24029541015625, -0.6850051879882812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000255.npy"}
{"epoch": 0.3854875283446712, "step": 256, "batch_size": 64, "mean": 4.620975494384766, "std": 7.676113128662109, "min": -16.56580352783203, "p10": -4.850598716735839, "median": 3.6152801513671875, "p90": 16.00970802307129, "max": 20.044387817382812, "pos_frac": 0.6875, "sample": [-1.91357421875, 4.4049530029296875, 7.075992584228516, 6.667949676513672, 12.689407348632812, -2.0226287841796875, -2.938508987426758, 16.890541076660156, 4.205421447753906, 0.8649692535400391, -3.5708656311035156, 2.014383316040039, -5.376708984375, 20.044387817382812, 4.251617431640625, 19.37718963623047, -1.00732421875, 3.7144622802734375, -16.56580352783203, 9.359939575195312, 2.3842697143554688, 8.58099365234375, -4.430627822875977, 3.5160980224609375, -5.632476806640625, -6.2834930419921875, 3.50006103515625, -0.6426239013671875, 6.44866943359375, 5.2523956298828125, -1.4808883666992188, 8.704231262207031, 4.487861633300781, 12.419742584228516, 11.712989807128906, 11.260993957519531, 16.278789520263672, 11.472892761230469, -4.3003082275390625, 15.071319580078125, -0.07872772216796875, 3.4619789123535156, 19.633514404296875, -5.174896240234375, 16.68055534362793, 2.818470001220703, 8.297958374023438, 2.5790481567382812, 1.7592601776123047, 18.8001708984375, 2.391693115234375, 10.751144409179688, -1.306549072265625, 2.230255126953125, 9.330902099609375, -2.7186317443847656, 13.48394775390625, -5.030586242675781, 1.666463851928711, 15.381851196289062, -1.6345176696777344, 5.24102783203125, -5.128547668457031, 5.819957733154297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000256.npy"}
{"epoch": 0.3869992441421013, "step": 257, "batch_size": 64, "mean": 6.215222358703613, "std": 8.744409561157227, "min": -13.351593017578125, "p10": -1.8341026306152342, "median": 4.915618896484375, "p90": 18.77730293273926, "max": 26.658203125, "pos_frac": 0.78125, "sample": [-0.6153297424316406, 2.495147705078125, 1.3033599853515625, 26.658203125, 21.770118713378906, 8.289283752441406, 0.06413841247558594, 13.245603561401367, 2.6237106323242188, 13.780555725097656, 10.755577087402344, 3.9818878173828125, 2.889291763305664, -1.5787734985351562, 13.619644165039062, 17.020652770996094, 18.304397583007812, 0.2216339111328125, -13.351593017578125, -1.92742919921875, 0.10836601257324219, 5.4742431640625, 6.352294921875, 1.7293701171875, 7.1687774658203125, -2.0081939697265625, 2.5599098205566406, 18.987648010253906, 11.522285461425781, -0.8031463623046875, 11.501874923706055, 18.979976654052734, 5.8133087158203125, 1.6925811767578125, 7.290321350097656, 4.798027038574219, -0.98626708984375, 15.423965454101562, 24.093624114990234, -3.0273704528808594, -12.269346237182617, 24.4154052734375, 6.242561340332031, 5.033210754394531, 1.051361083984375, -0.81036376953125, 4.453815460205078, 1.3324356079101562, 7.529777526855469, 8.571174621582031, 13.402639389038086, 2.5277328491210938, 16.715065002441406, 0.33568572998046875, -9.442840576171875, 23.226642608642578, -0.8105678558349609, -7.4041595458984375, 17.048614501953125, 6.7110748291015625, 5.519855499267578, 6.2220916748046875, 3.5670318603515625, -1.6163406372070312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000257.npy"}
{"epoch": 0.3885109599395314, "step": 258, "batch_size": 64, "mean": 4.827666282653809, "std": 6.089896202087402, "min": -8.832006454467773, "p10": -2.0684720993041985, "median": 4.103914260864258, "p90": 13.240315437316896, "max": 19.527023315429688, "pos_frac": 0.828125, "sample": [6.1707000732421875, 0.637786865234375, 2.1218032836914062, -2.3830432891845703, 2.8054122924804688, 15.584205627441406, 18.178970336914062, 6.10479736328125, 11.401451110839844, 1.0559253692626953, 3.557687759399414, 2.2068634033203125, -4.6937713623046875, 6.668123245239258, 2.7433395385742188, 13.442901611328125, -0.45697784423828125, -3.745023727416992, 4.17730712890625, -8.832006454467773, 3.481922149658203, 0.422576904296875, -7.526496887207031, 9.754554748535156, 4.0541534423828125, 4.472511291503906, 14.273666381835938, 5.3523101806640625, 2.614818572998047, 2.2002296447753906, 6.1240234375, 4.153675079345703, 10.2073974609375, 6.840717315673828, 2.2813186645507812, -2.6635971069335938, 4.723260879516602, 1.1203689575195312, -0.084136962890625, 19.527023315429688, 7.717399597167969, -1.33447265625, 16.57855224609375, 7.4361724853515625, 10.960723876953125, 11.021486282348633, 9.305068969726562, 6.93731689453125, 2.311279296875, 12.767614364624023, -0.9060592651367188, 1.2078628540039062, 4.833549499511719, 2.2268905639648438, 1.2639541625976562, 1.9782867431640625, 8.365631103515625, -7.389892578125, 3.9540939331054688, 0.2057018280029297, 9.667341232299805, 10.940498352050781, 16.52522850036621, 4.319669723510742], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000258.npy"}
{"epoch": 0.3900226757369615, "step": 259, "batch_size": 64, "mean": 7.635554790496826, "std": 7.8367743492126465, "min": -9.738380432128906, "p10": -2.8071723937988278, "median": 7.634517669677734, "p90": 18.312074661254886, "max": 25.21484375, "pos_frac": 0.8125, "sample": [-0.92022705078125, -2.607696533203125, -9.738380432128906, 11.969764709472656, 6.843658447265625, 14.330825805664062, 18.663707733154297, 17.238265991210938, 8.765853881835938, -7.2473297119140625, 13.997230529785156, -1.0000839233398438, 4.546953201293945, 15.94976806640625, 12.196544647216797, 2.2855682373046875, -7.3848876953125, 1.908163070678711, 20.0889892578125, 10.924999237060547, 4.562480926513672, 12.502349853515625, 15.540996551513672, 18.512683868408203, 10.997627258300781, -1.9694480895996094, -5.492973327636719, 3.7659912109375, 7.386863708496094, 5.45036506652832, 11.788299560546875, 10.010627746582031, -3.243082046508789, 4.783294677734375, 5.077339172363281, 5.2577362060546875, 18.610023498535156, 4.1408843994140625, 12.188873291015625, 10.063568115234375, 17.84398651123047, 8.98550033569336, 1.5002822875976562, 25.21484375, 3.4966049194335938, 5.6633453369140625, -7.692089080810547, -0.39969444274902344, -2.8926620483398438, 0.9243812561035156, 20.738174438476562, 16.605403900146484, 11.5369873046875, 6.895904541015625, 7.882171630859375, 15.048809051513672, 5.680854797363281, 7.00299072265625, 2.4301795959472656, 9.236854553222656, 11.350845336914062, 18.827171325683594, 9.641433715820312, 12.40704345703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000259.npy"}
{"epoch": 0.3915343915343915, "step": 260, "batch_size": 64, "mean": 5.749767303466797, "std": 8.241132736206055, "min": -10.052560806274414, "p10": -4.704375839233398, "median": 5.60202693939209, "p90": 16.641180419921877, "max": 23.80572509765625, "pos_frac": 0.734375, "sample": [14.95587158203125, -8.381599426269531, 11.713720321655273, -2.8443260192871094, -3.475109100341797, 22.587677001953125, 4.5938568115234375, 5.253143310546875, 16.459579467773438, -5.9013671875, 17.599517822265625, 7.254621505737305, 13.257850646972656, 11.022933959960938, 12.455841064453125, -3.397247314453125, 16.719009399414062, 7.689369201660156, 4.846174240112305, 23.80572509765625, 15.040092468261719, 0.3741455078125, 14.413551330566406, 17.94831085205078, -0.16961288452148438, 0.08452606201171875, 14.569698333740234, 17.75832748413086, -0.9044952392578125, 4.0233917236328125, 5.650152206420898, 6.515045166015625, 11.38128662109375, 2.914043426513672, 2.2792835235595703, 8.69698715209961, 0.41699981689453125, -2.0855789184570312, -10.052560806274414, 6.7330169677734375, 1.29388427734375, 8.513069152832031, 0.4092731475830078, -0.08541107177734375, 7.4330596923828125, 9.426340103149414, 18.924781799316406, -4.737205505371094, 15.336736679077148, -2.70562744140625, -6.430368423461914, 8.062177658081055, 5.553901672363281, 2.65545654296875, 9.908769607543945, 8.134723663330078, -3.79388427734375, 12.272254943847656, 13.297927856445312, -7.270328521728516, 1.6764259338378906, -8.643524169921875, -4.627773284912109, 1.578582763671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000260.npy"}
{"epoch": 0.3930461073318216, "step": 261, "batch_size": 64, "mean": 4.69488525390625, "std": 7.889615058898926, "min": -9.5716552734375, "p10": -5.8962669372558585, "median": 3.9212608337402344, "p90": 15.628562927246096, "max": 25.198387145996094, "pos_frac": 0.734375, "sample": [0.4939308166503906, 5.16925048828125, 7.536520004272461, 0.18887710571289062, -4.274543762207031, 15.298664093017578, -1.1516075134277344, 17.342208862304688, 7.318613052368164, -2.1227970123291016, -9.5716552734375, -4.3254241943359375, -2.474700927734375, 15.10573959350586, 7.32594108581543, -2.2330169677734375, 0.2083740234375, 18.732070922851562, 0.9709053039550781, 18.89383316040039, 3.981170654296875, 5.426048278808594, -0.16687393188476562, -7.472076416015625, 15.37664794921875, 5.700157165527344, 25.198387145996094, 2.5189590454101562, 2.003082275390625, 3.499124526977539, 9.521984100341797, 7.371805191040039, 10.805900573730469, 14.087600708007812, 4.130807876586914, -2.220949172973633, 21.585186004638672, -6.1241302490234375, -7.1666412353515625, 9.599004745483398, 3.8613510131835938, 0.4187030792236328, 7.886635780334473, 8.572120666503906, 3.4903221130371094, -6.7261505126953125, 1.2096977233886719, -8.547836303710938, -5.364585876464844, 4.508281707763672, 7.885986328125, 0.1833648681640625, 11.352706909179688, 17.870933532714844, 3.5138015747070312, -8.5244140625, 6.307487487792969, 8.580589294433594, -0.229217529296875, 15.736526489257812, 3.8087921142578125, 8.203315734863281, 6.782218933105469, 3.6056461334228516], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000261.npy"}
{"epoch": 0.3945578231292517, "step": 262, "batch_size": 64, "mean": 5.658448696136475, "std": 7.363590717315674, "min": -8.567466735839844, "p10": -2.864126777648926, "median": 3.939974784851074, "p90": 16.675731658935547, "max": 20.271484375, "pos_frac": 0.78125, "sample": [-8.567466735839844, 11.650444030761719, 4.023994445800781, 1.7143592834472656, -1.80999755859375, 13.825664520263672, 3.70135498046875, 2.7774658203125, 2.261505126953125, 18.136598587036133, 0.2564849853515625, 14.148643493652344, -1.49163818359375, -0.2372264862060547, 15.802440643310547, -6.862457275390625, 2.6501827239990234, 18.85027313232422, 12.678131103515625, 5.561908721923828, -1.831024169921875, -1.8314437866210938, 1.2850799560546875, 0.7587127685546875, 11.044181823730469, 17.151107788085938, 4.3703460693359375, 17.632293701171875, -2.775634765625, 0.8869495391845703, 14.844833374023438, 7.119499206542969, 8.73602294921875, 11.036375045776367, -3.3876266479492188, 16.516502380371094, 3.941446304321289, 11.868999481201172, 13.462844848632812, 1.6689071655273438, -2.9020519256591797, -1.532196044921875, -3.5604782104492188, 14.166976928710938, 20.271484375, 6.164861679077148, 3.9385032653808594, 2.053316116333008, -8.506805419921875, 2.664531707763672, 7.12554931640625, 1.5459671020507812, 17.252777099609375, 0.7049350738525391, 9.534547805786133, 3.7058067321777344, 10.395271301269531, 16.743972778320312, 2.5499725341796875, 8.462493896484375, 1.1052703857421875, 5.607421875, 8.548240661621094, -5.468696594238281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000262.npy"}
{"epoch": 0.3960695389266818, "step": 263, "batch_size": 64, "mean": 4.707237243652344, "std": 7.6227827072143555, "min": -14.488908767700195, "p10": -4.667570304870606, "median": 4.589073181152344, "p90": 15.983222961425781, "max": 19.935752868652344, "pos_frac": 0.734375, "sample": [3.8580665588378906, 11.61538314819336, -8.441764831542969, -0.9085769653320312, -4.563692092895508, -0.7522125244140625, 5.3487091064453125, 1.3250885009765625, 11.520793914794922, 16.828622817993164, 8.91021728515625, -0.7529067993164062, -0.2707061767578125, 9.775440216064453, 2.07110595703125, -10.494117736816406, 15.351768493652344, 2.6092529296875, 0.2181243896484375, -3.8850173950195312, 10.078964233398438, 9.404861450195312, 17.409866333007812, 5.351802825927734, -2.6652603149414062, 0.7837696075439453, 17.115753173828125, 6.913764953613281, 9.640480041503906, 11.356353759765625, 18.912229537963867, 5.7863006591796875, -4.719093322753906, 2.886993408203125, -4.46905517578125, 0.42873382568359375, -8.2852783203125, 15.993877410888672, 6.7182769775390625, 9.856521606445312, 5.186248779296875, -0.9150238037109375, 1.1314544677734375, 16.687522888183594, -4.712089538574219, 2.0568161010742188, 7.84283447265625, 11.947107315063477, 9.58847427368164, 5.149505615234375, -2.3756866455078125, 19.935752868652344, 4.0286407470703125, 0.2790870666503906, 7.454376220703125, 1.1017169952392578, 8.568244934082031, 10.517366409301758, 3.264057159423828, -5.643396377563477, 2.5644760131835938, -14.488908767700195, 15.958362579345703, 8.27281379699707], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000263.npy"}
{"epoch": 0.3975812547241119, "step": 264, "batch_size": 64, "mean": 5.601801872253418, "std": 8.882551193237305, "min": -13.332107543945312, "p10": -5.379559326171875, "median": 4.277019500732422, "p90": 17.01388168334961, "max": 26.508567810058594, "pos_frac": 0.734375, "sample": [14.39501953125, -5.573211669921875, 2.2558670043945312, 0.8865909576416016, 1.269308090209961, -3.793558120727539, 1.3423080444335938, 0.02912139892578125, 11.322654724121094, 6.999725341796875, -4.057012557983398, 3.079357147216797, -13.332107543945312, 12.383872985839844, -1.7487525939941406, 7.8414154052734375, 12.228897094726562, 4.607666015625, -0.18018341064453125, 16.657318115234375, 9.21048355102539, 10.39744758605957, 8.71673583984375, 11.262199401855469, 11.26104736328125, 25.771808624267578, 0.23239707946777344, 6.644012451171875, -3.5687255859375, 3.908926010131836, 9.048629760742188, 0.4004669189453125, 17.575660705566406, 2.916339874267578, -1.0644340515136719, 3.751873016357422, 18.358951568603516, -5.874595642089844, 7.9631500244140625, 21.083641052246094, -5.853828430175781, -10.733856201171875, 2.5162429809570312, 13.128677368164062, 17.16669464111328, 13.83139419555664, -7.5351409912109375, 9.770309448242188, -10.568214416503906, 5.718055725097656, -4.927703857421875, 3.9463729858398438, 24.621492385864258, 7.427349090576172, 3.3667640686035156, -2.3172760009765625, 1.8360443115234375, 14.79827880859375, 7.5523223876953125, -0.34574127197265625, -3.3304443359375, 26.508567810058594, 13.286308288574219, 14.042339324951172], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000264.npy"}
{"epoch": 0.39909297052154197, "step": 265, "batch_size": 64, "mean": 6.217557907104492, "std": 9.584976196289062, "min": -16.036579132080078, "p10": -5.3865829467773425, "median": 6.160958290100098, "p90": 19.207016754150395, "max": 21.461530685424805, "pos_frac": 0.78125, "sample": [3.2422523498535156, 2.4268569946289062, 0.3682518005371094, -10.012039184570312, 6.237737655639648, 1.4273605346679688, 3.1146087646484375, 9.669929504394531, -12.75067138671875, 18.569068908691406, 2.6535301208496094, 6.084178924560547, 10.552589416503906, 9.0506591796875, -14.3424072265625, -0.04299163818359375, 4.820230484008789, 4.14396858215332, 8.422292709350586, -16.036579132080078, 2.141408920288086, 14.91961669921875, 15.501419067382812, 4.632596969604492, 1.1537742614746094, 1.8194770812988281, 16.759817123413086, 2.9788246154785156, 6.933750152587891, 0.32169151306152344, 20.278823852539062, 17.130157470703125, -3.212646484375, 19.92043113708496, 12.303764343261719, 19.858501434326172, 3.178102493286133, 0.7414588928222656, -2.2675704956054688, -4.489013671875, 6.9400787353515625, 12.020309448242188, 10.557210922241211, 19.480422973632812, 21.186622619628906, 17.476585388183594, 21.461530685424805, 8.948341369628906, 4.9388275146484375, -10.92011833190918, 8.989578247070312, 6.2776947021484375, -1.6568565368652344, 18.51787567138672, 18.508193969726562, 14.76512336730957, 19.606712341308594, -14.232933044433594, 11.158126831054688, -3.5513877868652344, 13.754150390625, -1.0865135192871094, 12.352165222167969, -5.7712554931640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000265.npy"}
{"epoch": 0.40060468631897206, "step": 266, "batch_size": 64, "mean": 5.7439422607421875, "std": 8.461058616638184, "min": -13.086994171142578, "p10": -2.4157192230224607, "median": 3.963624954223633, "p90": 17.65820770263672, "max": 27.279800415039062, "pos_frac": 0.734375, "sample": [-1.8401107788085938, 18.818618774414062, 20.6422119140625, 1.5349445343017578, 10.287559509277344, -1.5923957824707031, 0.07953643798828125, -6.5231781005859375, 3.392292022705078, -1.1723365783691406, 6.913261413574219, 3.097259521484375, 1.2336349487304688, 9.264366149902344, 7.853397369384766, 7.202640533447266, 1.1863632202148438, 27.279800415039062, -1.5548591613769531, -0.02320098876953125, 22.566486358642578, 5.9337310791015625, 0.2069549560546875, 19.398500442504883, 12.27601432800293, 17.275192260742188, -2.4745407104492188, 17.822357177734375, 11.391252517700195, -0.7074356079101562, 10.798177719116211, 2.9058685302734375, 16.01742172241211, -2.2784690856933594, 7.285036087036133, -2.5786361694335938, 7.527618408203125, 14.709930419921875, 8.417827606201172, 1.1251106262207031, -1.04229736328125, -6.4921722412109375, 0.625091552734375, 2.9977455139160156, -11.458137512207031, 2.5388031005859375, 7.449199676513672, 4.508068084716797, 7.524284362792969, 10.450645446777344, -13.086994171142578, 1.5600643157958984, -5.65216064453125, 0.20635986328125, 14.076087951660156, 8.128803253173828, 23.78235626220703, -2.1278324127197266, 13.328147888183594, 3.4191818237304688, 13.033731460571289, 4.60578727722168, 16.382221221923828, -0.8428993225097656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000266.npy"}
{"epoch": 0.4021164021164021, "step": 267, "batch_size": 64, "mean": 4.635497093200684, "std": 8.959893226623535, "min": -22.047794342041016, "p10": -5.2038650512695295, "median": 3.7901611328125, "p90": 14.475617218017579, "max": 25.27521514892578, "pos_frac": 0.734375, "sample": [5.5160369873046875, 0.8675994873046875, 0.9798450469970703, -13.812906265258789, 3.830535888671875, -3.1266250610351562, 12.047298431396484, 18.4144287109375, 14.168838500976562, 2.065502166748047, -2.065694808959961, 7.059730529785156, 0.9060249328613281, 25.27521514892578, 1.6197757720947266, -10.267196655273438, 3.7609405517578125, -6.127463340759277, 0.691497802734375, -13.819931030273438, -3.396160125732422, -5.978595733642578, 0.619354248046875, 8.63621711730957, 12.198928833007812, 21.224878311157227, -1.4404258728027344, 0.7903804779052734, 12.192398071289062, -7.765586853027344, 7.72772216796875, 2.131378173828125, 16.328540802001953, 11.463947296142578, -2.7117462158203125, 6.637851715087891, 13.573402404785156, 1.6009368896484375, 3.3360595703125, 23.954978942871094, 5.73486328125, -0.22449874877929688, 21.664794921875, -2.1979122161865234, 14.607093811035156, 8.951210021972656, 14.157890319824219, 6.310966491699219, -22.047794342041016, 13.244773864746094, 8.084732055664062, -0.80596923828125, -1.7380447387695312, 3.5155067443847656, 4.821632385253906, -1.4035835266113281, 10.176309585571289, 10.710016250610352, 10.567333221435547, 3.8193817138671875, 3.474515914916992, 11.023075103759766, 5.067787170410156, 0.04981040954589844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000267.npy"}
{"epoch": 0.4036281179138322, "step": 268, "batch_size": 64, "mean": 5.360468864440918, "std": 7.5992431640625, "min": -8.791091918945312, "p10": -4.249388122558593, "median": 3.737783432006836, "p90": 16.22460174560547, "max": 22.072349548339844, "pos_frac": 0.734375, "sample": [2.8858642578125, 22.072349548339844, 1.2119197845458984, -8.791091918945312, 6.4995574951171875, 15.792442321777344, 4.83160400390625, -0.9700088500976562, -5.488925933837891, 12.616252899169922, 13.593421936035156, 0.46941375732421875, -0.9111824035644531, 0.3350067138671875, 2.6314468383789062, 1.5205345153808594, 6.856607437133789, -1.5852813720703125, -0.8575305938720703, 10.095256805419922, 13.652637481689453, 15.011260986328125, -0.30974388122558594, 0.27588653564453125, -4.887432098388672, 4.6490631103515625, 3.7806396484375, 7.228630065917969, 8.85052490234375, 3.2181472778320312, 13.206840515136719, 3.694927215576172, 3.350444793701172, 6.615589141845703, -0.5401077270507812, 16.409812927246094, 0.6781997680664062, -8.468212127685547, -5.23651123046875, 5.502786636352539, 12.088829040527344, 0.6843585968017578, -0.18900299072265625, 17.108718872070312, 15.312759399414062, 6.512451171875, 18.770965576171875, -1.2358589172363281, 3.0540618896484375, -4.632087707519531, -4.670295715332031, -3.3564224243164062, 14.132453918457031, 0.4662036895751953, 19.53179931640625, 6.327049255371094, 8.028911590576172, 9.565391540527344, 18.221908569335938, 6.634237289428711, 0.8042125701904297, 21.611724853515625, 9.833065032958984, -1.0264396667480469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000268.npy"}
{"epoch": 0.4051398337112623, "step": 269, "batch_size": 64, "mean": 6.3199920654296875, "std": 8.201637268066406, "min": -14.784011840820312, "p10": -3.2653285980224602, "median": 4.783441543579102, "p90": 17.989330673217776, "max": 23.5782470703125, "pos_frac": 0.78125, "sample": [1.2767791748046875, -5.255531311035156, 3.06072998046875, 14.711437225341797, 5.5985260009765625, -1.4416313171386719, 3.6719589233398438, 17.709796905517578, 7.815681457519531, -0.4799041748046875, -3.5538787841796875, 13.55831527709961, 8.540079116821289, 9.35809326171875, 18.58822250366211, 20.44189453125, 0.8691043853759766, 6.667350769042969, 18.412567138671875, -11.66156005859375, 3.0179290771484375, -5.138275146484375, 17.431838989257812, 1.7373886108398438, -6.634307861328125, 19.72146987915039, -0.22347450256347656, 23.5782470703125, 6.3279571533203125, 13.784744262695312, 7.6847381591796875, 4.898532867431641, -1.4795780181884766, 12.04608154296875, 1.434051513671875, 1.9047832489013672, 0.4667854309082031, 4.6683502197265625, -0.02841949462890625, 15.919065475463867, 19.500741958618164, 12.413864135742188, 3.0215988159179688, 7.094541549682617, 14.101974487304688, -4.262542724609375, 10.174524307250977, 2.9314956665039062, -2.5920448303222656, 2.465911865234375, 16.000762939453125, -0.07724761962890625, 18.109130859375, 0.8249053955078125, 2.00067138671875, -14.784011840820312, 7.6237335205078125, 13.01824951171875, 3.0321197509765625, 12.154273986816406, 9.676856994628906, 4.534019470214844, 14.640449523925781, 3.86956787109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000269.npy"}
{"epoch": 0.40665154950869237, "step": 270, "batch_size": 64, "mean": 4.712078094482422, "std": 8.23046875, "min": -18.358184814453125, "p10": -3.5216129302978514, "median": 4.218456268310547, "p90": 14.015963745117189, "max": 24.63054656982422, "pos_frac": 0.796875, "sample": [11.310997009277344, 19.39459800720215, -3.5823822021484375, 22.116119384765625, 3.4309844970703125, 21.3883056640625, 5.000522613525391, 2.9040298461914062, 8.051300048828125, 0.6108016967773438, 24.63054656982422, 7.2095794677734375, 4.364753723144531, 12.456497192382812, 1.7031784057617188, -6.2248992919921875, 4.0721588134765625, 8.935596466064453, -0.4958457946777344, 11.211742401123047, 3.8670196533203125, 9.862228393554688, 1.7099685668945312, -3.3798179626464844, 0.8914794921875, 5.384483337402344, 10.177618026733398, -18.358184814453125, 3.8965110778808594, 2.6411056518554688, 4.0482635498046875, 0.750274658203125, 0.33203887939453125, 13.077644348144531, 0.24823760986328125, 6.730648040771484, 13.366455078125, -1.866790771484375, 3.3834686279296875, -17.886157989501953, 12.00848388671875, 16.601112365722656, 14.20989990234375, -4.576883316040039, 4.6939697265625, 6.1849822998046875, -3.8477783203125, 4.4279632568359375, 13.242118835449219, 4.873268127441406, 14.374856948852539, 0.4010028839111328, -3.040771484375, 6.719718933105469, 2.9475631713867188, -1.4476242065429688, 1.4191436767578125, -15.734077453613281, 13.563446044921875, 4.502510070800781, 4.671333312988281, -0.3723411560058594, 3.8996200561523438, 4.486419677734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000270.npy"}
{"epoch": 0.40816326530612246, "step": 271, "batch_size": 64, "mean": 4.546772480010986, "std": 10.204014778137207, "min": -14.92562484741211, "p10": -6.806589889526367, "median": 1.443892478942871, "p90": 19.36755485534668, "max": 30.46295166015625, "pos_frac": 0.578125, "sample": [5.9445953369140625, 10.082122802734375, -3.690155029296875, -0.2025909423828125, 21.001853942871094, 16.36182403564453, 5.276376724243164, -0.028181076049804688, 19.31991958618164, -7.940399169921875, 18.520034790039062, -6.657157897949219, 21.101593017578125, -1.7501983642578125, 13.887802124023438, 16.188201904296875, 1.3616485595703125, 2.6255340576171875, 0.8731956481933594, 23.339996337890625, 4.930931091308594, 5.3635406494140625, 12.39417839050293, -5.161869049072266, -2.1232757568359375, -0.3715667724609375, -6.870632171630859, 12.095916748046875, 1.46624755859375, -0.5464839935302734, 1.4215373992919922, 10.485620498657227, -0.5457611083984375, -10.919342041015625, -1.3125762939453125, 20.583770751953125, 4.4917449951171875, 0.8696212768554688, 8.586814880371094, 19.387969970703125, -0.35944366455078125, -9.828353881835938, 30.46295166015625, -5.457984924316406, 1.6048736572265625, -0.4649200439453125, -0.5085067749023438, -14.772674560546875, -14.92562484741211, -1.6188125610351562, 1.9901237487792969, -1.22381591796875, 18.354576110839844, 19.025861740112305, 11.771829605102539, 4.55853271484375, -7.24810791015625, 5.806968688964844, -1.7173309326171875, 1.3861808776855469, -2.323078155517578, -4.7648162841796875, 27.530710220336914, 3.8718948364257812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000271.npy"}
{"epoch": 0.40967498110355255, "step": 272, "batch_size": 64, "mean": 8.187671661376953, "std": 8.613451957702637, "min": -8.916202545166016, "p10": -0.6664161682128905, "median": 6.882362365722656, "p90": 22.1604778289795, "max": 28.588470458984375, "pos_frac": 0.84375, "sample": [0.8638553619384766, -2.1764373779296875, -3.5643157958984375, 8.466184616088867, 6.742774963378906, 23.403526306152344, 8.112945556640625, 16.287704467773438, 23.338218688964844, 4.578502655029297, 14.2235107421875, 11.22479248046875, -0.3759918212890625, -0.7301025390625, 9.242752075195312, 4.218751907348633, 7.6927337646484375, -0.5178146362304688, 16.26628875732422, 16.470806121826172, 13.605661392211914, 3.228902816772461, 13.76809310913086, 11.599668502807617, 23.535566329956055, 1.933258056640625, 11.539566040039062, -5.132499694824219, 6.370552062988281, 20.397254943847656, 8.15582275390625, 26.713415145874023, 7.021949768066406, 0.6208648681640625, 13.694915771484375, 2.284862518310547, 0.5101985931396484, 0.5936241149902344, 9.070255279541016, -8.916202545166016, 17.430870056152344, 23.486553192138672, 22.891845703125, 4.207679748535156, 0.4942436218261719, 0.9866199493408203, 28.588470458984375, 3.8411331176757812, 13.34543228149414, 14.31884765625, 6.330263137817383, 14.201904296875, -5.6205902099609375, 2.6952362060546875, 5.0267791748046875, 1.7216796875, 8.558273315429688, 10.405494689941406, -1.7031211853027344, -0.4279975891113281, 3.489227294921875, 1.1915130615234375, 20.45395278930664, 3.732290267944336], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000272.npy"}
{"epoch": 0.41118669690098264, "step": 273, "batch_size": 64, "mean": 6.378169059753418, "std": 10.245058059692383, "min": -16.099552154541016, "p10": -7.807120513916012, "median": 5.465520858764648, "p90": 17.752397727966315, "max": 32.657562255859375, "pos_frac": 0.765625, "sample": [-2.26702880859375, 11.890796661376953, 15.314773559570312, -9.174842834472656, 12.59637451171875, -0.29854583740234375, -10.056304931640625, 16.27202606201172, -12.680843353271484, 5.895500183105469, 2.31939697265625, -4.6157684326171875, 13.03704833984375, 0.7987575531005859, 6.984291076660156, 12.141643524169922, 20.314353942871094, 2.1008644104003906, 18.386842727661133, 10.007164001464844, 15.245399475097656, 5.235546112060547, -2.8192977905273438, -2.172870635986328, -16.091964721679688, 15.076629638671875, 6.052642822265625, 1.9650344848632812, -1.7273426055908203, 2.6984939575195312, 9.770599365234375, 32.657562255859375, 13.590347290039062, -16.099552154541016, -0.3052520751953125, 10.591522216796875, 13.212940216064453, 13.604290008544922, 1.8451766967773438, 11.478004455566406, -10.820068359375, 5.69549560546875, 14.883415222167969, 0.5710601806640625, 4.907135009765625, 29.30706787109375, 25.435781478881836, 12.422279357910156, 4.9991607666015625, 13.294048309326172, 0.5833854675292969, 3.9022789001464844, 0.7667903900146484, -3.2341690063476562, 2.9765243530273438, 19.46253204345703, 3.0600357055664062, 12.514991760253906, 4.080356597900391, 0.7046966552734375, 14.598617553710938, 24.443143844604492, -9.528888702392578, 10.402725219726562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000273.npy"}
{"epoch": 0.4126984126984127, "step": 274, "batch_size": 64, "mean": 6.54940128326416, "std": 9.844841957092285, "min": -15.848506927490234, "p10": -3.8376743316650392, "median": 5.307659149169922, "p90": 20.09943313598633, "max": 25.156097412109375, "pos_frac": 0.765625, "sample": [11.736419677734375, -3.4134445190429688, -2.832683563232422, 3.3911895751953125, 1.9931182861328125, -13.183113098144531, 19.385147094726562, 17.126983642578125, 4.052284240722656, 3.4428768157958984, 13.58758544921875, 13.138931274414062, 0.30954742431640625, -3.991363525390625, 6.412548065185547, 22.677444458007812, 23.83139419555664, -12.216794967651367, 5.374439239501953, 5.627166748046875, 18.613449096679688, 15.953544616699219, 12.150142669677734, 19.265371322631836, 8.67495346069336, -0.0057964324951171875, 19.176834106445312, 24.61511993408203, 4.462421417236328, -0.49076080322265625, 0.16572189331054688, 2.3646316528320312, 13.999916076660156, -2.12152099609375, -2.0377635955810547, 13.729930877685547, -15.848506927490234, -5.071582794189453, 11.5380859375, 6.05218505859375, -3.8260345458984375, 7.4544525146484375, 1.4397697448730469, 1.2853851318359375, 0.7008380889892578, 6.192340850830078, 5.4511260986328125, -14.786094665527344, 9.698066711425781, 1.1242103576660156, 21.395851135253906, 20.997413635253906, 5.240879058837891, 4.1249542236328125, 4.436912536621094, -2.604288101196289, -3.842662811279297, 4.0880889892578125, 25.156097412109375, 17.213491439819336, 0.9542160034179688, 8.065681457519531, 20.405555725097656, 17.15935707092285], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000274.npy"}
{"epoch": 0.41421012849584277, "step": 275, "batch_size": 64, "mean": 7.114006042480469, "std": 9.518577575683594, "min": -19.43183135986328, "p10": -2.1311466217041013, "median": 5.871575355529785, "p90": 20.902383804321293, "max": 27.976829528808594, "pos_frac": 0.796875, "sample": [8.574539184570312, 5.584144592285156, 1.6658172607421875, 13.048637390136719, 20.09914779663086, 5.083599090576172, 5.993995666503906, -19.43183135986328, 25.82447052001953, 21.246627807617188, 0.7642440795898438, 9.323974609375, 0.6469573974609375, 24.921363830566406, -0.2628364562988281, 6.695583343505859, 12.659133911132812, 10.339569091796875, 23.495689392089844, -5.912086486816406, -14.998687744140625, 26.768272399902344, 2.1222076416015625, 1.7513504028320312, 12.752006530761719, 0.848358154296875, -2.279712677001953, 0.6905899047851562, 0.23590469360351562, 3.5763702392578125, 2.7878570556640625, 27.976829528808594, 4.903083801269531, 9.974746704101562, 18.228675842285156, 10.26519775390625, 5.396284103393555, 5.749155044555664, -2.468475341796875, 0.7417640686035156, 8.800308227539062, 1.4230995178222656, -0.00278472900390625, -0.5769577026367188, 7.824897766113281, 17.34510040283203, 16.48846435546875, 8.936836242675781, -5.140007019042969, 6.381618499755859, 1.6948432922363281, -0.40550994873046875, -1.4428939819335938, 10.530136108398438, 22.212160110473633, 19.88367462158203, -1.7844924926757812, 10.816970825195312, 7.7030029296875, 17.353181838989258, 13.450531005859375, 1.8243408203125, -3.2784862518310547, 9.875812530517578], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000275.npy"}
{"epoch": 0.41572184429327286, "step": 276, "batch_size": 64, "mean": 7.4599995613098145, "std": 8.657319068908691, "min": -13.934532165527344, "p10": -2.5876817703247066, "median": 6.126340866088867, "p90": 18.34831295013428, "max": 27.110755920410156, "pos_frac": 0.828125, "sample": [14.768144607543945, 5.117359161376953, -2.7548885345458984, 13.670310974121094, 5.008716583251953, 15.817401885986328, 27.110755920410156, -9.905319213867188, 6.265773773193359, 5.102466583251953, 5.986907958984375, 1.908203125, 10.578475952148438, 11.733146667480469, 7.6950225830078125, 16.8720703125, 3.9680862426757812, 0.2642402648925781, 5.82183837890625, 3.812023162841797, 5.034282684326172, 20.33910369873047, 16.029388427734375, 19.98396110534668, -2.1331100463867188, -6.049959182739258, 13.469749450683594, -3.0805206298828125, 13.607484817504883, 17.691930770874023, 2.8532180786132812, 4.362663269042969, -2.1975326538085938, -10.664762496948242, 17.622528076171875, -4.3799591064453125, 21.039594650268555, -0.7969932556152344, 9.50244140625, 13.598976135253906, -13.934532165527344, 5.6857757568359375, 23.757247924804688, 1.4398212432861328, 0.21234893798828125, 0.2032470703125, 15.610834121704102, 11.1866455078125, 18.629619598388672, 10.870414733886719, -1.9965858459472656, 2.2064971923828125, 9.353340148925781, 4.112613677978516, 6.718391418457031, 22.356216430664062, 16.45657730102539, 5.802181243896484, 6.646614074707031, 12.727306365966797, 2.3666439056396484, 10.494903564453125, 10.789093017578125, 1.0715560913085938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000276.npy"}
{"epoch": 0.41723356009070295, "step": 277, "batch_size": 64, "mean": 4.4561686515808105, "std": 9.12709903717041, "min": -14.540348052978516, "p10": -5.566394805908202, "median": 3.2830820083618164, "p90": 15.846288299560547, "max": 27.30773162841797, "pos_frac": 0.703125, "sample": [11.33864974975586, -7.97540283203125, 14.717039108276367, 1.156158447265625, 10.949724197387695, -2.923503875732422, 3.123760223388672, 3.1663589477539062, 16.58623695373535, 7.241584777832031, -6.072898864746094, -3.110443115234375, 7.347137451171875, 1.9575347900390625, -3.8099441528320312, 4.948444366455078, 3.194314956665039, -2.2752838134765625, 1.6618690490722656, 1.3886451721191406, 6.280052185058594, 9.754234313964844, 1.5620098114013672, 4.6985015869140625, -0.609039306640625, 15.41900634765625, 0.9172515869140625, -3.7660675048828125, 0.1886444091796875, 15.398845672607422, -0.2880821228027344, 7.0552825927734375, -13.76103401184082, 13.158878326416016, 6.359844207763672, 15.737068176269531, -4.384552001953125, -13.443107604980469, 7.004295349121094, 2.5020065307617188, 3.8037261962890625, -8.578479766845703, 24.532127380371094, 17.172958374023438, 6.847099304199219, 3.9391136169433594, 3.3718490600585938, 5.4874725341796875, -12.566337585449219, 27.30773162841797, 11.053016662597656, -3.4062862396240234, 15.893096923828125, 9.812187194824219, -3.829071044921875, 0.7260513305664062, 17.15232276916504, -14.540348052978516, 10.3624267578125, 26.355831146240234, -1.4731712341308594, 1.4443283081054688, -0.1427631378173828, 12.075895309448242], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000277.npy"}
{"epoch": 0.41874527588813304, "step": 278, "batch_size": 64, "mean": 6.714378833770752, "std": 10.017135620117188, "min": -20.510513305664062, "p10": -3.617658996582031, "median": 5.233619689941406, "p90": 20.74747772216797, "max": 27.642379760742188, "pos_frac": 0.78125, "sample": [-3.7213516235351562, -6.5606689453125, 17.716140747070312, 5.0823822021484375, 4.2548828125, -1.9804019927978516, -13.788558959960938, 4.0628814697265625, 21.21300506591797, 26.806930541992188, 6.521995544433594, 0.5229949951171875, 20.314453125, 24.97117042541504, 10.736427307128906, -1.1449031829833984, 5.060567855834961, 2.8187084197998047, 10.030563354492188, 20.933059692382812, 6.810436248779297, 10.90219497680664, 3.781005859375, 2.2932357788085938, -20.510513305664062, 27.642379760742188, -2.5406646728515625, 11.93174934387207, 7.307952880859375, 5.708000183105469, 4.279754638671875, 11.035919189453125, -4.911163330078125, -3.3757095336914062, 17.735076904296875, 1.5069808959960938, 9.086181640625, 4.55035400390625, 12.33958625793457, 16.20177459716797, 25.80823516845703, 25.25977325439453, -2.0586814880371094, 7.6754608154296875, 5.4845123291015625, 12.222198486328125, 5.384857177734375, 1.7879905700683594, -10.620933532714844, 0.8648719787597656, 7.808826446533203, 15.6396484375, -11.441940307617188, -0.8642482757568359, -0.10967254638671875, 19.556068420410156, 4.55035400390625, 16.478317260742188, 7.720592498779297, 2.6040878295898438, 3.645305633544922, 0.39654541015625, 15.682876586914062, 0.6203765869140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000278.npy"}
{"epoch": 0.42025699168556313, "step": 279, "batch_size": 64, "mean": 8.823453903198242, "std": 9.609671592712402, "min": -19.260234832763672, "p10": -0.6502477645874023, "median": 7.0437517166137695, "p90": 22.19444961547852, "max": 28.760265350341797, "pos_frac": 0.84375, "sample": [-0.6535282135009766, 5.781726837158203, 28.760265350341797, 5.781318664550781, -19.260234832763672, 16.904434204101562, 3.468536376953125, -2.094146728515625, 0.496490478515625, 8.534172058105469, 6.412712097167969, 19.160446166992188, 7.6486968994140625, 24.266708374023438, 7.588417053222656, 25.155838012695312, 11.540771484375, 19.867481231689453, -5.027841567993164, -0.21927642822265625, -5.727041244506836, 19.754669189453125, 4.066394805908203, 3.8296051025390625, 5.4540863037109375, 8.952842712402344, 6.985319137573242, 0.10279083251953125, 3.3800811767578125, 21.601821899414062, 3.919879913330078, 15.215278625488281, 20.607757568359375, 5.451866149902344, -6.3185577392578125, 17.39019203186035, 13.914379119873047, 8.340057373046875, -0.5972442626953125, 23.978607177734375, 20.764266967773438, 7.102184295654297, 25.95486831665039, 3.0755462646484375, -0.6425933837890625, 7.503547668457031, 2.6131210327148438, 2.7592315673828125, 14.772781372070312, 22.44843292236328, 13.918596267700195, -4.170417785644531, 9.916486740112305, 21.480915069580078, 1.4860000610351562, 1.4745864868164062, 4.31182861328125, 1.8875274658203125, 13.415328979492188, 11.753641128540039, 14.519355773925781, 4.33929443359375, 25.718151092529297, 3.8825836181640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000279.npy"}
{"epoch": 0.4217687074829932, "step": 280, "batch_size": 64, "mean": 9.556790351867676, "std": 9.013075828552246, "min": -6.621433258056641, "p10": -0.9472103118896482, "median": 7.342704772949219, "p90": 23.719522857666014, "max": 29.5714111328125, "pos_frac": 0.859375, "sample": [12.219329833984375, 28.015941619873047, 5.189970016479492, 13.81640625, 1.0120048522949219, 24.827608108520508, 0.7621059417724609, -1.8259124755859375, 2.380979537963867, 14.040420532226562, 9.561325073242188, 5.8843841552734375, 10.961112976074219, 15.1060791015625, 23.74748992919922, 23.654266357421875, 3.9203567504882812, 7.057353973388672, 22.085037231445312, 3.4620895385742188, -1.8351821899414062, 2.6316757202148438, -1.0326461791992188, 2.2000656127929688, 23.329612731933594, 14.047531127929688, 5.282947540283203, 9.258384704589844, 18.632965087890625, 28.0465087890625, 8.875591278076172, -1.7490997314453125, 7.0368804931640625, 9.491806030273438, 1.7044563293457031, 12.073028564453125, 29.5714111328125, -2.1127471923828125, 26.167678833007812, 19.434906005859375, -4.6157989501953125, -6.621433258056641, 0.07828521728515625, -0.1745738983154297, 8.768035888671875, 12.63332748413086, 7.825502395629883, -0.7478599548339844, 6.050788879394531, 6.866672515869141, 24.443517684936523, 5.595941543579102, 5.049633026123047, 18.574108123779297, 10.128366470336914, 5.4545745849609375, 7.628055572509766, 4.863838195800781, 5.274051666259766, 6.4609832763671875, 3.020172119140625, 17.30237579345703, 21.515640258789062, 9.326278686523438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000280.npy"}
{"epoch": 0.42328042328042326, "step": 281, "batch_size": 64, "mean": 8.651966094970703, "std": 10.245503425598145, "min": -16.532461166381836, "p10": -2.5233249664306636, "median": 7.782723426818848, "p90": 22.290011978149415, "max": 27.151611328125, "pos_frac": 0.78125, "sample": [-6.150730133056641, 20.372589111328125, 3.3662967681884766, 2.2777557373046875, 25.046546936035156, 20.678497314453125, 8.352510452270508, 16.450775146484375, 7.2129364013671875, 27.151611328125, 8.581123352050781, 19.68977165222168, 11.020021438598633, 11.77581787109375, 3.6122512817382812, 2.972625732421875, 4.871561050415039, 21.55828094482422, -1.1605815887451172, 16.005043029785156, 20.230159759521484, 3.310749053955078, 6.8359527587890625, 10.406776428222656, 16.239376068115234, 3.495288848876953, 5.694843292236328, 1.0351142883300781, 7.197784423828125, -16.532461166381836, 5.367786407470703, -0.9135799407958984, -0.3939361572265625, 6.428607940673828, 24.318954467773438, -11.084526062011719, 15.343719482421875, 26.930648803710938, -0.444793701171875, -0.3089141845703125, 24.21825408935547, 14.029685974121094, 25.644641876220703, 13.495485305786133, 18.164018630981445, 10.347785949707031, 13.317100524902344, 3.941669464111328, 18.093109130859375, 8.531099319458008, -0.21956634521484375, 0.7904815673828125, 22.60361099243164, -12.704748153686523, 4.551067352294922, 16.30209732055664, 17.876873016357422, -2.785858154296875, 9.723579406738281, 15.951122283935547, -1.9107475280761719, -4.964105606079102, 1.3869667053222656, -9.500043869018555], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000281.npy"}
{"epoch": 0.42479213907785335, "step": 282, "batch_size": 64, "mean": 7.88245964050293, "std": 11.297110557556152, "min": -15.51754379272461, "p10": -5.910706329345703, "median": 6.2730712890625, "p90": 24.67159061431885, "max": 30.836410522460938, "pos_frac": 0.765625, "sample": [7.287841796875, -7.0660552978515625, -4.671186447143555, 4.1717987060546875, 1.345184326171875, 14.33755111694336, 12.834041595458984, 5.718513488769531, -1.8991737365722656, 28.2919921875, 16.788917541503906, 24.782350540161133, 8.31252670288086, 9.37237548828125, 21.81487274169922, 5.7055206298828125, 27.876558303833008, 8.054927825927734, -2.724365234375, 11.24679183959961, 2.5730972290039062, 2.426239013671875, 4.691766738891602, 22.219940185546875, 2.4403419494628906, 1.458343505859375, 0.3021812438964844, 19.378896713256836, 29.547454833984375, 7.903789520263672, -10.50362777709961, -3.801950454711914, -15.51754379272461, 18.930509567260742, 2.592996597290039, -6.050926208496094, 21.82762908935547, 2.9050369262695312, -7.9301605224609375, 6.05853271484375, 11.295448303222656, 18.67925453186035, 2.7350311279296875, 11.239322662353516, 24.413150787353516, 30.836410522460938, 10.79372787475586, 1.0268020629882812, -0.8323593139648438, 8.981216430664062, -5.583526611328125, -3.0292892456054688, -4.3073577880859375, 2.786041259765625, -10.208526611328125, 29.51503562927246, 25.660263061523438, 11.502840042114258, -9.766237258911133, 6.48760986328125, 9.555288314819336, 14.899261474609375, 19.81763458251953, 4.94683837890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000282.npy"}
{"epoch": 0.42630385487528344, "step": 283, "batch_size": 64, "mean": 6.637721538543701, "std": 11.160143852233887, "min": -16.2476806640625, "p10": -7.425541687011718, "median": 4.099283218383789, "p90": 22.519212532043458, "max": 28.131874084472656, "pos_frac": 0.734375, "sample": [22.44085693359375, -11.84044075012207, -10.115066528320312, -4.211151123046875, 11.211318969726562, 8.795230865478516, 1.9613494873046875, -0.719573974609375, 25.403236389160156, -0.56927490234375, -4.4633636474609375, 8.656707763671875, 7.7345428466796875, 1.7066421508789062, 18.5413818359375, 4.149127960205078, 1.6224365234375, 23.241981506347656, -13.794601440429688, 1.245391845703125, 22.249351501464844, 9.505096435546875, 20.186126708984375, 1.8287467956542969, 15.079437255859375, 24.949607849121094, 25.338043212890625, 5.483222961425781, 28.131874084472656, 4.622688293457031, 4.197624206542969, 19.88546371459961, -4.688163757324219, 4.0494384765625, -6.0459747314453125, 9.677520751953125, -10.041183471679688, 2.118377685546875, 12.041915893554688, 20.197242736816406, 20.394512176513672, -5.278408050537109, 20.488174438476562, 22.552793502807617, 3.9925537109375, 0.8868446350097656, 2.7699508666992188, -16.2476806640625, 1.9696006774902344, -8.01678466796875, 15.296659469604492, 2.680095672607422, 10.11181640625, -0.1208343505859375, 8.19183349609375, 20.419273376464844, -1.1184158325195312, 1.8356361389160156, 3.741720199584961, 1.4472274780273438, -12.679229736328125, 22.74334144592285, -0.72900390625, 9.719310760498047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000283.npy"}
{"epoch": 0.42781557067271353, "step": 284, "batch_size": 64, "mean": 8.056710243225098, "std": 10.473479270935059, "min": -18.590129852294922, "p10": -0.19255180358886703, "median": 5.803559303283691, "p90": 23.00187377929688, "max": 36.5606689453125, "pos_frac": 0.875, "sample": [2.8741912841796875, 2.784576416015625, 5.670215606689453, 5.646331787109375, 25.78362274169922, 9.27911376953125, -0.027784347534179688, -8.431320190429688, 26.518112182617188, 3.5706863403320312, 2.3922271728515625, 10.998348236083984, 1.7119789123535156, 5.245361328125, 15.46824836730957, -2.9916038513183594, 4.811517715454102, 14.166240692138672, 1.9418182373046875, 36.5606689453125, 2.63214111328125, 8.473739624023438, 2.3406829833984375, -7.3931732177734375, -2.0674209594726562, 10.782356262207031, 2.5579757690429688, 7.035713195800781, 11.70254898071289, 1.8016395568847656, 17.671669006347656, 13.775672912597656, 14.924644470214844, 5.324554443359375, 5.93690299987793, 3.255847930908203, 14.525505065917969, 30.874408721923828, 21.52825927734375, 2.39910888671875, 12.650169372558594, 30.189437866210938, 11.416084289550781, 6.327190399169922, -0.2631664276123047, 0.2640113830566406, 17.0247802734375, 8.165441513061523, 12.330062866210938, 23.6334228515625, 0.9322071075439453, 6.751125335693359, 20.188888549804688, 0.16170501708984375, 3.3252944946289062, 6.0313873291015625, 29.840972900390625, 3.0280303955078125, 9.983779907226562, -18.22834014892578, 13.9383544921875, -18.590129852294922, 4.104316711425781, 0.36908721923828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000284.npy"}
{"epoch": 0.4293272864701436, "step": 285, "batch_size": 64, "mean": 6.776730537414551, "std": 10.729656219482422, "min": -15.044120788574219, "p10": -5.2981315612792965, "median": 3.0216331481933594, "p90": 23.304959869384767, "max": 29.75506591796875, "pos_frac": 0.75, "sample": [8.628120422363281, 2.458465576171875, -2.6064453125, 25.06559181213379, 0.604827880859375, -5.285015106201172, -15.044120788574219, 5.534095764160156, 4.5939178466796875, -8.100032806396484, 14.030860900878906, 24.626239776611328, 17.382423400878906, 9.535308837890625, 22.876373291015625, -6.74378776550293, 14.891868591308594, 26.98276138305664, -1.8859596252441406, 2.4519615173339844, 1.9985313415527344, 0.5382404327392578, -5.303752899169922, -10.668140411376953, 2.687946319580078, -7.859527587890625, -2.6653594970703125, 2.8095550537109375, 10.62841796875, -4.742305755615234, -2.9638519287109375, 21.885513305664062, -4.405632019042969, 24.344268798828125, 16.360443115234375, 10.219501495361328, 11.736152648925781, 5.3881988525390625, 19.862112045288086, 12.166763305664062, 13.199443817138672, 3.6677017211914062, -4.2501068115234375, 27.47631072998047, 16.544029235839844, 3.2337112426757812, 1.1044254302978516, 29.75506591796875, 0.9329376220703125, 0.12619400024414062, 2.6902294158935547, 16.05925750732422, 2.2611923217773438, 17.457725524902344, 1.6234588623046875, 2.054872512817383, 23.48863983154297, 22.062347412109375, -6.1255950927734375, 1.9799613952636719, -1.287384033203125, 6.304412841796875, 0.75054931640625, 10.58685302734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000285.npy"}
{"epoch": 0.4308390022675737, "step": 286, "batch_size": 64, "mean": 10.862764358520508, "std": 11.276802062988281, "min": -14.631645202636719, "p10": -0.7107364654541012, "median": 8.331661224365234, "p90": 27.510255432128908, "max": 33.14496612548828, "pos_frac": 0.875, "sample": [12.242683410644531, 0.6274394989013672, 0.8915233612060547, 13.438751220703125, 3.817901611328125, 19.492919921875, 3.0116653442382812, 4.938077926635742, 6.914396286010742, -1.7895355224609375, 16.130905151367188, 7.390224456787109, 7.725822448730469, 21.745729446411133, 14.100425720214844, 32.132049560546875, 20.475421905517578, 5.782541275024414, -14.295143127441406, 6.162483215332031, 14.777713775634766, 0.9446449279785156, 5.926395416259766, 28.637699127197266, 1.2347984313964844, -10.864311218261719, 19.43697738647461, 18.66613006591797, -0.3208770751953125, 13.981277465820312, 19.53864860534668, 16.66474151611328, 27.897308349609375, 27.776412963867188, 13.600654602050781, 26.8404541015625, 5.886203765869141, 23.84429931640625, -3.0539283752441406, 9.292465209960938, 28.073265075683594, 29.924560546875, 2.7548980712890625, 1.4664459228515625, 13.20608139038086, 8.064620971679688, 1.6520423889160156, -1.25140380859375, 33.14496612548828, 25.953954696655273, 26.88922119140625, 8.598701477050781, 2.408855438232422, 2.359067916870117, -0.8778190612792969, 8.612564086914062, 20.258750915527344, 6.62353515625, -14.631645202636719, 26.7838134765625, 7.96796989440918, 0.6616783142089844, 10.870620727539062, 4.058197021484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000286.npy"}
{"epoch": 0.4323507180650038, "step": 287, "batch_size": 64, "mean": 10.706443786621094, "std": 12.022651672363281, "min": -21.006425857543945, "p10": -0.8954349517822264, "median": 7.555463790893555, "p90": 28.572080993652346, "max": 36.02311706542969, "pos_frac": 0.859375, "sample": [13.637012481689453, 2.6847076416015625, 4.518396377563477, 19.718738555908203, 33.363525390625, 25.015945434570312, 1.3940505981445312, 10.075857162475586, 11.002906799316406, -2.2524490356445312, 11.919136047363281, 6.556026458740234, 7.385051727294922, 8.873825073242188, 26.815345764160156, 33.506507873535156, 28.76336669921875, 4.389507293701172, 23.10968017578125, 11.269580841064453, 11.172645568847656, 8.77825927734375, 19.946949005126953, 2.5924816131591797, 3.4483108520507812, 21.486114501953125, -0.19304275512695312, -6.9540252685546875, 19.494747161865234, 1.4897346496582031, 21.991859436035156, 1.43011474609375, -21.006425857543945, -0.8141326904296875, -2.565317153930664, 12.111038208007812, 4.797523498535156, 1.7868843078613281, 7.7258758544921875, 21.323959350585938, 23.21587371826172, 0.1015472412109375, 28.125747680664062, 20.343839645385742, -8.411048889160156, 2.8535308837890625, 36.02311706542969, 3.226825714111328, 23.744529724121094, 19.41912841796875, 1.0728893280029297, 1.1253128051757812, 12.815187454223633, 3.0234527587890625, -0.9302787780761719, 5.010169982910156, 32.25065994262695, 5.306709289550781, 5.41845703125, 2.794818878173828, -1.9278945922851562, 28.8890380859375, 31.603303909301758, 0.32723236083984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000287.npy"}
{"epoch": 0.43386243386243384, "step": 288, "batch_size": 64, "mean": 7.2326507568359375, "std": 11.79343032836914, "min": -18.71167755126953, "p10": -7.687732696533203, "median": 6.749593734741211, "p90": 25.22388916015625, "max": 33.382667541503906, "pos_frac": 0.71875, "sample": [10.402666091918945, 1.9856338500976562, -8.75372314453125, -18.71167755126953, 11.965049743652344, 4.508644104003906, 6.671154022216797, -6.521080017089844, 14.3551025390625, 6.271867752075195, 6.828033447265625, 10.896574020385742, 1.79278564453125, 25.28966522216797, -5.59442138671875, 15.7593994140625, -7.8009033203125, 11.34915542602539, -2.3191375732421875, 11.462509155273438, -13.192157745361328, 27.45946502685547, 30.52814483642578, 28.994346618652344, -2.557708740234375, 30.441238403320312, 6.652027130126953, 13.155803680419922, 0.6262283325195312, -0.13605308532714844, 33.382667541503906, -9.536788940429688, 7.961517333984375, 11.139652252197266, 27.8319091796875, 11.040332794189453, 8.359649658203125, -7.4219207763671875, 6.284751892089844, -0.633056640625, -7.423667907714844, -12.93536376953125, 25.070411682128906, 2.2079391479492188, 2.2721786499023438, -8.783660888671875, 15.581886291503906, -0.4803314208984375, 15.44932746887207, 17.404571533203125, 18.435928344726562, 0.909149169921875, 12.49725341796875, 6.196990966796875, -3.7419490814208984, -1.3448600769042969, 9.009902954101562, 8.760612487792969, 0.6655025482177734, 22.712265014648438, 3.105356216430664, 19.361366271972656, 7.3750457763671875, 10.366447448730469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000288.npy"}
{"epoch": 0.43537414965986393, "step": 289, "batch_size": 64, "mean": 7.838686466217041, "std": 12.895816802978516, "min": -18.8502197265625, "p10": -5.6362152099609375, "median": 5.478973388671875, "p90": 27.291699600219726, "max": 36.609031677246094, "pos_frac": 0.734375, "sample": [5.775476455688477, 1.6977500915527344, 32.57862854003906, -5.222806930541992, -4.9519500732421875, -5.723236083984375, -2.9611167907714844, 8.291694641113281, 16.888317108154297, 6.606880187988281, 3.297149658203125, 11.011856079101562, 15.215291976928711, 19.855979919433594, 10.702072143554688, 1.1029644012451172, 18.0699462890625, -13.38812255859375, -3.6537513732910156, 27.280380249023438, 4.04559326171875, 30.869163513183594, 0.20143890380859375, -5.43316650390625, 16.650428771972656, 3.027994155883789, 27.630531311035156, 1.3805427551269531, -1.9511070251464844, 30.695941925048828, 5.2316131591796875, 5.1233062744140625, -4.666271209716797, -18.8502197265625, 36.609031677246094, -3.154634475708008, 0.47834205627441406, 33.840545654296875, 6.9244384765625, 5.7263336181640625, 6.579030990600586, 15.172012329101562, -3.6288528442382812, 3.8689346313476562, 9.539331436157227, -7.802202224731445, 7.837581634521484, 0.3930015563964844, -6.9100494384765625, -15.091049194335938, 20.60953712463379, 10.50933837890625, 3.943756103515625, 5.8342437744140625, 27.121063232421875, 27.296550750732422, 18.55683135986328, 23.867385864257812, 21.90496826171875, 5.1362457275390625, 3.3746299743652344, -10.323770523071289, 20.91339111328125, -3.879222869873047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000289.npy"}
{"epoch": 0.436885865457294, "step": 290, "batch_size": 64, "mean": 9.523682594299316, "std": 13.095941543579102, "min": -22.49258804321289, "p10": -5.3986572265624995, "median": 7.887912750244141, "p90": 29.491426086425783, "max": 31.926673889160156, "pos_frac": 0.765625, "sample": [20.515487670898438, 13.987205505371094, 17.894012451171875, 30.61871337890625, 12.077449798583984, 2.147989273071289, -4.998332977294922, 0.7756862640380859, 1.5819091796875, 25.145591735839844, 9.691452026367188, 0.3003101348876953, -0.3740386962890625, -8.177238464355469, 11.565086364746094, 7.6733245849609375, -6.141807556152344, -5.570224761962891, -1.7505226135253906, -1.7476882934570312, 13.002580642700195, 18.867897033691406, 31.2740478515625, 27.794570922851562, 1.9598617553710938, 4.7166290283203125, 6.507144927978516, -2.4797821044921875, 6.752431869506836, -9.11761474609375, 25.953155517578125, 29.845829010009766, 21.95267677307129, 5.2481689453125, 11.243431091308594, 31.926673889160156, 29.611572265625, 8.102500915527344, 9.393836975097656, -17.165191650390625, 29.54998779296875, 25.516313552856445, 0.034046173095703125, -22.49258804321289, 0.17783355712890625, -1.08941650390625, -0.5441265106201172, 31.61071014404297, 25.499061584472656, 24.97736930847168, 1.0281333923339844, 5.74346923828125, 1.4455032348632812, 5.492835998535156, -12.718170166015625, 23.197036743164062, 9.414436340332031, 29.354782104492188, 9.6795654296875, 7.544654846191406, 14.076488494873047, 12.257537841796875, 10.946563720703125, -1.7911376953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000290.npy"}
{"epoch": 0.4383975812547241, "step": 291, "batch_size": 64, "mean": 9.23512077331543, "std": 12.911935806274414, "min": -24.116525650024414, "p10": -4.595977401733397, "median": 6.91021728515625, "p90": 29.418890380859374, "max": 34.931365966796875, "pos_frac": 0.734375, "sample": [23.473663330078125, 30.211551666259766, 2.8157806396484375, -5.8704986572265625, 2.779998779296875, 9.224868774414062, -3.1980361938476562, 1.3319664001464844, 30.604507446289062, -0.5618515014648438, 16.85955047607422, 18.064687728881836, -7.793235778808594, -2.3788318634033203, 19.849092483520508, 24.845298767089844, -2.3111953735351562, -2.127593994140625, 6.0193634033203125, 12.214042663574219, -1.8342666625976562, 2.17041015625, 9.085342407226562, 8.402732849121094, 14.64053726196289, 23.378101348876953, 24.072479248046875, 0.417388916015625, 32.601016998291016, 2.7791290283203125, 12.723426818847656, 34.931365966796875, 0.3844337463378906, -0.9328460693359375, -5.041938781738281, -0.5292263031005859, 17.874767303466797, 4.288482666015625, 18.886878967285156, 6.1810760498046875, 3.2672805786132812, 29.415451049804688, -24.116525650024414, 9.767135620117188, 29.841232299804688, 7.34088134765625, 20.108245849609375, 28.240589141845703, 13.396705627441406, 31.319305419921875, -3.555400848388672, 3.27813720703125, -7.757381439208984, 6.47955322265625, -16.123159408569336, 1.6526451110839844, 3.5845413208007812, 29.420364379882812, 8.853286743164062, 13.632770538330078, 21.440589904785156, 9.764841079711914, -0.6497955322265625, -6.085960388183594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000291.npy"}
{"epoch": 0.4399092970521542, "step": 292, "batch_size": 64, "mean": 8.707426071166992, "std": 12.993090629577637, "min": -20.774246215820312, "p10": -3.5116315841674792, "median": 7.582881927490234, "p90": 23.625995445251466, "max": 34.928619384765625, "pos_frac": 0.78125, "sample": [-20.774246215820312, -1.947357177734375, 21.289304733276367, -0.15237808227539062, 30.62552833557129, 20.172792434692383, 5.932991027832031, 29.306549072265625, 15.828218460083008, -1.245147705078125, -3.9737091064453125, 1.5400562286376953, 11.449251174926758, 7.396568298339844, -11.381210327148438, -1.4640884399414062, 17.736984252929688, 21.028945922851562, 21.926498413085938, 0.143280029296875, 12.899810791015625, 15.69808578491211, 13.864322662353516, -12.299583435058594, 3.737030029296875, 2.5303115844726562, 0.3297576904296875, 23.691917419433594, -1.7241973876953125, 27.100393295288086, 4.111534118652344, 19.65434455871582, 1.976663589477539, 1.0488433837890625, 0.1536102294921875, -18.52033233642578, 23.384204864501953, -20.115646362304688, 1.4004745483398438, 8.127738952636719, 1.6003875732421875, 21.68280792236328, 17.678796768188477, -2.433450698852539, 34.928619384765625, 23.472177505493164, 7.769195556640625, 11.972299575805664, 8.287092208862305, 4.349864959716797, 19.244476318359375, 21.470497131347656, 3.061281204223633, 20.960933685302734, 10.848396301269531, 4.762580871582031, 12.999439239501953, 4.980945587158203, 33.86330032348633, 8.593368530273438, 0.5687255859375, -1.0238037109375, -14.021747589111328, 31.170982360839844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000292.npy"}
{"epoch": 0.4414210128495843, "step": 293, "batch_size": 64, "mean": 11.13209342956543, "std": 14.966391563415527, "min": -29.68529510498047, "p10": -6.758627319335937, "median": 10.219833374023438, "p90": 30.189613342285156, "max": 37.0955810546875, "pos_frac": 0.75, "sample": [2.9746322631835938, 9.066436767578125, 6.184833526611328, 30.283340454101562, 19.293212890625, 7.0906524658203125, -11.652849197387695, 8.776273727416992, 11.37322998046875, 6.0772552490234375, 28.126113891601562, -1.8723430633544922, 30.698989868164062, 21.36517333984375, 22.003448486328125, 1.4395523071289062, 26.304086685180664, 15.800552368164062, 25.687532424926758, 21.794151306152344, 14.619743347167969, -9.976242065429688, -2.3747787475585938, -16.96136474609375, 28.049530029296875, 1.2713394165039062, 18.034683227539062, -6.8875885009765625, 7.790077209472656, 37.0955810546875, 2.1857681274414062, 35.23786163330078, 29.970916748046875, -14.044666290283203, 19.274948120117188, -2.7171783447265625, 29.215784072875977, -29.68529510498047, -0.027942657470703125, -1.3368682861328125, 8.044395446777344, 33.29576110839844, -14.969219207763672, -2.973651885986328, 0.291656494140625, -1.6782684326171875, 11.428146362304688, 23.04506492614746, 37.072601318359375, 7.827842712402344, 27.771129608154297, 1.4984283447265625, 13.407514572143555, 1.8889045715332031, 22.169517517089844, -6.4577178955078125, 22.33038330078125, 4.418243408203125, 15.728342056274414, 34.781463623046875, 12.225494384765625, -0.14929580688476562, 20.167984008789062, 21.740631103515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000293.npy"}
{"epoch": 0.4429327286470144, "step": 294, "batch_size": 64, "mean": 9.196502685546875, "std": 13.145197868347168, "min": -19.548980712890625, "p10": -2.242462730407715, "median": 5.51706600189209, "p90": 31.490595626831055, "max": 38.65443420410156, "pos_frac": 0.765625, "sample": [33.192115783691406, 6.1366424560546875, 16.296905517578125, 1.509033203125, 28.01500701904297, 6.776824951171875, 35.531803131103516, 3.8039398193359375, -17.022109985351562, 3.7833404541015625, 17.66071319580078, -19.548980712890625, -7.66020393371582, 11.809295654296875, 6.656038284301758, 22.51551055908203, 31.637882232666016, 6.504417419433594, 28.798709869384766, 10.538017272949219, 3.6972732543945312, 3.0187454223632812, 4.125185012817383, 14.033500671386719, -1.6565170288085938, 38.65443420410156, -4.273296356201172, 12.801080703735352, -2.2731781005859375, 35.56000518798828, 35.07830810546875, -0.9246253967285156, -0.2940196990966797, 31.146926879882812, -2.1052284240722656, 1.5809097290039062, 5.3119049072265625, -2.0222301483154297, 1.2216339111328125, 11.479461669921875, 2.6327285766601562, -1.6414794921875, 21.43328285217285, 2.179454803466797, 5.722227096557617, 17.499053955078125, 6.085231781005859, 13.770977020263672, 2.194131851196289, 32.501956939697266, -5.673519134521484, 18.81270980834961, 20.038955688476562, 0.8190841674804688, 1.976613998413086, 15.385589599609375, 1.318511962890625, -2.1707935333251953, 0.7415618896484375, 1.4362030029296875, 18.39728546142578, -6.299995422363281, 12.09140396118164, -1.7701416015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000294.npy"}
{"epoch": 0.4444444444444444, "step": 295, "batch_size": 64, "mean": 7.963223934173584, "std": 13.642413139343262, "min": -17.350933074951172, "p10": -5.6685686111450195, "median": 4.073439598083496, "p90": 26.613914871215822, "max": 39.166656494140625, "pos_frac": 0.71875, "sample": [27.343276977539062, -1.7519683837890625, -0.40766143798828125, 0.6120223999023438, 15.7354736328125, 22.915069580078125, 26.484718322753906, 4.087726593017578, 7.0908966064453125, 2.5052642822265625, 36.78337860107422, -5.5016021728515625, 37.035552978515625, 0.4706878662109375, 10.351844787597656, 9.870033264160156, 2.086639404296875, 1.945404052734375, -3.0292510986328125, 23.87158966064453, 26.66928482055664, -11.703285217285156, 0.8400955200195312, 2.881704330444336, 15.832733154296875, -3.2020721435546875, 2.780252456665039, -5.74012565612793, 19.86181640625, 4.522590637207031, -5.1440277099609375, -2.1427440643310547, 39.166656494140625, 18.159072875976562, 11.573970794677734, -13.039337158203125, 24.453933715820312, 7.9437255859375, 13.618038177490234, 31.661762237548828, 20.203033447265625, -17.350933074951172, 5.10980224609375, -17.0086669921875, 18.584381103515625, 15.003135681152344, -11.543354034423828, 0.48987579345703125, -15.062074661254883, 8.228212356567383, -1.3428707122802734, 2.33349609375, 1.811126708984375, 1.1559829711914062, 3.011991500854492, 4.059152603149414, 23.224937438964844, -0.5941123962402344, 4.724281311035156, -3.429168701171875, 21.625885009765625, 19.507247924804688, 31.845199584960938, -2.4333629608154297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000295.npy"}
{"epoch": 0.4459561602418745, "step": 296, "batch_size": 64, "mean": 10.102575302124023, "std": 13.797873497009277, "min": -17.586334228515625, "p10": -6.8363815307617175, "median": 7.246437072753906, "p90": 29.39590320587159, "max": 39.15061950683594, "pos_frac": 0.796875, "sample": [14.321762084960938, 24.191680908203125, 24.656829833984375, 6.087018966674805, -0.8944549560546875, 1.3371124267578125, 1.931243896484375, 36.56825256347656, 4.657752990722656, 12.778030395507812, 11.77145767211914, 9.282817840576172, -11.722490310668945, 35.555694580078125, 1.6204376220703125, 12.341690063476562, 22.558639526367188, 20.08809471130371, 3.5590133666992188, 10.854331970214844, 2.5779342651367188, 5.299709320068359, 6.6144561767578125, -14.07501220703125, -2.919525146484375, -0.9363555908203125, 1.2680892944335938, 13.731559753417969, 4.840141296386719, 8.70916748046875, 2.5239810943603516, -9.469276428222656, 7.87841796875, 6.0383148193359375, 36.84088134765625, 0.5499725341796875, -0.9487075805664062, -8.462360382080078, 22.63324737548828, -7.3515167236328125, 11.637771606445312, 15.30889892578125, 39.15061950683594, -17.586334228515625, 20.111358642578125, 5.787353515625, 22.45101547241211, 22.421524047851562, 17.05562400817871, 25.377098083496094, 20.209915161132812, 1.352325439453125, 30.280393600463867, 4.707912445068359, 38.084877014160156, 0.60797119140625, 1.4394378662109375, 33.91465759277344, -2.8361358642578125, -5.6343994140625, -14.796733856201172, 27.33209228515625, 22.024688720703125, 11.274871826171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000296.npy"}
{"epoch": 0.4474678760393046, "step": 297, "batch_size": 64, "mean": 10.525908470153809, "std": 12.543261528015137, "min": -20.9013614654541, "p10": -1.8027938842773437, "median": 7.924764633178711, "p90": 28.056263732910157, "max": 33.22644805908203, "pos_frac": 0.8125, "sample": [18.582714080810547, -11.225051879882812, -0.9149551391601562, 21.76715850830078, 8.521270751953125, 9.492431640625, 5.715110778808594, 28.077402114868164, -1.0898551940917969, -20.9013614654541, 17.210941314697266, 0.470947265625, 6.65972900390625, 7.328258514404297, 3.0028018951416016, 31.846418380737305, 8.884651184082031, 16.645771026611328, 0.1848926544189453, -0.6695098876953125, 10.363845825195312, -1.7106475830078125, -7.9100189208984375, -9.231674194335938, -1.9130134582519531, 29.921798706054688, 6.761070251464844, 1.78497314453125, 26.6374454498291, 1.7518310546875, 28.006940841674805, 33.22644805908203, 1.2902450561523438, -3.171022415161133, 0.08340072631835938, 6.56342887878418, 4.564235687255859, 1.701690673828125, 27.587722778320312, 18.306358337402344, 10.810928344726562, 9.504167556762695, 23.874008178710938, 16.79679298400879, 5.71820068359375, 15.577873229980469, 2.3129501342773438, 0.2602691650390625, -1.84228515625, 31.72231101989746, 6.775302886962891, 25.121212005615234, 24.899517059326172, 16.751632690429688, -1.4312248229980469, 10.852312088012695, 27.630050659179688, 17.855667114257812, 1.3974838256835938, 32.76428985595703, 1.8770904541015625, 13.939990997314453, 33.131553649902344, 23.153221130371094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000297.npy"}
{"epoch": 0.4489795918367347, "step": 298, "batch_size": 64, "mean": 9.072603225708008, "std": 15.844767570495605, "min": -26.92255210876465, "p10": -9.863449096679688, "median": 7.058881759643555, "p90": 32.07016258239746, "max": 42.972904205322266, "pos_frac": 0.75, "sample": [11.204170227050781, -20.50432014465332, 42.972904205322266, 4.6023101806640625, 32.07444381713867, 2.0149993896484375, 21.28645896911621, 1.5053176879882812, -9.545509338378906, 14.434268951416016, 37.56631088256836, 29.141067504882812, -9.890914916992188, 10.21270751953125, -6.5875244140625, 19.23073959350586, 13.914276123046875, -11.306076049804688, 26.427030563354492, 22.943328857421875, 1.4564170837402344, -3.4206771850585938, 26.658248901367188, 13.0740966796875, 6.273490905761719, 8.678741455078125, -21.273582458496094, 8.488790512084961, 13.870361328125, 35.25929260253906, 3.2218399047851562, 34.39869689941406, 1.6840667724609375, 11.519920349121094, 9.34841537475586, 25.75644302368164, 4.470441818237305, 35.45000457763672, 3.510866165161133, 3.9827041625976562, 4.4705810546875, -4.136285781860352, -7.308719635009766, 7.178413391113281, 6.939350128173828, 29.467498779296875, 32.06017303466797, 3.9894638061523438, -4.693073272705078, 3.756927490234375, 12.38528823852539, -0.0820159912109375, -6.347629547119141, -12.170116424560547, 1.5929031372070312, 24.08209991455078, -15.799449920654297, 16.789520263671875, -9.799362182617188, -26.92255210876465, 27.5654296875, 7.727985382080078, 2.379913330078125, 33.41571044921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000298.npy"}
{"epoch": 0.4504913076341648, "step": 299, "batch_size": 64, "mean": 7.249475002288818, "std": 15.784322738647461, "min": -23.958297729492188, "p10": -14.537446212768554, "median": 4.898468017578125, "p90": 27.672858047485352, "max": 42.4228515625, "pos_frac": 0.703125, "sample": [18.863689422607422, 16.83954620361328, 1.0694446563720703, -8.841873168945312, -3.200693130493164, 24.393898010253906, 6.059913635253906, 7.800296783447266, 42.4228515625, 0.10346412658691406, -14.804767608642578, 7.338081359863281, -23.958297729492188, 2.5254058837890625, -22.205249786376953, -2.700653076171875, 4.185333251953125, 27.853046417236328, 18.322525024414062, -15.28875732421875, 28.749542236328125, -4.601062774658203, 8.897357940673828, 12.65555191040039, -13.810470581054688, -16.686424255371094, 2.6532058715820312, 2.4482765197753906, 37.98503112792969, -0.144256591796875, 25.327972412109375, 4.638267517089844, 13.724784851074219, -21.892772674560547, 0.27948760986328125, 5.111846923828125, 14.716598510742188, 32.749385833740234, 4.127632141113281, 4.3569488525390625, 17.479949951171875, -2.5686264038085938, 23.306793212890625, 23.832239151000977, 34.26618957519531, -18.2025146484375, 7.703916549682617, 27.252418518066406, 22.77478790283203, 19.60773468017578, 0.32346153259277344, 4.685089111328125, 18.04490852355957, -13.9136962890625, 3.9335479736328125, 21.659992218017578, -9.37615966796875, 17.73382568359375, -0.8513050079345703, 5.641571044921875, -0.35230445861816406, 9.180198669433594, -12.784687042236328, 36.52495574951172], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000299.npy"}
{"epoch": 0.4520030234315949, "step": 300, "batch_size": 64, "mean": 9.58050537109375, "std": 15.349577903747559, "min": -26.923553466796875, "p10": -11.08162040710449, "median": 9.672480583190918, "p90": 27.860451507568364, "max": 37.575660705566406, "pos_frac": 0.75, "sample": [9.285928726196289, 32.24319839477539, -0.6964874267578125, 23.227256774902344, 3.259113311767578, 0.372894287109375, 22.880409240722656, 7.873443603515625, 0.9928092956542969, -19.837827682495117, 36.78497314453125, 15.610694885253906, 24.230926513671875, 25.782806396484375, 37.4392204284668, 17.966445922851562, 28.668136596679688, -4.604728698730469, -6.6581878662109375, 16.417285919189453, 2.3232784271240234, 21.723060607910156, 15.135498046875, -9.219470977783203, -17.233444213867188, 16.276344299316406, 18.961864471435547, -6.021980285644531, 14.621192932128906, 12.474397659301758, 3.5050926208496094, -17.297607421875, 22.121978759765625, -0.6794853210449219, -26.923553466796875, 14.458415985107422, 5.18617057800293, 37.575660705566406, 0.40608978271484375, 28.147315979003906, 13.19781494140625, 10.059032440185547, -2.37994384765625, 34.15303039550781, 22.75458526611328, -11.879684448242188, 1.2844810485839844, 18.15981674194336, 2.3488311767578125, 26.329517364501953, 18.6942138671875, 2.6897811889648438, 8.330093383789062, 7.6296539306640625, -2.340789794921875, -15.85696029663086, 10.088539123535156, 25.912063598632812, 23.229236602783203, -21.19212532043457, 2.2511138916015625, 9.226356506347656, 27.19110107421875, -3.5065689086914062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000300.npy"}
{"epoch": 0.45351473922902497, "step": 301, "batch_size": 64, "mean": 8.974075317382812, "std": 15.472105026245117, "min": -29.79922103881836, "p10": -6.908158493041992, "median": 7.53535270690918, "p90": 30.733190917968752, "max": 44.15673828125, "pos_frac": 0.734375, "sample": [23.70586395263672, 9.980300903320312, 11.323966979980469, 7.400745391845703, 12.383209228515625, -4.7981719970703125, 24.065444946289062, 34.21677017211914, 19.892351150512695, 2.1625442504882812, 5.976240158081055, 5.589174270629883, 8.908500671386719, -6.828052520751953, -24.213218688964844, 5.0011749267578125, -0.5361328125, 21.359451293945312, 24.307167053222656, 1.7234001159667969, -13.605289459228516, 9.642578125, 40.50410079956055, -5.267814636230469, 20.419363021850586, 2.4117164611816406, 1.6454448699951172, 30.87242889404297, 21.964157104492188, -6.5367584228515625, -4.062633514404297, 2.3939056396484375, 15.531475067138672, 31.014495849609375, -8.406120300292969, -0.4266510009765625, -6.9424896240234375, 44.15673828125, -12.91081428527832, 5.561992645263672, -3.8514347076416016, 8.256277084350586, 7.835639953613281, -23.162757873535156, 1.7140483856201172, 23.302766799926758, 26.738245010375977, 12.724319458007812, 3.516876220703125, 34.02784729003906, 10.491949081420898, 26.75531005859375, 14.774776458740234, -0.5886955261230469, -4.096946716308594, 9.908889770507812, 27.918922424316406, 30.408302307128906, 7.669960021972656, 6.3383026123046875, 0.0965728759765625, -29.79922103881836, 31.240684509277344, 2.5396270751953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000301.npy"}
{"epoch": 0.455026455026455, "step": 302, "batch_size": 64, "mean": 6.349787712097168, "std": 14.279735565185547, "min": -28.550079345703125, "p10": -7.6830766677856435, "median": 5.010135650634766, "p90": 27.386120414733888, "max": 40.5499267578125, "pos_frac": 0.71875, "sample": [9.61948013305664, 37.110774993896484, -3.5943832397460938, 40.5499267578125, 22.896453857421875, 6.477838516235352, 35.564151763916016, -4.9512481689453125, 29.913597106933594, 5.3900604248046875, 9.916175842285156, 12.99639892578125, 3.8246421813964844, -0.8258934020996094, 18.717945098876953, 32.30701446533203, 0.9055061340332031, 3.1578598022460938, 1.5015716552734375, -4.264190673828125, 4.737560272216797, -20.689796447753906, -19.665374755859375, 7.444091796875, 12.460861206054688, -6.682710647583008, 4.1319427490234375, 14.298316955566406, -3.4779396057128906, -1.16790771484375, -5.79547119140625, 12.165237426757812, 12.265419006347656, 2.7480201721191406, -17.608076095581055, 26.70590591430664, 30.579360961914062, -1.1672744750976562, 1.5398101806640625, 12.367626190185547, 14.46983528137207, 27.677640914916992, 0.5648345947265625, 8.768814086914062, 4.2974853515625, 5.8232574462890625, -24.38052749633789, -2.7639541625976562, 4.1155242919921875, 8.575233459472656, 18.584367752075195, 6.65155029296875, 3.9026947021484375, 3.4657745361328125, 5.282711029052734, -11.997795104980469, -5.4805145263671875, 23.823318481445312, 0.0631103515625, -8.111804962158203, -28.550079345703125, 5.8746337890625, 17.275848388671875, 6.051189422607422], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000302.npy"}
{"epoch": 0.4565381708238851, "step": 303, "batch_size": 64, "mean": 12.354693412780762, "std": 16.908199310302734, "min": -22.32119369506836, "p10": -11.740863800048828, "median": 11.375539779663086, "p90": 33.66129112243653, "max": 43.53224182128906, "pos_frac": 0.71875, "sample": [-1.8946075439453125, 10.754318237304688, 25.300430297851562, 10.514528274536133, 43.4249267578125, 32.007049560546875, 27.7637939453125, -9.783605575561523, 24.459688186645508, 11.380172729492188, 30.452247619628906, -14.455024719238281, 11.370906829833984, -0.7161579132080078, -16.983444213867188, -14.872169494628906, -15.660324096679688, 24.0594482421875, -22.32119369506836, -1.446075439453125, -0.271392822265625, 7.810552597045898, 34.279266357421875, 16.54800033569336, -15.579307556152344, 14.382980346679688, 25.152780532836914, -11.296340942382812, -1.8051738739013672, 16.493881225585938, 14.739673614501953, 5.3889617919921875, 4.477943420410156, -2.555145263671875, 5.020622253417969, 3.430187225341797, 32.2193489074707, 3.9446258544921875, 27.45102310180664, 19.968307495117188, 7.9359588623046875, 24.866683959960938, 19.732864379882812, 31.2672119140625, 1.6979904174804688, 25.384613037109375, 39.730690002441406, 15.881858825683594, 16.49029541015625, 30.245689392089844, -3.6568756103515625, -0.5438594818115234, -11.931373596191406, 4.4019775390625, 38.33087158203125, 22.147598266601562, 9.417905807495117, 7.9534912109375, 25.381614685058594, 37.556827545166016, 43.53224182128906, -11.170469284057617, 27.724395751953125, 35.16648864746094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000303.npy"}
{"epoch": 0.4580498866213152, "step": 304, "batch_size": 64, "mean": 11.962800979614258, "std": 13.054464340209961, "min": -23.904281616210938, "p10": -1.568632125854492, "median": 10.300350189208984, "p90": 29.53509750366211, "max": 40.24443817138672, "pos_frac": 0.828125, "sample": [26.958423614501953, -1.3729133605957031, 8.9541015625, 27.077774047851562, 1.5141067504882812, 16.692092895507812, 12.699623107910156, 18.400360107421875, -0.4502544403076172, 9.561456680297852, 2.7063522338867188, 25.038558959960938, 20.47198486328125, 12.253387451171875, 28.925434112548828, 10.681747436523438, -1.6525115966796875, 7.804393768310547, 34.77593231201172, -23.904281616210938, 3.8377227783203125, 25.861007690429688, 7.248016357421875, 15.006050109863281, 1.2731246948242188, 3.6354598999023438, 1.86981201171875, 14.789409637451172, 8.882789611816406, 31.39217185974121, 21.152191162109375, 0.6676616668701172, 30.415008544921875, 11.696281433105469, 0.9989166259765625, -18.503982543945312, 14.340373992919922, 23.83885955810547, 8.33013916015625, 18.553695678710938, -0.14646530151367188, -2.6044578552246094, 40.24443817138672, 9.918952941894531, 1.681427001953125, 2.7639541625976562, 14.946922302246094, 23.563339233398438, 29.640357971191406, 34.455322265625, -5.17413330078125, -1.2316055297851562, 29.28948974609375, 0.4654808044433594, 21.158538818359375, -4.005893707275391, 6.926769256591797, 34.90937042236328, 22.43146514892578, 13.342124938964844, 7.66656494140625, -2.3394813537597656, 6.033477783203125, 19.262836456298828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000304.npy"}
{"epoch": 0.4595616024187453, "step": 305, "batch_size": 64, "mean": 9.583664894104004, "std": 16.292339324951172, "min": -36.77069091796875, "p10": -11.945928955078124, "median": 8.334318161010742, "p90": 30.71298179626465, "max": 41.378292083740234, "pos_frac": 0.734375, "sample": [29.200546264648438, 2.4857406616210938, -1.461517333984375, -16.029037475585938, 0.6436309814453125, 0.9384346008300781, 41.378292083740234, -12.489692687988281, 14.250320434570312, 27.571075439453125, 3.8931045532226562, 36.05862045288086, -1.737884521484375, 20.525516510009766, -16.475814819335938, -0.3937053680419922, 10.190559387207031, 2.2054901123046875, -14.47784423828125, 20.209274291992188, -0.3252677917480469, 9.575685501098633, 5.761344909667969, 18.643569946289062, 6.057464599609375, 20.3179931640625, 20.845245361328125, -0.0101165771484375, 16.06646728515625, -36.77069091796875, 4.316764831542969, 14.393766403198242, 30.85744857788086, 25.83617401123047, -3.2684249877929688, 4.867712020874023, 40.511505126953125, 5.21063232421875, 3.590208053588867, 10.294342041015625, 2.11102294921875, 2.73333740234375, 16.492141723632812, 37.25880432128906, 30.375892639160156, -3.2430877685546875, 8.232063293457031, -15.721084594726562, 31.185211181640625, 21.997333526611328, 15.643264770507812, 16.36803436279297, 8.436573028564453, 34.480491638183594, 18.971481323242188, 28.292938232421875, -4.7801055908203125, 10.979873657226562, -24.631107330322266, 27.125333786010742, -6.361289978027344, 0.4664936065673828, 24.361160278320312, -10.677146911621094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000305.npy"}
{"epoch": 0.46107331821617537, "step": 306, "batch_size": 64, "mean": 8.643621444702148, "std": 13.5547513961792, "min": -29.44121551513672, "p10": -3.407820701599121, "median": 6.755719184875488, "p90": 27.798985290527348, "max": 47.13627624511719, "pos_frac": 0.765625, "sample": [37.900634765625, 17.6099853515625, -0.38712501525878906, 31.994918823242188, 7.5650787353515625, 19.70301055908203, 2.4684295654296875, 7.339286804199219, 7.1824951171875, 18.490341186523438, 26.714271545410156, 21.430381774902344, -17.037559509277344, 7.125972747802734, -3.273845672607422, -29.44121551513672, 25.518142700195312, 3.192291259765625, 13.781742095947266, -1.8614921569824219, 4.630706787109375, 28.26386260986328, 19.25720977783203, 2.7649383544921875, 7.357549667358398, 0.12287139892578125, -3.465238571166992, 26.369361877441406, 1.4873046875, 4.8828277587890625, -9.064834594726562, 28.380149841308594, -3.2550125122070312, 2.3238391876220703, 18.351722717285156, 8.963569641113281, 30.59176254272461, 30.757644653320312, -0.738525390625, -0.8456001281738281, 6.385465621948242, 2.073976516723633, 8.779132843017578, 7.8557281494140625, -11.760519027709961, 2.5570220947265625, -0.8869152069091797, 17.0771484375, 1.8669509887695312, 11.694290161132812, 16.71747589111328, 14.85365104675293, 4.0773162841796875, 1.4043807983398438, 4.712921142578125, 0.3942604064941406, 8.374969482421875, -5.598358154296875, 12.073509216308594, 47.13627624511719, -13.398881912231445, 5.890228271484375, 20.946632385253906, -3.186800003051758], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000306.npy"}
{"epoch": 0.46258503401360546, "step": 307, "batch_size": 64, "mean": 6.000943660736084, "std": 16.402250289916992, "min": -28.603805541992188, "p10": -13.746716690063476, "median": 4.009729385375977, "p90": 30.156651306152344, "max": 39.52748107910156, "pos_frac": 0.5625, "sample": [23.124855041503906, -6.77606201171875, 4.411243438720703, -7.886299133300781, -28.603805541992188, 15.224525451660156, 39.52748107910156, -5.914558410644531, 28.74969482421875, 30.357627868652344, -6.1396942138671875, 9.836370468139648, 6.4589996337890625, 32.510284423828125, -21.92571258544922, 15.833251953125, 4.4634246826171875, -0.7408599853515625, 19.483673095703125, -9.74334716796875, 10.581100463867188, -17.351839065551758, -10.222885131835938, -15.881027221679688, 5.236621856689453, -15.933601379394531, -11.261672973632812, 14.394416809082031, 26.174880981445312, -1.6571025848388672, -5.564964294433594, 12.091268539428711, -0.7603492736816406, -10.756908416748047, -14.930557250976562, 5.023284912109375, -5.543495178222656, 3.2254714965820312, 15.748506546020508, -1.0138397216796875, 2.30230712890625, -8.886714935302734, 3.60821533203125, 34.03010559082031, 31.046031951904297, -6.337791442871094, 6.522859573364258, 29.687705993652344, 34.30265808105469, 18.571182250976562, -13.606048583984375, 21.63312530517578, 28.54998016357422, -0.2855262756347656, 18.361663818359375, 13.977230072021484, 0.46662139892578125, 33.92420196533203, -9.304519653320312, -3.6697616577148438, -13.807003021240234, 14.816801071166992, 26.959754943847656, -2.6511001586914062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000307.npy"}
{"epoch": 0.46409674981103555, "step": 308, "batch_size": 64, "mean": 9.75979232788086, "std": 15.023006439208984, "min": -29.866561889648438, "p10": -7.882039070129394, "median": 8.86411190032959, "p90": 28.47253513336182, "max": 36.63555908203125, "pos_frac": 0.765625, "sample": [28.8455867767334, 16.881141662597656, 16.34228515625, -7.346853256225586, 26.465255737304688, 18.299240112304688, -3.7411231994628906, 16.397994995117188, 9.966949462890625, -0.3741912841796875, 36.63555908203125, 21.595001220703125, 36.25115966796875, 8.448770523071289, 1.3880767822265625, -15.670173645019531, 5.672874450683594, 24.700759887695312, 20.876571655273438, 10.58652114868164, 17.9737548828125, -8.111404418945312, 34.198265075683594, 20.637374877929688, 4.636955261230469, 25.938087463378906, -0.8669338226318359, 18.087459564208984, 10.715499877929688, -19.20580291748047, 32.82721710205078, 2.2193832397460938, 9.27945327758789, -6.544696807861328, 23.88275146484375, 0.7988128662109375, 21.172523498535156, 2.1504898071289062, 2.2964706420898438, -29.866561889648438, 4.2064056396484375, 5.489818572998047, 17.443283081054688, 31.395877838134766, 4.859001159667969, 10.022430419921875, 15.126487731933594, -2.517364501953125, 24.163450241088867, 0.3647499084472656, 27.100839614868164, 3.229736328125, -3.17779541015625, 27.602081298828125, 25.020004272460938, 0.110504150390625, 33.17720031738281, -20.001543045043945, -11.130022048950195, 8.173284530639648, 0.8828830718994141, -16.841354370117188, 5.5093231201171875, -0.023080825805664062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000308.npy"}
{"epoch": 0.4656084656084656, "step": 309, "batch_size": 64, "mean": 9.916337966918945, "std": 14.294142723083496, "min": -19.76202392578125, "p10": -6.419715118408202, "median": 8.298563003540039, "p90": 28.86412792205811, "max": 42.74135971069336, "pos_frac": 0.734375, "sample": [16.72863006591797, 42.74135971069336, 3.74951171875, 28.254384994506836, 17.702550888061523, -0.8556289672851562, 24.613632202148438, -7.04296875, 7.8847198486328125, 6.8692779541015625, 13.25030517578125, -0.36034584045410156, -12.753475189208984, -3.156312942504883, 7.773414611816406, -19.60047149658203, 9.024358749389648, 26.738243103027344, -6.626136779785156, 0.5917530059814453, 1.6917991638183594, 27.343387603759766, 29.163450241088867, -4.824880599975586, 20.53845977783203, 15.674995422363281, -2.9389896392822266, 13.23145866394043, -2.5444259643554688, 0.8206939697265625, 2.868528366088867, 2.5737457275390625, -5.9380645751953125, 18.36730194091797, 16.956405639648438, 9.877307891845703, 7.8968048095703125, 8.700321197509766, -0.6700305938720703, 23.004852294921875, 0.9884033203125, 28.221435546875, 29.125446319580078, 13.537628173828125, 35.68586730957031, -19.76202392578125, 24.804039001464844, 18.666263580322266, 2.9082393646240234, 17.620948791503906, -5.381645202636719, 2.8234024047851562, 12.529674530029297, 13.901212692260742, 40.586814880371094, 4.532794952392578, 18.334184646606445, 29.641489028930664, -7.6910552978515625, -14.601837158203125, -0.897552490234375, 1.1639328002929688, 14.900819778442383, 35.68717956542969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000309.npy"}
{"epoch": 0.4671201814058957, "step": 310, "batch_size": 64, "mean": 6.087133407592773, "std": 14.942623138427734, "min": -19.202835083007812, "p10": -12.803611946105956, "median": 5.628020286560059, "p90": 26.69007205963135, "max": 38.304779052734375, "pos_frac": 0.59375, "sample": [27.912033081054688, 0.8660545349121094, 12.331844329833984, 8.118148803710938, 10.752944946289062, -15.09820556640625, -10.537216186523438, 23.336883544921875, 20.166534423828125, -0.6942539215087891, -10.802289962768555, -12.779909133911133, 15.75649642944336, 38.304779052734375, 9.961212158203125, -1.5278663635253906, -0.30486488342285156, 14.004928588867188, 37.3992919921875, -7.915733337402344, 0.4596443176269531, -0.6511554718017578, -12.813770294189453, 38.10292053222656, -8.312843322753906, -6.446229934692383, 12.945442199707031, -7.664028167724609, -0.023223876953125, 27.149824142456055, -16.24687957763672, 20.44384002685547, 7.7569732666015625, 19.874771118164062, -9.065048217773438, 15.335784912109375, 6.081787109375, 13.69866943359375, 14.29193115234375, 16.68157196044922, -0.5860061645507812, 11.177291870117188, -16.089797973632812, -19.202835083007812, 17.97661590576172, 5.2742767333984375, 0.735565185546875, 16.83462142944336, -3.3020553588867188, -8.5428466796875, 3.227752685546875, 32.02940368652344, 22.43598747253418, -14.747978210449219, 0.6839599609375, -3.240304946899414, -9.631393432617188, 29.78036117553711, 18.57245445251465, -10.949287414550781, -14.050529479980469, 25.61731719970703, 5.98176383972168, 8.74139404296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000310.npy"}
{"epoch": 0.46863189720332576, "step": 311, "batch_size": 64, "mean": 8.457317352294922, "std": 14.7326021194458, "min": -34.85093688964844, "p10": -7.917704010009765, "median": 6.619621276855469, "p90": 24.19523162841797, "max": 50.9013671875, "pos_frac": 0.75, "sample": [23.135665893554688, -0.1519775390625, 2.7965621948242188, -11.581306457519531, -9.821975708007812, -5.363298416137695, 7.522426605224609, 30.85491180419922, 2.1426029205322266, 24.53704833984375, 3.594207763671875, -34.85093688964844, 3.472219467163086, 23.397659301757812, 15.422714233398438, -0.1931610107421875, 15.881301879882812, 2.0387649536132812, 20.85772705078125, 11.636711120605469, 1.5439414978027344, 12.689987182617188, 1.7066726684570312, 8.762855529785156, 23.003028869628906, 12.278301239013672, -8.70709228515625, -4.3187713623046875, 0.715423583984375, 20.97088623046875, -7.3851470947265625, 34.193023681640625, 46.31761169433594, 8.624263763427734, 5.773185729980469, 19.638917922973633, 4.401004791259766, 11.804584503173828, 3.246307373046875, 12.290557861328125, 50.9013671875, 1.5712127685546875, -6.163475036621094, 29.162872314453125, -12.646455764770508, 5.159210205078125, 2.744140625, -6.840030670166016, 8.996963500976562, 7.466056823730469, 38.273704528808594, 13.882080078125, 5.762451171875, -15.539054870605469, 12.658981323242188, 22.341524124145508, 0.9999599456787109, -8.145942687988281, -2.5908966064453125, 7.8527069091796875, -0.344970703125, 12.991806030273438, 21.855484008789062, 18.041244506835938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000311.npy"}
{"epoch": 0.47014361300075586, "step": 312, "batch_size": 64, "mean": 11.876441955566406, "std": 14.096991539001465, "min": -27.13524627685547, "p10": -1.039236068725586, "median": 10.2628173828125, "p90": 33.290541458129894, "max": 43.75041198730469, "pos_frac": 0.84375, "sample": [7.208946228027344, 1.6772918701171875, -1.0429840087890625, 5.5659942626953125, 13.746685028076172, 10.245697021484375, 12.33023452758789, -1.0104732513427734, 7.271644592285156, 1.4891204833984375, 40.707672119140625, 22.441452026367188, 21.75079345703125, 17.443811416625977, 0.552032470703125, 24.18720245361328, 5.067718505859375, 0.3710956573486328, 3.3627567291259766, 17.746246337890625, 35.56584930419922, 4.653602600097656, -3.8202743530273438, 3.4457244873046875, 10.455490112304688, 36.155792236328125, 7.495052337646484, -1.2133636474609375, 10.279937744140625, 34.693016052246094, 36.185028076171875, -27.13524627685547, 19.43358612060547, 27.200637817382812, -13.504402160644531, 14.347721099853516, 24.563095092773438, 6.818336486816406, 4.2235565185546875, 28.26641845703125, 43.75041198730469, 1.2325782775878906, 5.45989990234375, 13.594955444335938, 23.034744262695312, 9.206626892089844, 10.200065612792969, 30.01810073852539, 14.849288940429688, -1.0304908752441406, -0.6771011352539062, 20.67320442199707, 36.065399169921875, 14.42138671875, 1.810760498046875, 17.449172973632812, 12.64373779296875, 15.318994522094727, -22.920597076416016, 27.45764923095703, 12.517333984375, -6.561241149902344, 6.345563888549805, 6.009407043457031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000312.npy"}
{"epoch": 0.47165532879818595, "step": 313, "batch_size": 64, "mean": 7.676838397979736, "std": 15.854860305786133, "min": -25.76572036743164, "p10": -12.994447135925292, "median": 5.31838321685791, "p90": 30.50884265899659, "max": 47.04554748535156, "pos_frac": 0.6875, "sample": [2.231708526611328, 14.009117126464844, 22.36517906188965, 6.703556060791016, 6.110877990722656, 13.624601364135742, 3.9580307006835938, 4.575990676879883, 31.219642639160156, -17.826663970947266, 0.4756336212158203, 22.3565616607666, 3.9242935180664062, 10.06365966796875, 7.2002410888671875, -0.5211257934570312, 1.3423080444335938, -3.064971923828125, 6.0607757568359375, -0.7811508178710938, 14.708992004394531, 18.387062072753906, 0.007814407348632812, 41.11097717285156, -25.76572036743164, -13.393159866333008, -2.2638702392578125, -0.557708740234375, 18.341461181640625, 9.816947937011719, 40.158355712890625, 33.232547760009766, -14.072288513183594, -6.16102409362793, 34.11009216308594, 20.84136199951172, 1.3837013244628906, -1.4629249572753906, 0.5927734375, 38.01287078857422, 11.027111053466797, 4.328525543212891, 12.072799682617188, -2.6859512329101562, 47.04554748535156, 1.986846923828125, -1.6010513305664062, -2.3802032470703125, 7.471832275390625, 8.798561096191406, 24.113143920898438, 13.793601989746094, 13.58102035522461, -0.1842365264892578, -3.7672176361083984, -21.399215698242188, 23.013511657714844, -22.935894012451172, 28.850309371948242, -12.064117431640625, -18.199474334716797, 7.498992919921875, 3.5443458557128906, 28.352327346801758], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000313.npy"}
{"epoch": 0.47316704459561604, "step": 314, "batch_size": 64, "mean": 12.927923202514648, "std": 14.607532501220703, "min": -14.1593017578125, "p10": -2.4531738281249997, "median": 8.664783477783203, "p90": 34.5379997253418, "max": 45.896827697753906, "pos_frac": 0.8125, "sample": [11.664024353027344, 2.5780372619628906, 6.612709045410156, 39.97575378417969, 18.759658813476562, 24.506942749023438, 5.9364166259765625, 16.38123321533203, -6.446098327636719, 16.38300895690918, 2.9998550415039062, -1.9883289337158203, 43.08587646484375, 5.4822845458984375, -1.4828243255615234, 42.688873291015625, 10.344930648803711, 8.739387512207031, 29.483291625976562, 1.1716690063476562, 4.146171569824219, -0.35419464111328125, 29.534170150756836, 32.64752960205078, 36.85034942626953, -14.1593017578125, 13.9840087890625, 31.716079711914062, 6.7963714599609375, 5.3745880126953125, 26.40576934814453, 7.2928619384765625, 11.037033081054688, 3.1985855102539062, 1.2976036071777344, 13.49703598022461, 19.251922607421875, -3.1373443603515625, 2.9234867095947266, 19.72026824951172, 1.4456157684326172, -2.0083465576171875, 32.50947189331055, 6.168697357177734, 25.009845733642578, 8.590179443359375, 23.8193359375, 45.896827697753906, -1.5466690063476562, 37.07563018798828, 5.7773590087890625, 34.60618591308594, 13.585723876953125, 7.132938385009766, 20.827476501464844, 7.1495208740234375, -6.45025634765625, 12.680885314941406, 7.491275787353516, -2.6438140869140625, 11.138654708862305, -7.570505142211914, -12.577552795410156, 34.37889862060547], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000314.npy"}
{"epoch": 0.47467876039304613, "step": 315, "batch_size": 64, "mean": 9.904523849487305, "std": 14.219016075134277, "min": -28.721012115478516, "p10": -5.846569061279294, "median": 10.040014266967773, "p90": 30.05413856506348, "max": 39.098854064941406, "pos_frac": 0.765625, "sample": [-3.3672332763671875, 14.05950927734375, 30.426862716674805, 4.573751449584961, -0.4081096649169922, 12.403135299682617, -28.721012115478516, -0.9732170104980469, -14.728229522705078, 3.7952651977539062, -16.062271118164062, 18.71395492553711, 4.372699737548828, 30.373809814453125, 0.16173934936523438, 16.409469604492188, 8.549995422363281, -1.9187393188476562, 1.1961898803710938, -1.212667465209961, 11.295478820800781, 10.356887817382812, 1.3882217407226562, 7.3382415771484375, 6.07672119140625, 23.34756088256836, 28.776947021484375, -6.909141540527344, 9.723140716552734, 12.330923080444336, 12.559921264648438, 11.444061279296875, 32.273582458496094, 7.432929992675781, 11.381591796875, 26.667999267578125, 25.394073486328125, 1.3682708740234375, -15.938446044921875, 11.701004028320312, 22.627342224121094, -2.260406494140625, 19.981170654296875, 29.308238983154297, -2.6556777954101562, 11.387611389160156, 20.92910385131836, 39.098854064941406, -15.447734832763672, 8.120819091796875, 15.201347351074219, 1.8101577758789062, 18.7490234375, -0.06136894226074219, 17.48419952392578, 6.334434509277344, 25.71983528137207, 11.512039184570312, 34.20687484741211, -12.631185531616211, 34.351722717285156, 1.6977481842041016, 34.74040222167969, 8.030059814453125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000315.npy"}
{"epoch": 0.47619047619047616, "step": 316, "batch_size": 64, "mean": 12.683357238769531, "std": 13.87105655670166, "min": -17.90592384338379, "p10": -4.459375, "median": 12.664280891418457, "p90": 31.94597320556641, "max": 37.60041809082031, "pos_frac": 0.75, "sample": [-8.234039306640625, 7.306556701660156, 23.282577514648438, 0.638763427734375, -11.548561096191406, -0.020843505859375, 15.4803466796875, 16.47730255126953, 8.791671752929688, 27.834228515625, 30.716156005859375, 25.0059814453125, 26.606929779052734, 28.596939086914062, 11.182861328125, -3.3813323974609375, -17.90592384338379, 2.3580360412597656, -6.305793762207031, 24.276456832885742, 12.550127029418945, 37.60041809082031, 3.1963424682617188, 7.577857971191406, 14.646728515625, 12.778434753417969, 16.732696533203125, -2.8272171020507812, -3.0666122436523438, 11.266510009765625, 27.69156837463379, 32.47303771972656, 23.447383880615234, 37.059967041015625, -2.848176956176758, 10.384002685546875, -4.66851806640625, 21.634910583496094, 22.569000244140625, 13.998889923095703, 33.16070556640625, 17.220664978027344, 18.153423309326172, -1.3711490631103516, 26.095794677734375, 0.8001441955566406, -0.30359840393066406, 33.54119873046875, -7.2574310302734375, 27.43708038330078, -3.97137451171875, -5.3997955322265625, 2.1651153564453125, -3.338542938232422, 7.9913330078125, 12.404861450195312, 2.1968345642089844, 35.67210006713867, 14.552108764648438, 19.090248107910156, 25.976455688476562, 10.53363037109375, 33.67879104614258, 19.350601196289062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000316.npy"}
{"epoch": 0.47770219198790626, "step": 317, "batch_size": 64, "mean": 13.763182640075684, "std": 14.52507209777832, "min": -18.797882080078125, "p10": -3.4186355590820305, "median": 12.102127075195312, "p90": 32.39192352294922, "max": 41.966064453125, "pos_frac": 0.8125, "sample": [24.944061279296875, 26.672344207763672, 15.461002349853516, 16.850830078125, 12.322097778320312, 41.966064453125, -3.7988014221191406, 13.0361328125, -0.8854503631591797, 18.463096618652344, 1.9874267578125, 9.399486541748047, -6.335136413574219, -0.555450439453125, 17.47626495361328, 5.044639587402344, 32.42997741699219, 6.26129150390625, 31.897605895996094, 32.303131103515625, -6.629688262939453, 4.381011962890625, 8.937034606933594, -7.8190460205078125, 20.85421371459961, 10.400367736816406, 23.853797912597656, 24.577991485595703, 27.32828140258789, -17.55780029296875, 41.57469177246094, 6.1179046630859375, 15.940359115600586, 40.24751281738281, 35.08528137207031, -18.797882080078125, 11.882156372070312, -2.5315818786621094, 18.283950805664062, 36.27936553955078, 4.163524627685547, 14.690444946289062, 38.067283630371094, -6.967323303222656, 3.540771484375, 3.7244300842285156, 8.843040466308594, 25.661163330078125, 4.823661804199219, 2.643146514892578, 20.89752197265625, 13.433116912841797, 9.088415145874023, 29.258522033691406, -0.8852767944335938, 11.542694091796875, -2.499725341796875, 22.889007568359375, 30.338594436645508, 30.475807189941406, 1.3867416381835938, 31.119155883789062, 7.15423583984375, 10.106208801269531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000317.npy"}
{"epoch": 0.47921390778533635, "step": 318, "batch_size": 64, "mean": 8.648831367492676, "std": 18.59535789489746, "min": -29.10204315185547, "p10": -14.985969734191894, "median": 6.98419189453125, "p90": 34.48647232055665, "max": 45.67961120605469, "pos_frac": 0.671875, "sample": [14.115116119384766, 30.54180908203125, 3.200439453125, -15.99163818359375, 31.782684326171875, 28.09156036376953, 7.413421630859375, -15.308837890625, 12.256994247436523, 42.463134765625, 0.83935546875, -18.420066833496094, 18.879863739013672, 21.384124755859375, 35.24919128417969, 1.3749351501464844, 6.554962158203125, -14.232610702514648, 14.13241958618164, 0.7207851409912109, -5.447772979736328, 21.576828002929688, 35.52881622314453, -10.106910705566406, 0.9612045288085938, 17.790485382080078, 37.89906311035156, -28.168258666992188, 20.28961181640625, 22.64392852783203, -5.429779052734375, 15.128448486328125, -12.797492980957031, 6.289196014404297, -1.3955230712890625, 32.48577880859375, 45.67961120605469, -10.403518676757812, 32.70679473876953, -1.4492645263671875, -9.751541137695312, 6.1255035400390625, 22.595664978027344, -29.10204315185547, -0.45105743408203125, 14.093521118164062, 4.9561767578125, 36.67918395996094, 0.9534435272216797, 29.774188995361328, 1.0407428741455078, 38.8905029296875, 11.061464309692383, 15.618865966796875, -23.201522827148438, -3.3832168579101562, 20.4146728515625, -22.316791534423828, 9.145843505859375, -1.2275161743164062, 20.28206443786621, -12.587751388549805, -14.164989471435547, 19.250932693481445], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000318.npy"}
{"epoch": 0.48072562358276644, "step": 319, "batch_size": 64, "mean": 11.364680290222168, "std": 16.690025329589844, "min": -20.86465835571289, "p10": -9.168967056274413, "median": 11.136781692504883, "p90": 33.660667800903326, "max": 49.368133544921875, "pos_frac": 0.703125, "sample": [13.829092025756836, 36.63398742675781, 5.725135803222656, 23.50872802734375, 7.53363037109375, 28.259613037109375, 4.7481231689453125, 16.822662353515625, 10.93475341796875, 17.206634521484375, 29.363845825195312, 30.461212158203125, -1.0553855895996094, -0.6633682250976562, 5.94256591796875, 16.269500732421875, -1.3604583740234375, 49.368133544921875, 1.841278076171875, 20.8724365234375, 4.636363983154297, -8.366863250732422, 13.340301513671875, 21.313968658447266, 41.15357971191406, -2.0515213012695312, 36.175758361816406, 11.338809967041016, 34.01145553588867, 32.8421630859375, -9.512725830078125, -15.806777954101562, 12.432945251464844, 48.619140625, -6.19622802734375, 15.646356582641602, -15.664764404296875, 19.713783264160156, 21.172325134277344, -11.701675415039062, 26.956832885742188, 4.761100769042969, -1.8547134399414062, -0.5968589782714844, -0.92962646484375, 0.5575675964355469, -17.809890747070312, -1.4330825805664062, -20.771833419799805, 30.102611541748047, -0.2709674835205078, 7.188835144042969, 2.0440216064453125, -3.9654312133789062, 12.102149963378906, 27.505859375, 23.37318229675293, 8.787622451782227, 19.689172744750977, 5.030328750610352, 11.676651000976562, -20.86465835571289, 41.97024154663086, 14.75189208984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000319.npy"}
{"epoch": 0.48223733938019653, "step": 320, "batch_size": 64, "mean": 9.29432201385498, "std": 16.604209899902344, "min": -31.147476196289062, "p10": -8.763373565673827, "median": 4.271021842956543, "p90": 33.40007019042969, "max": 49.170654296875, "pos_frac": 0.6875, "sample": [30.38117790222168, -11.460865020751953, 0.24088096618652344, 5.517658233642578, 3.8011703491210938, 6.172441482543945, 34.827388763427734, -0.4634265899658203, 22.439697265625, -0.48473358154296875, -10.390861511230469, 28.881193161010742, 0.8832130432128906, 0.19128799438476562, -8.904953002929688, 21.54718017578125, 18.16339111328125, -5.420967102050781, -2.841266632080078, 25.15166473388672, -9.187429428100586, 10.687030792236328, -31.147476196289062, 0.468231201171875, -0.577239990234375, 49.170654296875, 48.92652893066406, 33.677818298339844, 12.278011322021484, -8.929000854492188, 9.622352600097656, -8.07763671875, 6.892822265625, 1.91925048828125, -0.0067691802978515625, 37.761871337890625, 3.136260986328125, -6.0564422607421875, -5.883502960205078, -3.7044105529785156, 17.711688995361328, 38.95822525024414, 2.582202911376953, 1.8850784301757812, 1.5291366577148438, 4.740873336791992, 42.88890075683594, 14.782279968261719, 0.2509918212890625, -5.802244186401367, 1.1807022094726562, -9.194293975830078, 32.751991271972656, -3.2017440795898438, 6.353191375732422, 12.532272338867188, 10.709152221679688, 29.651668548583984, 21.990493774414062, 10.749618530273438, 17.52113151550293, -8.433021545410156, 30.660049438476562, 22.836084365844727], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000320.npy"}
{"epoch": 0.4837490551776266, "step": 321, "batch_size": 64, "mean": 11.117124557495117, "std": 16.372987747192383, "min": -26.446670532226562, "p10": -10.40648193359375, "median": 10.11574935913086, "p90": 32.442766189575195, "max": 41.360233306884766, "pos_frac": 0.765625, "sample": [21.502437591552734, 21.749637603759766, 1.260885238647461, 21.907196044921875, 20.42827606201172, -3.2208480834960938, 41.360233306884766, 19.488588333129883, 33.2643928527832, 5.6351318359375, 36.40686798095703, 8.104911804199219, 13.211856842041016, 23.275341033935547, -24.018653869628906, -0.16122817993164062, 8.528770446777344, 15.319007873535156, 10.923530578613281, 27.14483642578125, 14.533432006835938, 2.803680419921875, -25.060165405273438, -3.0502777099609375, 9.307968139648438, -14.692264556884766, -15.286230087280273, -16.59969139099121, 15.491228103637695, 21.319107055664062, 3.6415481567382812, 18.87680435180664, 27.339736938476562, 21.37673568725586, -0.23215675354003906, 24.341903686523438, 11.649642944335938, -2.336568832397461, 7.5793609619140625, 30.33527374267578, 1.6954174041748047, -1.3356895446777344, 4.072551727294922, 31.769332885742188, 0.9245452880859375, -26.446670532226562, 31.305526733398438, 37.70170593261719, 38.49636459350586, 20.551136016845703, 3.9035415649414062, 5.545656204223633, 3.6860809326171875, 32.731380462646484, 31.661636352539062, 4.303489685058594, 5.480588912963867, 12.085006713867188, 40.91824722290039, -10.571670532226562, 6.75384521484375, -1.2697601318359375, 14.104469299316406, -10.021041870117188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000321.npy"}
{"epoch": 0.4852607709750567, "step": 322, "batch_size": 64, "mean": 14.153627395629883, "std": 14.925612449645996, "min": -24.64120101928711, "p10": 0.38598632812500006, "median": 11.9202241897583, "p90": 35.140144729614256, "max": 41.80540466308594, "pos_frac": 0.90625, "sample": [-0.9633445739746094, 22.725814819335938, 37.660316467285156, 4.1846923828125, 26.533161163330078, 3.3496360778808594, 30.526199340820312, 20.30228614807129, 29.127483367919922, 11.948019027709961, 8.459915161132812, 11.168754577636719, 28.754837036132812, 40.92032241821289, 11.593719482421875, 3.806774139404297, 9.671852111816406, 15.002456665039062, 1.3127937316894531, 1.1204833984375, 16.944313049316406, 4.742759704589844, -16.673059463500977, 37.05519104003906, 35.256168365478516, 9.735221862792969, 10.905410766601562, 13.008241653442383, 0.3642768859863281, 9.500137329101562, -6.1646270751953125, 18.356231689453125, 18.298065185546875, 30.280731201171875, 27.41196060180664, 11.89242935180664, 2.8008270263671875, 40.974449157714844, 2.5977783203125, 0.8000984191894531, 20.042449951171875, 19.840377807617188, 0.4366416931152344, 34.869422912597656, 12.11114501953125, 41.80540466308594, 33.39403533935547, 18.155216217041016, -14.938289642333984, -24.64120101928711, 10.50705337524414, 5.998504638671875, 8.19826889038086, 19.21743392944336, 33.68114471435547, 7.52386474609375, 15.917594909667969, 3.40478515625, 12.357666015625, -13.804630279541016, 41.706024169921875, 1.4216156005859375, 25.494604110717773, 7.840211868286133], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000322.npy"}
{"epoch": 0.48677248677248675, "step": 323, "batch_size": 64, "mean": 9.853229522705078, "std": 19.461956024169922, "min": -34.352745056152344, "p10": -12.253491210937495, "median": 4.888976097106934, "p90": 39.85063781738281, "max": 55.8505859375, "pos_frac": 0.6875, "sample": [2.1291046142578125, -19.45599365234375, 18.010940551757812, 0.7908248901367188, -1.9113845825195312, 24.377548217773438, 8.027389526367188, 5.490423202514648, 8.229278564453125, 39.543212890625, 14.311927795410156, -5.778863906860352, 39.982391357421875, 7.788333892822266, 35.85479736328125, -7.061206817626953, -19.225265502929688, 23.641082763671875, 28.89177703857422, 3.318389892578125, 10.178146362304688, 29.593841552734375, 1.2013702392578125, 3.0283737182617188, 47.122703552246094, 4.007896423339844, -17.1165771484375, 19.995393753051758, 0.7495098114013672, 3.902618408203125, 55.8505859375, -5.496063232421875, -0.66949462890625, -14.478755950927734, 31.706832885742188, 4.287528991699219, -16.593353271484375, 25.171295166015625, 17.385879516601562, 0.9032154083251953, 6.533866882324219, 14.211299896240234, -2.769439697265625, -1.0497779846191406, 46.747535705566406, 51.78462600708008, 44.80731964111328, -31.095108032226562, 41.587860107421875, -0.64971923828125, -34.352745056152344, -0.9546279907226562, 19.370243072509766, 3.903106689453125, 6.29949951171875, 24.324880599975586, 21.547775268554688, 15.853713989257812, -4.8344268798828125, -4.879404067993164, 8.84225845336914, 2.1365814208984375, -1.2923812866210938, -3.151885986328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000323.npy"}
{"epoch": 0.48828420256991684, "step": 324, "batch_size": 64, "mean": 11.880046844482422, "std": 14.896973609924316, "min": -22.228294372558594, "p10": -4.615657043457031, "median": 9.96143913269043, "p90": 30.731793975830083, "max": 43.954803466796875, "pos_frac": 0.8125, "sample": [22.244232177734375, 16.543777465820312, 13.017318725585938, 1.6439056396484375, -20.529834747314453, 39.01542663574219, 15.864774703979492, 9.395584106445312, -1.6070671081542969, -0.035736083984375, 15.95220947265625, 28.412851333618164, 16.506912231445312, 11.118366241455078, 26.19225311279297, 1.659708023071289, 27.15625, -7.9840850830078125, 18.11615753173828, 0.0667572021484375, 18.404266357421875, -4.71875, 2.605487823486328, 27.194507598876953, 9.235477447509766, 5.860439300537109, -6.195167541503906, 4.840583801269531, 43.954803466796875, 5.920158386230469, 0.060428619384765625, -22.228294372558594, 18.25742530822754, 29.26142120361328, 6.275726318359375, 10.664546966552734, 10.221683502197266, -7.39788818359375, 19.546329498291016, 20.43169403076172, 17.71042251586914, -17.362777709960938, 38.199310302734375, -0.16616439819335938, 42.44263458251953, 3.6952781677246094, 0.5514678955078125, 25.855361938476562, 16.136096954345703, 32.203372955322266, 43.79266357421875, 1.4347648620605469, 2.0530166625976562, 21.199134826660156, 31.361953735351562, 5.4897003173828125, 9.701194763183594, -4.3751068115234375, 1.1731948852539062, 8.739322662353516, 26.77739715576172, 21.47589874267578, 8.262687683105469, -0.9724922180175781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000324.npy"}
{"epoch": 0.4897959183673469, "step": 325, "batch_size": 64, "mean": 11.787601470947266, "std": 16.697887420654297, "min": -41.48529052734375, "p10": -7.536046600341793, "median": 10.151716232299805, "p90": 35.34537506103517, "max": 51.130332946777344, "pos_frac": 0.75, "sample": [9.631893157958984, 44.05165100097656, 2.3593387603759766, 25.525869369506836, -0.8103179931640625, 18.108497619628906, 28.94082260131836, 19.997543334960938, 8.3568115234375, 14.387283325195312, 18.619308471679688, 23.773550033569336, -0.7559623718261719, -10.458694458007812, 13.57568359375, -1.803213119506836, 36.93937683105469, 19.925811767578125, 17.033329010009766, -15.519817352294922, 51.130332946777344, -4.196098327636719, 27.632123947143555, 0.6518898010253906, -0.07190704345703125, 42.7503662109375, 3.4014930725097656, 23.138717651367188, -12.245136260986328, 13.842622756958008, -8.967453002929688, 9.637432098388672, 19.102622985839844, 8.424957275390625, -10.225669860839844, 23.590240478515625, -10.233814239501953, 31.62603759765625, 14.255609512329102, 18.70654296875, 18.24786376953125, 1.6452045440673828, 3.204010009765625, 37.969818115234375, -41.48529052734375, 3.9652557373046875, 4.149360656738281, 8.15753173828125, 38.529571533203125, 20.352018356323242, 47.768531799316406, -2.7366256713867188, 12.10975456237793, -3.5178680419921875, 8.526039123535156, -0.35933685302734375, 10.759658813476562, 18.882568359375, 30.413894653320312, 8.977073669433594, 10.666000366210938, 0.6068191528320312, 6.016365051269531, -2.2714080810546875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000325.npy"}
{"epoch": 0.491307634164777, "step": 326, "batch_size": 64, "mean": 6.6530256271362305, "std": 16.852689743041992, "min": -26.06915283203125, "p10": -14.35602912902832, "median": 5.2931060791015625, "p90": 30.18198261260987, "max": 40.267940521240234, "pos_frac": 0.59375, "sample": [-13.421897888183594, -3.9832420349121094, 10.03390884399414, 6.487247467041016, 16.84490966796875, -21.916580200195312, 38.45213317871094, 19.671600341796875, 0.5354461669921875, 30.877822875976562, -2.6827468872070312, 16.332576751708984, 5.337615966796875, -3.1137466430664062, 8.437631607055664, 21.35846710205078, 1.2668304443359375, 14.338638305664062, -2.545032501220703, 14.174560546875, 27.566490173339844, 1.82830810546875, 9.682937622070312, 23.874267578125, -17.112937927246094, -1.5100250244140625, -26.06915283203125, -7.0007476806640625, -1.30206298828125, -3.289459228515625, 40.267940521240234, -25.252037048339844, 11.831047058105469, -3.865009307861328, 16.439292907714844, 4.731693267822266, 3.6569290161132812, -3.0744781494140625, -13.033592224121094, -7.8177032470703125, 28.5583553314209, 38.94091033935547, -14.557304382324219, -0.9927291870117188, 32.32307052612305, 13.335647583007812, 19.507946014404297, -20.943788528442383, 19.852514266967773, 40.107704162597656, -1.4585838317871094, -2.2781124114990234, -9.265533447265625, 31.42920684814453, -0.4014892578125, 8.827659606933594, -25.38262367248535, 13.184799194335938, 19.12295150756836, 5.24859619140625, 25.12688636779785, -13.88638687133789, 9.032180786132812, 23.32390594482422], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000326.npy"}
{"epoch": 0.4928193499622071, "step": 327, "batch_size": 64, "mean": 9.364799499511719, "std": 19.457340240478516, "min": -40.421356201171875, "p10": -14.342738723754879, "median": 6.193265914916992, "p90": 37.43193283081055, "max": 54.3450927734375, "pos_frac": 0.6875, "sample": [11.477577209472656, -5.906837463378906, -7.685005187988281, 34.76939392089844, -0.219207763671875, 11.83050537109375, -24.43827247619629, 18.798412322998047, -22.507673263549805, 20.63278579711914, 38.24263000488281, 6.250617980957031, -17.149627685546875, 35.540306091308594, 13.627571105957031, 33.43486785888672, 31.45501708984375, 1.1118736267089844, 0.796051025390625, -10.345706939697266, 1.8796005249023438, 6.846111297607422, 20.866899490356445, 42.90321350097656, 39.197357177734375, 0.803558349609375, 5.6246795654296875, 20.444713592529297, 19.893272399902344, -10.422683715820312, 17.141071319580078, 4.571117401123047, -26.929534912109375, 28.95471954345703, 20.971603393554688, -0.7557601928710938, 11.770523071289062, 5.6013336181640625, -2.3191986083984375, -4.038652420043945, 42.14634704589844, 4.430595397949219, -40.421356201171875, 0.5168304443359375, 22.085975646972656, 54.3450927734375, -5.906379699707031, 24.978744506835938, 45.31951141357422, 7.131744384765625, 40.446632385253906, -0.7842235565185547, 6.113101959228516, 6.135913848876953, -16.022762298583984, -5.90904426574707, 29.961524963378906, 20.19628143310547, 7.8784942626953125, -1.4685783386230469, 0.3207550048828125, -21.52328109741211, 10.502685546875, -3.8466796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000327.npy"}
{"epoch": 0.4943310657596372, "step": 328, "batch_size": 64, "mean": 11.136017799377441, "std": 16.075891494750977, "min": -23.805137634277344, "p10": -6.123160934448241, "median": 7.183849334716797, "p90": 36.14384193420412, "max": 48.22650146484375, "pos_frac": 0.75, "sample": [41.91791534423828, 2.3045482635498047, 4.408164978027344, 8.760177612304688, 28.35177993774414, 1.6560020446777344, -6.574031829833984, 27.374065399169922, 19.29096221923828, 7.413002014160156, -3.321929931640625, 27.69268798828125, 37.77070617675781, 8.338367462158203, 6.9409637451171875, -11.80230712890625, 39.2487678527832, 11.414018630981445, 6.9546966552734375, -5.071128845214844, 5.7965087890625, -3.455718994140625, 0.3939189910888672, 16.941516876220703, -4.59130859375, -6.835906982421875, 44.214508056640625, 5.075538635253906, -16.696731567382812, 15.807487487792969, -23.805137634277344, 23.664918899536133, 24.90303611755371, 6.5152435302734375, 29.62890625, -3.6310882568359375, 1.71527099609375, 2.42254638671875, 48.22650146484375, 2.0021286010742188, 28.256515502929688, 23.764999389648438, -1.8899669647216797, -13.17094612121582, 8.276290893554688, 28.802879333496094, -0.07559776306152344, 1.0150146484375, 1.2742156982421875, 5.9160003662109375, 14.711820602416992, -9.181243896484375, 23.1243896484375, 9.395660400390625, -1.6410598754882812, 16.660106658935547, 24.20081329345703, 2.860136032104492, 32.45056915283203, 8.352088928222656, 12.06974983215332, -0.16399002075195312, 38.61043930053711, 37.7266731262207], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000328.npy"}
{"epoch": 0.4958427815570673, "step": 329, "batch_size": 64, "mean": 12.474262237548828, "std": 16.84587287902832, "min": -16.468360900878906, "p10": -7.937754631042479, "median": 10.602275848388672, "p90": 37.477965545654314, "max": 50.43700408935547, "pos_frac": 0.71875, "sample": [11.559379577636719, 9.672183990478516, 14.671371459960938, 13.31707763671875, -13.981124877929688, 10.790016174316406, 41.322357177734375, -1.6480789184570312, 23.249969482421875, 13.277214050292969, 30.85155487060547, 12.167991638183594, -5.289606094360352, 23.7613525390625, -0.38934326171875, 6.4561309814453125, 6.311710357666016, 27.482437133789062, -5.0377197265625, 11.918777465820312, 3.5890045166015625, -0.8704586029052734, 50.43700408935547, 15.289215087890625, -16.468360900878906, 39.63823699951172, 50.353797912597656, 25.473976135253906, 15.653167724609375, 3.7012939453125, 32.43733215332031, 24.91961669921875, -9.080970764160156, 10.124176025390625, 13.638334274291992, 0.201568603515625, 40.453704833984375, -1.812448501586914, 29.75321388244629, 28.396194458007812, 39.88818359375, 27.972305297851562, 4.556560516357422, -14.403060913085938, 2.4823379516601562, 24.196533203125, 0.3178672790527344, 7.176624298095703, 10.414535522460938, 14.27811050415039, -11.644901275634766, 19.7286376953125, 3.2472991943359375, -4.293495178222656, 49.559715270996094, 30.3349609375, -6.063592910766602, 28.499557495117188, -0.6247711181640625, -8.740966796875, 8.028533935546875, -0.05138397216796875, -1.2326641082763672, -11.565383911132812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000329.npy"}
{"epoch": 0.4973544973544973, "step": 330, "batch_size": 64, "mean": 13.138544082641602, "std": 20.049528121948242, "min": -41.36100769042969, "p10": -10.590353965759274, "median": 10.249015808105469, "p90": 37.48098793029785, "max": 50.701507568359375, "pos_frac": 0.796875, "sample": [-0.9373092651367188, -4.291044235229492, -4.052446365356445, -32.496124267578125, 11.460189819335938, -41.36100769042969, 31.84412384033203, 5.784553527832031, -7.172111511230469, 35.73213195800781, 32.279541015625, 14.934257507324219, -22.757299423217773, 3.5548019409179688, 14.701427459716797, 37.65618896484375, 39.195945739746094, -19.211273193359375, 50.701507568359375, 37.07218551635742, 32.175045013427734, -1.6079235076904297, 26.01873779296875, 2.7744483947753906, 35.91117858886719, 27.586807250976562, 3.7485427856445312, 0.91058349609375, -7.787799835205078, 3.4076995849609375, 7.594282150268555, 29.52190399169922, 2.0623321533203125, 10.04336929321289, -20.059978485107422, 37.85784149169922, 23.01525115966797, 32.591461181640625, 24.73133087158203, 9.298521041870117, 4.984275817871094, 20.381942749023438, 5.340156555175781, 27.639854431152344, 33.070220947265625, 33.276817321777344, 2.018362045288086, 17.17108917236328, 2.239166259765625, 7.248779296875, 0.27256011962890625, 13.559295654296875, 42.939151763916016, 23.61750030517578, 3.574432373046875, 10.403091430664062, 10.094940185546875, -11.791448593139648, 45.384368896484375, -22.890838623046875, 28.8479061126709, 28.42816925048828, 6.082929611206055, 46.5422248840332], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000330.npy"}
{"epoch": 0.4988662131519274, "step": 331, "batch_size": 64, "mean": 17.468751907348633, "std": 20.21311378479004, "min": -41.72123718261719, "p10": -5.216526031494141, "median": 19.419736862182617, "p90": 44.598793792724614, "max": 55.8194580078125, "pos_frac": 0.78125, "sample": [47.414676666259766, 7.234748840332031, 7.572414398193359, 23.235843658447266, 24.524272918701172, 36.3328857421875, 18.068191528320312, 33.09822082519531, 23.28875732421875, 11.49066162109375, -0.08850860595703125, 49.52018737792969, 42.41920471191406, 44.39129638671875, 34.10874938964844, -4.684638977050781, 55.8194580078125, 16.41394805908203, 14.253841400146484, 50.84674835205078, 24.71353530883789, 2.894174575805664, 4.737876892089844, 5.784065246582031, -1.6564750671386719, 41.72675323486328, 22.325674057006836, 53.34388732910156, 2.5552978515625, 22.432479858398438, 23.906326293945312, 14.311014175415039, 39.57978820800781, 29.85552978515625, 3.8477935791015625, -0.5390625, 19.7091064453125, -20.314437866210938, -12.59107780456543, 38.11199188232422, -9.42926025390625, 45.1201171875, 15.21534538269043, -1.5455322265625, 11.803510665893555, -41.72123718261719, 24.65985870361328, -5.34173583984375, -11.321281433105469, 4.312982559204102, -4.924369812011719, 25.268226623535156, 26.089599609375, 5.3980865478515625, 23.16320037841797, 0.07924652099609375, -0.7973833084106445, 32.513206481933594, 22.06452178955078, -19.196666717529297, 37.29468536376953, 19.130367279052734, 25.481765747070312, 44.687721252441406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000331.npy"}
{"epoch": 0.5003779289493575, "step": 332, "batch_size": 64, "mean": 13.684735298156738, "std": 17.777048110961914, "min": -21.495594024658203, "p10": -4.751795578002927, "median": 6.495441436767578, "p90": 44.011650848388676, "max": 52.603309631347656, "pos_frac": 0.859375, "sample": [-12.56998062133789, 5.04534912109375, 6.4384613037109375, 0.9405975341796875, 9.51786994934082, 28.413230895996094, 5.3904571533203125, 13.12211799621582, 3.343841552734375, 6.421958923339844, 5.813468933105469, 50.64490509033203, 21.662445068359375, 23.95993423461914, 46.77952575683594, 5.402923583984375, 5.79345703125, 0.7981967926025391, 18.054489135742188, 10.598114013671875, 28.85309600830078, 1.8601303100585938, -21.495594024658203, 44.67001724243164, 25.272186279296875, 18.299407958984375, -6.292280197143555, 3.6239242553710938, 6.0240631103515625, 11.266952514648438, 6.552421569824219, 38.85438537597656, 5.7000732421875, 0.803497314453125, 0.10211181640625, 10.115180969238281, -1.8829689025878906, 52.603309631347656, -17.85625457763672, 0.6984481811523438, 21.856796264648438, 40.43287658691406, 18.222753524780273, 6.187646865844727, 5.439537048339844, -0.42050933837890625, 28.403106689453125, 28.62567138671875, 26.525360107421875, 11.60983657836914, 44.629547119140625, 7.340240478515625, 16.04599380493164, 0.14814376831054688, 4.606529235839844, -6.7836761474609375, 50.552207946777344, 3.877086639404297, 4.8955841064453125, 42.56989288330078, 51.282989501953125, 25.486114501953125, -13.072834014892578, -5.981292724609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000332.npy"}
{"epoch": 0.5018896447467877, "step": 333, "batch_size": 64, "mean": 15.953669548034668, "std": 18.00642204284668, "min": -19.51220703125, "p10": -4.7666904449462875, "median": 14.088184356689453, "p90": 41.55117416381836, "max": 58.71553039550781, "pos_frac": 0.765625, "sample": [11.42642593383789, -1.4677562713623047, 15.898895263671875, 27.77191925048828, 14.463768005371094, -0.09611129760742188, 18.077388763427734, 11.217758178710938, 3.8116836547851562, -5.884033203125, 26.07610321044922, 26.56723403930664, 15.755943298339844, 31.96221160888672, 37.71092224121094, 10.350372314453125, 17.5794677734375, 48.53944396972656, 41.412078857421875, 47.60438537597656, -0.751434326171875, 13.712600708007812, 7.539360046386719, 3.7906570434570312, 16.966243743896484, -9.329971313476562, -5.494373321533203, 16.089508056640625, 2.8756790161132812, 20.780845642089844, 39.57781219482422, 45.7994384765625, -3.0687637329101562, 12.822456359863281, 21.800521850585938, 17.350345611572266, 58.71553039550781, -1.9512252807617188, -0.04735565185546875, 1.2042083740234375, -1.214151382446289, 9.280502319335938, -19.51220703125, 3.931344985961914, 6.6142730712890625, -10.575874328613281, 42.975616455078125, 8.3818359375, -0.5390167236328125, 16.300491333007812, 6.246395111083984, -11.646713256835938, 23.035337448120117, 46.511810302734375, 2.5289955139160156, -8.848945617675781, 30.789947509765625, 2.282257080078125, 41.61078643798828, 21.045825958251953, 40.556427001953125, 34.40038299560547, 39.11397933959961, 40.605384826660156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000333.npy"}
{"epoch": 0.5034013605442177, "step": 334, "batch_size": 64, "mean": 10.776723861694336, "std": 21.764890670776367, "min": -40.540611267089844, "p10": -15.213552856445308, "median": 8.207942962646484, "p90": 44.07410163879395, "max": 52.774658203125, "pos_frac": 0.71875, "sample": [-20.18408203125, -40.540611267089844, 11.397705078125, 17.02356719970703, 20.972152709960938, 0.8135986328125, -39.62010192871094, 35.59632110595703, 5.163211822509766, -34.209129333496094, 2.4345550537109375, 52.774658203125, -9.218141555786133, 9.254631042480469, 2.1393585205078125, 28.73276710510254, 1.51251220703125, -6.35112190246582, -9.95989990234375, 38.51759719848633, 28.94732666015625, 51.49787902832031, 10.449047088623047, 24.506914138793945, 27.817272186279297, -17.61566162109375, 9.5576171875, 1.0649528503417969, 47.9765739440918, 4.246425628662109, -0.17637062072753906, -1.6710033416748047, 45.953041076660156, -21.435089111328125, 25.674510955810547, 44.1567497253418, 22.54010009765625, 7.1612548828125, 9.542236328125, 2.74310302734375, 3.4451065063476562, -8.988594055175781, -1.5058631896972656, 6.2407989501953125, 44.66509246826172, -0.6858444213867188, 0.4664268493652344, 4.738615036010742, -17.465118408203125, 22.94245147705078, 27.319496154785156, 9.984413146972656, -9.25689697265625, 16.151018142700195, -9.260398864746094, 12.729736328125, 12.028884887695312, 43.881256103515625, -4.5795135498046875, 5.014308929443359, 30.28277587890625, 32.11823272705078, 48.54491424560547, 31.712547302246094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000334.npy"}
{"epoch": 0.5049130763416477, "step": 335, "batch_size": 64, "mean": 12.993217468261719, "std": 20.485177993774414, "min": -31.697406768798828, "p10": -10.675304794311522, "median": 8.866539001464844, "p90": 44.61473007202149, "max": 54.35736846923828, "pos_frac": 0.75, "sample": [30.169593811035156, 30.2061767578125, -14.014245986938477, 49.636627197265625, 8.747360229492188, 8.487335205078125, 23.4520263671875, 0.3928508758544922, 34.4462890625, 20.442649841308594, 44.65147018432617, 40.12236785888672, 6.7367706298828125, 11.065483093261719, 40.8172607421875, 44.52900314331055, 9.970382690429688, 3.228860855102539, 4.697822570800781, 54.35736846923828, -18.78125, 7.0373077392578125, 19.960811614990234, 45.739166259765625, -5.0084381103515625, 8.9857177734375, 26.730674743652344, -17.7879638671875, -7.487775802612305, 19.82465362548828, 34.374794006347656, -0.352325439453125, 2.4706268310546875, 11.1258544921875, 34.81644058227539, -8.596210479736328, 47.677894592285156, -7.3416290283203125, 12.270149230957031, 54.05686569213867, 3.3371505737304688, 3.369344711303711, 0.08036041259765625, 19.562286376953125, 17.21563720703125, 44.821754455566406, -3.1586227416992188, -19.661102294921875, 9.316226959228516, 0.0027256011962890625, 8.464523315429688, -8.350521087646484, -0.5766372680664062, -22.576427459716797, -31.697406768798828, -11.56634521484375, 35.81318664550781, 24.533016204833984, 4.906944274902344, 24.095584869384766, -2.5169219970703125, 1.1977996826171875, 3.017974853515625, 20.076583862304688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000335.npy"}
{"epoch": 0.5064247921390779, "step": 336, "batch_size": 64, "mean": 11.857397079467773, "std": 20.161895751953125, "min": -31.061344146728516, "p10": -8.949636077880859, "median": 7.289888381958008, "p90": 38.35642700195313, "max": 70.13853454589844, "pos_frac": 0.6875, "sample": [-8.7269287109375, 8.89569091796875, 32.69831085205078, 0.303497314453125, 49.300811767578125, 28.854034423828125, 20.084415435791016, -7.2229461669921875, 55.56565856933594, 1.9921283721923828, -0.42056846618652344, -4.68266487121582, 48.14179992675781, -5.707672119140625, 36.42198181152344, 4.4569854736328125, 1.0809097290039062, 20.17670440673828, 51.598297119140625, -17.163700103759766, -9.80612564086914, 17.83490753173828, 39.092735290527344, 5.7109832763671875, -9.045082092285156, 31.801746368408203, 8.082111358642578, -31.061344146728516, -4.311012268066406, -23.01354217529297, 43.44532775878906, -9.982097625732422, 9.817197799682617, 9.21746826171875, 0.9283638000488281, -0.028181076049804688, 4.7043304443359375, -6.5476837158203125, 24.883127212524414, 28.61001205444336, 0.87530517578125, -3.2607955932617188, 13.353530883789062, 6.308574676513672, 30.977848052978516, 12.858146667480469, 24.067581176757812, -2.136749267578125, 8.190412521362305, 32.798583984375, 3.0115795135498047, -2.8905181884765625, 23.997535705566406, 36.63837432861328, 70.13853454589844, 17.188350677490234, -11.505111694335938, 3.1014556884765625, 15.173294067382812, -5.102073669433594, 10.490631103515625, 27.646198272705078, 6.4976654052734375, -5.5249481201171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000336.npy"}
{"epoch": 0.5079365079365079, "step": 337, "batch_size": 64, "mean": 11.218806266784668, "std": 18.690427780151367, "min": -29.64453125, "p10": -9.210938262939452, "median": 9.002798080444336, "p90": 37.33754043579103, "max": 64.4224853515625, "pos_frac": 0.734375, "sample": [2.7267990112304688, -3.4922103881835938, -29.64453125, -8.743942260742188, 16.292404174804688, 29.707427978515625, -2.6307716369628906, 39.245567321777344, -1.666168212890625, -5.50616455078125, 42.58636474609375, 2.3956527709960938, 16.439048767089844, -29.22928810119629, 6.0565185546875, 1.4349746704101562, 13.515983581542969, 14.334564208984375, 7.310935974121094, -6.103759765625, 10.018997192382812, -0.0260467529296875, -10.564680099487305, -24.90738296508789, 26.51490592956543, 17.654624938964844, -9.600700378417969, 19.85486602783203, 16.33849334716797, 50.95295715332031, 23.172630310058594, 0.5173568725585938, 40.72657775878906, -3.012493133544922, 32.959503173828125, 17.79193878173828, 31.35692596435547, 3.5731201171875, 3.6779632568359375, 22.599342346191406, 38.390357971191406, 5.241539001464844, 29.830413818359375, 8.871788024902344, 5.4821624755859375, -3.1881561279296875, 19.173120498657227, 1.9376983642578125, 12.351806640625, 9.133808135986328, 25.229354858398438, 64.4224853515625, 1.4055252075195312, 2.4412841796875, 10.493637084960938, 34.88096618652344, -0.8412227630615234, -9.411079406738281, 27.958852767944336, 23.15553855895996, -22.140121459960938, 3.78533935546875, 39.057865142822266, 15.712299346923828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000337.npy"}
{"epoch": 0.509448223733938, "step": 338, "batch_size": 64, "mean": 11.375207901000977, "std": 20.081457138061523, "min": -38.45989990234375, "p10": -7.866838836669921, "median": 6.570577621459961, "p90": 39.99558792114258, "max": 57.46583938598633, "pos_frac": 0.71875, "sample": [-7.128082275390625, 6.514793395996094, 37.502227783203125, 29.19458770751953, 37.86867904663086, 30.205055236816406, -35.959346771240234, 28.181907653808594, 2.4169082641601562, 45.008087158203125, 57.46583938598633, 0.7305374145507812, -2.197650909423828, 11.36822509765625, 1.9967308044433594, -18.647354125976562, -0.092498779296875, -27.55547523498535, -3.5900726318359375, 4.337615966796875, 4.675422668457031, 39.795631408691406, 3.8582992553710938, -1.1992359161376953, 4.870185852050781, 21.30645751953125, -38.45989990234375, 28.100379943847656, 25.9764404296875, 23.52558708190918, 19.20964813232422, 18.645095825195312, 1.8606224060058594, 6.626361846923828, 13.146562576293945, 16.8255615234375, 8.682214736938477, 6.1955718994140625, 46.43756866455078, -1.081390380859375, -2.4905014038085938, 28.294939041137695, -11.140193939208984, 21.511905670166016, 2.9874000549316406, 9.573928833007812, -8.183448791503906, -5.248809814453125, 10.766777038574219, -0.9542579650878906, 8.243995666503906, 47.21025848388672, 9.735721588134766, 0.45221710205078125, 14.616680145263672, 26.660194396972656, -0.4266815185546875, 48.834625244140625, 40.08128356933594, -11.120372772216797, -3.261018753051758, 5.393035888671875, 49.27918243408203, 0.5786075592041016], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000338.npy"}
{"epoch": 0.5109599395313681, "step": 339, "batch_size": 64, "mean": 14.053956985473633, "std": 18.784576416015625, "min": -24.14276885986328, "p10": -7.277016639709472, "median": 10.290962219238281, "p90": 39.82970199584962, "max": 56.48334503173828, "pos_frac": 0.78125, "sample": [27.688508987426758, 13.209121704101562, 3.5648651123046875, 0.8655548095703125, -3.67352294921875, 40.547752380371094, 10.031906127929688, 11.121240615844727, 4.848323822021484, -12.918203353881836, 21.055511474609375, -5.0376129150390625, 2.8964157104492188, 8.34341812133789, 18.731746673583984, 1.3808822631835938, -17.337730407714844, 28.073593139648438, 48.599082946777344, 1.1183719635009766, 35.91217041015625, 6.824371337890625, 23.600830078125, 22.45140838623047, 5.9558258056640625, 38.15425109863281, 2.4857540130615234, 13.921123504638672, -1.08624267578125, 27.184856414794922, 27.819625854492188, 37.89550018310547, 51.09666442871094, 54.12928771972656, 23.929466247558594, 8.532733917236328, 26.440078735351562, -8.787338256835938, 21.606231689453125, 20.962783813476562, 44.034637451171875, 10.550018310546875, -6.561346054077148, -7.583732604980469, 8.814300537109375, -2.59075927734375, -5.22991943359375, 16.6015625, -24.14276885986328, -15.368576049804688, 3.751333236694336, 2.453125, 30.746631622314453, 28.578227996826172, -18.295761108398438, 56.48334503173828, -2.857025146484375, 10.00416374206543, 5.933563232421875, 29.386512756347656, 20.410282135009766, 22.808002471923828, 1.1351203918457031, 48.25364685058594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000339.npy"}
{"epoch": 0.5124716553287982, "step": 340, "batch_size": 64, "mean": 14.868640899658203, "std": 24.005294799804688, "min": -41.51277160644531, "p10": -10.263761711120605, "median": 14.588497161865234, "p90": 49.16512870788574, "max": 58.687095642089844, "pos_frac": 0.6875, "sample": [17.458213806152344, 49.61676788330078, 15.838092803955078, 22.119808197021484, -5.5850372314453125, 56.506011962890625, 0.5614337921142578, -3.749177932739258, -7.228172302246094, 40.185577392578125, 3.0352325439453125, 29.683273315429688, 18.82837677001953, 30.4493465423584, 56.925086975097656, 34.2902946472168, 4.43449592590332, -17.151947021484375, 20.710281372070312, 24.9351806640625, -6.192216873168945, -2.213958740234375, -25.85611343383789, 18.15869140625, 3.55322265625, 58.687095642089844, -3.3976211547851562, 40.534027099609375, 7.984443664550781, 46.54273986816406, -26.681640625, 38.57831573486328, 3.4650421142578125, 15.052230834960938, 42.14933776855469, 24.928913116455078, 49.35076904296875, -0.1806163787841797, -2.1354827880859375, 48.73196792602539, 42.37969207763672, -8.345500946044922, -6.973461151123047, 48.13628387451172, 20.31698989868164, -6.379020690917969, -7.0670013427734375, -9.353443145751953, 7.447734832763672, 39.716400146484375, 1.0179901123046875, 14.311702728271484, 14.912017822265625, -10.653898239135742, -41.51277160644531, -18.284133911132812, 56.34880065917969, 3.7869949340820312, 19.327804565429688, 58.50414276123047, 14.865291595458984, 0.6170806884765625, -15.600654602050781, 11.151702880859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000340.npy"}
{"epoch": 0.5139833711262283, "step": 341, "batch_size": 64, "mean": 15.230374336242676, "std": 24.226701736450195, "min": -43.56571960449219, "p10": -10.890474700927735, "median": 11.340972900390625, "p90": 50.21564331054688, "max": 59.134925842285156, "pos_frac": 0.734375, "sample": [-30.420223236083984, 27.168010711669922, 34.323036193847656, 26.529394149780273, 51.23280334472656, 32.34886932373047, 41.82463836669922, 51.93658447265625, 41.69464111328125, -10.311920166015625, 48.877647399902344, 8.626213073730469, -31.298561096191406, -1.7824363708496094, 0.2989978790283203, 1.9347457885742188, 36.38482666015625, 3.9267654418945312, 45.523048400878906, 55.418190002441406, 41.18632507324219, 50.78907012939453, 55.842002868652344, 21.796091079711914, -16.84765625, 59.134925842285156, 25.725418090820312, 0.3906974792480469, -2.352386474609375, -6.16510009765625, 15.185970306396484, 4.403648376464844, -10.724174499511719, -3.2088241577148438, 15.059738159179688, 46.219661712646484, 14.73855972290039, 9.744539260864258, 11.633064270019531, -26.673187255859375, -12.252357482910156, -10.961746215820312, 10.551750183105469, 34.5982666015625, 2.336681365966797, 53.27671813964844, 14.50616455078125, 18.4676513671875, 29.228248596191406, 42.01734161376953, 15.486114501953125, 9.596611022949219, 36.59700012207031, -9.605194091796875, -0.10903167724609375, 35.078102111816406, 11.048881530761719, 5.34698486328125, 0.9804210662841797, -7.147605895996094, -4.250457763671875, 0.41414642333984375, 2.9913711547851562, -43.56571960449219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000341.npy"}
{"epoch": 0.5154950869236583, "step": 342, "batch_size": 64, "mean": 17.221298217773438, "std": 22.281024932861328, "min": -34.79857635498047, "p10": -5.019342422485352, "median": 12.836248397827148, "p90": 50.659859848022464, "max": 64.76673889160156, "pos_frac": 0.734375, "sample": [5.640922546386719, 14.221595764160156, -5.1751251220703125, -0.284912109375, 0.7719097137451172, 46.87715530395508, 4.536823272705078, 44.697662353515625, -4.4964141845703125, 22.602401733398438, 3.869953155517578, 15.64239501953125, -0.6008968353271484, 51.07456970214844, 3.4132080078125, -11.609542846679688, 1.3653640747070312, -34.79857635498047, -0.36234474182128906, -6.1691436767578125, 11.567070007324219, 49.692203521728516, 30.832992553710938, 55.91638946533203, -14.99908447265625, 21.51996612548828, 12.631484985351562, 21.45392417907715, 33.282859802246094, 31.539112091064453, 13.041011810302734, 15.975475311279297, 2.1416168212890625, 40.25579833984375, 18.277076721191406, -9.250982284545898, 33.78716278076172, -3.0493927001953125, 6.370941162109375, 45.44015121459961, -1.713165283203125, 55.21515655517578, 23.474151611328125, 16.30609130859375, -9.30858039855957, 61.13902282714844, 6.38031005859375, -1.4443626403808594, 64.76673889160156, 54.79945373535156, 20.328392028808594, -4.252544403076172, 9.086761474609375, 62.41276168823242, -4.655849456787109, 48.40636444091797, 19.654020309448242, 2.2019729614257812, 5.0110626220703125, 21.294841766357422, 30.62078857421875, 10.357465744018555, -0.7097396850585938, 45.1492919921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000342.npy"}
{"epoch": 0.5170068027210885, "step": 343, "batch_size": 64, "mean": 10.011658668518066, "std": 20.028776168823242, "min": -45.073760986328125, "p10": -11.98086814880371, "median": 7.550559997558594, "p90": 36.809943008422856, "max": 54.473846435546875, "pos_frac": 0.671875, "sample": [16.94757843017578, 0.5438613891601562, -5.2823333740234375, -0.5348968505859375, -2.508697509765625, 32.03684997558594, 8.115974426269531, -1.118398666381836, -1.9637298583984375, -0.6911392211914062, -30.79578399658203, 1.333160400390625, 8.013992309570312, 26.247314453125, 54.473846435546875, -2.583812713623047, 38.97235107421875, 40.67132568359375, 40.32727813720703, 36.51070785522461, 17.1376953125, 4.551074981689453, 24.86243438720703, 24.2181396484375, 5.035022735595703, 20.094280242919922, 4.896636962890625, -16.020431518554688, 18.400680541992188, -0.2839927673339844, -25.49321746826172, 11.532577514648438, 36.93818664550781, 3.6330184936523438, -0.5217437744140625, 4.114276885986328, 32.67716979980469, 32.625732421875, -45.073760986328125, 48.96717834472656, -12.780181884765625, 11.518768310546875, 20.831260681152344, 42.731903076171875, 9.570220947265625, 6.4387359619140625, 21.222503662109375, 8.58212661743164, -1.1906356811523438, 34.73451232910156, 28.74994659423828, -0.9813079833984375, 7.087127685546875, 17.85253143310547, 20.856842041015625, 18.948394775390625, -12.188739776611328, 0.03835296630859375, 14.595840454101562, -0.6825790405273438, -39.88627624511719, -11.495834350585938, -7.098920822143555, 2.285154342651367], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000343.npy"}
{"epoch": 0.5185185185185185, "step": 344, "batch_size": 64, "mean": 14.950439453125, "std": 22.59248924255371, "min": -41.44383239746094, "p10": -12.206027793884276, "median": 11.509235382080078, "p90": 47.319780731201185, "max": 59.260894775390625, "pos_frac": 0.71875, "sample": [-21.43389892578125, 32.924293518066406, 4.413291931152344, 16.8817195892334, 49.145286560058594, -0.01914215087890625, 4.751455307006836, 18.236026763916016, 8.568832397460938, 24.399002075195312, -5.4028472900390625, 3.8686676025390625, 8.270292282104492, 35.1990852355957, 26.87067413330078, 26.074275970458984, 12.158889770507812, 59.260894775390625, 4.7561798095703125, 31.65557861328125, -13.162126541137695, -5.4014129638671875, -14.081207275390625, 58.3787841796875, 1.5843124389648438, -4.885917663574219, -16.826629638671875, 1.8377685546875, 48.92387390136719, 21.22793197631836, -12.931953430175781, 52.788543701171875, -0.41771697998046875, 13.77484130859375, 24.207435607910156, 42.565826416015625, 54.32138442993164, 7.665378570556641, 22.705041885375977, -32.05226135253906, -0.3994483947753906, 3.5088882446289062, -6.6605377197265625, -3.976755142211914, 3.2739181518554688, -10.512201309204102, 51.04557800292969, 41.73808288574219, 23.897735595703125, 20.266490936279297, 33.08197784423828, 10.859580993652344, 34.106712341308594, 10.835487365722656, 37.122764587402344, 1.4159259796142578, -7.6169281005859375, 42.63108825683594, -0.0703887939453125, 43.57689666748047, -41.44383239746094, 27.098052978515625, 36.108619689941406, 16.139892578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000344.npy"}
{"epoch": 0.5200302343159486, "step": 345, "batch_size": 64, "mean": 14.691193580627441, "std": 22.228593826293945, "min": -38.058799743652344, "p10": -10.62834358215332, "median": 12.1051025390625, "p90": 43.90588607788086, "max": 71.36054992675781, "pos_frac": 0.78125, "sample": [18.286087036132812, 6.103385925292969, 60.47840118408203, -30.175655364990234, 4.297016143798828, 1.854339599609375, 2.6038436889648438, -11.933792114257812, 1.87109375, 6.137275695800781, 31.900455474853516, 45.31650924682617, 28.381200790405273, 30.288724899291992, -18.958492279052734, 25.70952606201172, -20.787696838378906, 10.384626388549805, 6.7852935791015625, 36.16957092285156, 71.36054992675781, 5.55640983581543, 18.250396728515625, 23.61725616455078, 38.88340759277344, 2.14508056640625, -8.041175842285156, 14.383512496948242, 3.314788818359375, 11.50653076171875, 30.520050048828125, 47.95722961425781, 23.34466552734375, 21.009241104125977, 12.816116333007812, -25.894704818725586, 59.20570373535156, 51.29417419433594, 33.93106460571289, -6.380119323730469, 41.18070983886719, 4.327400207519531, -10.158599853515625, 43.53263854980469, 23.238616943359375, -10.829662322998047, 12.70367431640625, -0.5298690795898438, 16.84406280517578, 2.068695068359375, 20.109542846679688, 4.5146942138671875, -6.022117614746094, 10.035125732421875, 9.103630065917969, 27.9888916015625, -0.2410411834716797, -38.058799743652344, -7.178943634033203, 17.48155975341797, 24.82322883605957, 8.099136352539062, 39.6461296081543, 44.06584930419922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000345.npy"}
{"epoch": 0.5215419501133787, "step": 346, "batch_size": 64, "mean": 17.673648834228516, "std": 24.311975479125977, "min": -42.497108459472656, "p10": -11.257430267333984, "median": 12.998958587646484, "p90": 51.400890350341804, "max": 62.36680603027344, "pos_frac": 0.78125, "sample": [39.52268981933594, 47.337459564208984, 22.435348510742188, 33.29833221435547, 42.3958740234375, 12.264572143554688, 7.0475311279296875, 31.38762855529785, 52.168766021728516, 52.3286247253418, 52.112403869628906, 2.7554664611816406, 11.3643798828125, 39.55333709716797, -4.072210311889648, 62.1810417175293, -27.339187622070312, 20.218914031982422, -10.648513793945312, 20.352611541748047, 27.820796966552734, 15.233718872070312, 0.0178985595703125, 12.638202667236328, 42.829246520996094, 56.61833190917969, -42.497108459472656, 13.044017791748047, 0.000316619873046875, -28.227737426757812, 1.8453369140625, -23.907386779785156, 36.20112228393555, -2.0940628051757812, 12.953899383544922, 11.825157165527344, 9.336776733398438, -5.1146087646484375, 48.405921936035156, 48.02790832519531, 52.42063903808594, 40.016849517822266, 1.0219650268554688, 49.740692138671875, 8.832839965820312, 8.973762512207031, 42.375579833984375, 17.637908935546875, -12.907875061035156, 21.672164916992188, 13.790786743164062, 4.262657165527344, 6.045770645141602, -2.73956298828125, -11.518394470214844, 2.4662704467773438, 14.021087646484375, 31.962181091308594, -14.194305419921875, 62.36680603027344, -5.936969757080078, -1.2753772735595703, 10.889251708984375, 49.56597900390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000346.npy"}
{"epoch": 0.5230536659108088, "step": 347, "batch_size": 64, "mean": 13.848976135253906, "std": 23.23600959777832, "min": -33.0968017578125, "p10": -10.417218017578126, "median": 7.714122772216797, "p90": 51.263233566284185, "max": 68.3761215209961, "pos_frac": 0.734375, "sample": [10.461202621459961, 4.9295196533203125, 20.83917999267578, 14.119430541992188, -33.0968017578125, 3.5468292236328125, 21.4786376953125, 20.84654426574707, 32.62626647949219, 51.96596145629883, 0.5753326416015625, 44.309200286865234, 11.941024780273438, -11.416633605957031, -19.20606231689453, 65.6805191040039, 7.7639617919921875, 25.70101547241211, -10.143051147460938, 4.0377197265625, 1.346343994140625, 1.9214248657226562, 16.088394165039062, 5.18524169921875, -20.600967407226562, 1.4724807739257812, 37.82427215576172, 36.63153076171875, 49.62353515625, -9.719474792480469, -5.847389221191406, -4.254283905029297, -15.998252868652344, 26.1571044921875, -7.440336227416992, 14.32815933227539, -10.473472595214844, 32.93741226196289, -8.99774169921875, 8.430023193359375, 23.09396743774414, 2.3684921264648438, -4.924957275390625, 26.240989685058594, 5.297246932983398, 57.373626708984375, -9.914461135864258, -10.285957336425781, 68.3761215209961, 58.25746154785156, 55.827484130859375, 36.3747444152832, 26.892120361328125, 4.69158935546875, -16.4879207611084, 27.616073608398438, 19.45383644104004, 7.664283752441406, 3.6093368530273438, 0.05401611328125, -6.464176177978516, 3.696197509765625, 52.65019226074219, 39.30036163330078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000347.npy"}
{"epoch": 0.5245653817082389, "step": 348, "batch_size": 64, "mean": 18.360071182250977, "std": 26.817941665649414, "min": -24.164047241210938, "p10": -11.622804260253906, "median": 12.741565704345703, "p90": 56.616693878173834, "max": 77.2777099609375, "pos_frac": 0.671875, "sample": [48.93304443359375, 54.481590270996094, 1.4368133544921875, 2.7951812744140625, 57.53173828125, 30.627182006835938, 7.209381103515625, -15.630096435546875, -24.164047241210938, 0.3774566650390625, 68.38631439208984, -10.315872192382812, 14.600555419921875, 31.098060607910156, 45.67162322998047, 43.041419982910156, 13.539146423339844, 66.78501892089844, 11.81119155883789, 66.52903747558594, 1.3467674255371094, 41.263797760009766, 45.22112274169922, 8.733673095703125, 13.7535400390625, 52.5547981262207, -0.0217437744140625, 47.25770950317383, -17.038713455200195, -11.742042541503906, 19.096145629882812, -1.7101516723632812, 2.4556026458740234, 70.64219665527344, 15.834068298339844, -14.746023178100586, 22.626338958740234, -10.53634262084961, -7.997501373291016, 22.70047950744629, 51.243896484375, 77.2777099609375, 11.943984985351562, -1.1164913177490234, 32.36091613769531, -3.270263671875, -7.937238693237305, -12.25189208984375, -4.0444183349609375, 27.600841522216797, -5.8925933837890625, 69.85748291015625, -16.685895919799805, 29.077268600463867, 37.53929138183594, -0.7774887084960938, 23.483793258666992, 3.50323486328125, -11.344581604003906, 4.203527450561523, -3.75347900390625, 18.392196655273438, 46.75011444091797, -5.553852081298828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000348.npy"}
{"epoch": 0.5260770975056689, "step": 349, "batch_size": 64, "mean": 17.36975860595703, "std": 24.451433181762695, "min": -48.23877716064453, "p10": -6.530711364746093, "median": 12.220267295837402, "p90": 50.30864067077637, "max": 66.65614318847656, "pos_frac": 0.78125, "sample": [-15.778820037841797, 8.339263916015625, 46.80426788330078, 12.140113830566406, 19.898855209350586, 3.4345264434814453, 25.064125061035156, 41.81692123413086, 13.719335556030273, 11.67973518371582, -5.702882766723633, 6.528730392456055, 50.79319381713867, 6.308250427246094, 12.430801391601562, 39.593135833740234, 48.830291748046875, -27.922882080078125, -48.23877716064453, 49.08258819580078, 15.676841735839844, 49.178016662597656, -3.6700286865234375, 2.430988311767578, 39.4955940246582, 9.622352600097656, 1.4490585327148438, 26.407611846923828, 2.207611083984375, 12.397979736328125, 44.35943603515625, 33.91006088256836, -6.667449951171875, 19.856847763061523, -2.154815673828125, 60.13127517700195, 12.300420761108398, 65.21894836425781, -15.372817993164062, 17.907323837280273, 58.665306091308594, -2.2785568237304688, 3.5499343872070312, 8.832061767578125, 24.84005355834961, 52.948875427246094, 39.580169677734375, 0.9267387390136719, -9.185317993164062, -1.9495162963867188, 30.610061645507812, 1.3788604736328125, -5.088836669921875, 1.1903533935546875, 15.136743545532227, 66.65614318847656, -6.2116546630859375, 5.2205047607421875, -11.796566009521484, 48.39093780517578, 0.4718494415283203, 11.440742492675781, 28.292884826660156, 66.5367431640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000349.npy"}
{"epoch": 0.527588813303099, "step": 350, "batch_size": 64, "mean": 22.000972747802734, "std": 24.191852569580078, "min": -27.86797332763672, "p10": -5.488664817810058, "median": 18.52070140838623, "p90": 57.51699447631837, "max": 68.54317474365234, "pos_frac": 0.8125, "sample": [55.22901153564453, 3.2683563232421875, 27.26020050048828, -5.838798522949219, 28.495933532714844, 13.695024490356445, 52.10447692871094, 29.240692138671875, 18.814231872558594, 53.250152587890625, 2.2399444580078125, -3.3489227294921875, 2.928955078125, 34.94137191772461, 16.616552352905273, 34.524723052978516, 59.45751953125, 10.096118927001953, -5.194421768188477, -3.28546142578125, 16.0523681640625, 35.71880340576172, -27.86797332763672, 32.53489685058594, 17.49870491027832, -1.00408935546875, -24.944862365722656, 52.43995666503906, 2.3487987518310547, 4.449947357177734, 45.49311065673828, -18.742950439453125, 39.880157470703125, 3.1726303100585938, 43.05470657348633, 62.51029968261719, 23.172653198242188, 49.23161315917969, 58.49755859375, 67.50079345703125, 18.227170944213867, 68.54317474365234, 65.94123840332031, -5.614768981933594, 12.816478729248047, 15.810617446899414, 9.913581848144531, 4.176605224609375, -10.478450775146484, 21.541419982910156, 19.2672119140625, 31.695865631103516, 1.0491294860839844, 19.63994598388672, 12.71246337890625, -15.576141357421875, 27.72564697265625, 16.545272827148438, -0.7380828857421875, 9.259368896484375, 67.18024444580078, 30.423641204833984, 40.42176818847656, 42.08610916137695], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000350.npy"}
{"epoch": 0.5291005291005291, "step": 351, "batch_size": 64, "mean": 18.47935676574707, "std": 26.9949951171875, "min": -56.46131134033203, "p10": -14.632999420166016, "median": 22.5866641998291, "p90": 53.30533027648926, "max": 59.229087829589844, "pos_frac": 0.75, "sample": [-4.939851760864258, 33.243797302246094, 54.872398376464844, 42.88762664794922, -3.5759315490722656, 53.47636032104492, -6.458377838134766, 27.035850524902344, 53.424537658691406, 12.516998291015625, 10.417648315429688, 54.72919464111328, 10.644441604614258, -1.7423095703125, 21.40683364868164, -14.580581665039062, 3.7755584716796875, 1.0741195678710938, 41.276390075683594, 23.24227523803711, 23.262901306152344, -20.52301025390625, 53.02717971801758, 17.174774169921875, 52.40888977050781, 56.61155700683594, 29.54108428955078, 32.482383728027344, -54.185089111328125, 17.11126708984375, 50.315242767333984, -14.655464172363281, 37.60600280761719, 1.5731658935546875, 33.03105163574219, -20.20972442626953, -6.6347198486328125, 47.083984375, 21.931053161621094, 6.272884368896484, 37.591896057128906, 59.229087829589844, 9.856735229492188, -15.644950866699219, 31.008506774902344, 11.108787536621094, 39.499534606933594, 30.613052368164062, 1.2456512451171875, 28.61579132080078, -12.12911605834961, -0.5426521301269531, 37.602848052978516, 25.759016036987305, 56.11882019042969, 24.438003540039062, 21.441802978515625, 52.56242370605469, -56.46131134033203, 9.354185104370117, -42.58753967285156, 32.77874755859375, 31.918256759643555, -6.651123046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000351.npy"}
{"epoch": 0.5306122448979592, "step": 352, "batch_size": 64, "mean": 17.197158813476562, "std": 24.377456665039062, "min": -26.69208526611328, "p10": -7.122242355346679, "median": 12.958732604980469, "p90": 53.43660469055176, "max": 77.6112289428711, "pos_frac": 0.75, "sample": [45.571319580078125, 77.6112289428711, 67.08328247070312, 1.7069091796875, 42.1265869140625, -26.69208526611328, -5.542453765869141, 1.6578617095947266, 17.17477798461914, 24.02865219116211, 28.135190963745117, 30.927215576171875, -21.312881469726562, -1.3861312866210938, 30.299148559570312, 8.15826416015625, -0.70050048828125, -7.3815460205078125, -23.59447479248047, 10.980262756347656, 37.636199951171875, 22.722640991210938, 13.0302734375, 54.02031326293945, 52.515811920166016, 12.624710083007812, 14.555023193359375, -21.249788284301758, 4.35798454284668, 0.9689826965332031, 5.8784637451171875, 3.0580673217773438, -3.3974227905273438, 28.17498779296875, 14.358940124511719, 23.12704086303711, 15.53318977355957, 57.314453125, 51.208709716796875, 12.887191772460938, 49.2697639465332, -2.4317398071289062, 69.18408203125, 2.2219715118408203, 17.13176727294922, 50.620513916015625, -6.2304534912109375, 0.838104248046875, -6.517200469970703, 24.26932144165039, -7.7807769775390625, 15.791534423828125, 7.926605224609375, 53.83123016357422, 31.99535369873047, -0.3890838623046875, -5.287147521972656, 59.70344543457031, -12.089458465576172, 2.395904541015625, 0.41997337341308594, 39.05494689941406, 13.074201583862305, 5.438928604125977], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000352.npy"}
{"epoch": 0.5321239606953893, "step": 353, "batch_size": 64, "mean": 18.460941314697266, "std": 26.49051856994629, "min": -61.777801513671875, "p10": -6.5460361480712885, "median": 13.19942855834961, "p90": 57.025994873046876, "max": 68.96905517578125, "pos_frac": 0.796875, "sample": [31.419858932495117, 51.34405517578125, -42.593685150146484, 16.302471160888672, -22.496715545654297, 14.52895736694336, 25.18115997314453, -6.364635467529297, 45.64491271972656, 13.014175415039062, 3.3107566833496094, 13.384681701660156, 37.34248352050781, -8.66973876953125, 66.4067153930664, 5.41771125793457, 19.595703125, 42.37461853027344, 57.074371337890625, 62.35894012451172, 36.63243103027344, 64.75720977783203, 1.0527591705322266, 32.91648864746094, -1.3901824951171875, 0.40273284912109375, 57.66313934326172, 18.83454132080078, 57.89457702636719, 10.647798538208008, 17.614654541015625, 6.260936737060547, 31.384891510009766, -0.033267974853515625, -61.777801513671875, 46.806739807128906, -20.720985412597656, -17.22740936279297, 44.47560119628906, 53.224525451660156, 24.301864624023438, 53.992149353027344, 68.96905517578125, 8.589042663574219, 6.81402587890625, 6.521419525146484, 37.82367706298828, 9.966386795043945, 1.3521728515625, 3.2028121948242188, 16.92060661315918, 6.1595001220703125, -0.579620361328125, 8.4154052734375, -0.183013916015625, 2.079010009765625, 5.843544006347656, 11.638458251953125, -2.8985671997070312, 4.324424743652344, 28.01080322265625, -6.623779296875, 56.913116455078125, 25.95159149169922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000353.npy"}
{"epoch": 0.5336356764928194, "step": 354, "batch_size": 64, "mean": 20.501022338867188, "std": 27.31888198852539, "min": -38.432861328125, "p10": -15.778707122802734, "median": 23.036760330200195, "p90": 54.96008071899414, "max": 76.01579284667969, "pos_frac": 0.78125, "sample": [66.07453155517578, 24.653606414794922, -15.85504150390625, 35.97902297973633, -15.600593566894531, 15.596418380737305, 17.97314453125, 29.949806213378906, 51.02557373046875, 3.1531944274902344, 31.813508987426758, 55.3146858215332, 31.30548095703125, 3.785491943359375, 1.697235107421875, 1.3523731231689453, 41.33989715576172, -17.139217376708984, 51.34779357910156, 76.01579284667969, 36.64692687988281, 40.961036682128906, 10.203262329101562, 26.644874572753906, -38.432861328125, 0.524444580078125, -30.196685791015625, 45.615047454833984, 21.16181182861328, 23.03679656982422, 27.6644287109375, 43.801265716552734, 53.318328857421875, 27.533294677734375, 11.03363037109375, 7.682762145996094, 23.518077850341797, -27.007638931274414, 63.6224365234375, -0.198272705078125, 36.03626251220703, 55.05432891845703, -25.726356506347656, -12.043899536132812, 52.619293212890625, -24.86798095703125, 54.99290466308594, 54.07411193847656, 3.7172298431396484, -5.090721130371094, 19.518951416015625, 23.036724090576172, 54.88349151611328, 62.684749603271484, 24.656932830810547, 9.056865692138672, 2.1145095825195312, 9.446342468261719, -8.164604187011719, 33.09846878051758, 13.49339485168457, 49.026206970214844, -15.351951599121094, -11.115558624267578], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000354.npy"}
{"epoch": 0.5351473922902494, "step": 355, "batch_size": 64, "mean": 21.115116119384766, "std": 28.162139892578125, "min": -52.119529724121094, "p10": -11.031769943237302, "median": 21.533313751220703, "p90": 59.98015251159669, "max": 68.16777038574219, "pos_frac": 0.75, "sample": [27.604061126708984, 67.89038848876953, 2.28009033203125, 11.009803771972656, 43.52313232421875, 21.35961151123047, 21.707015991210938, -7.953945159912109, -12.350837707519531, 16.16564178466797, 31.01689910888672, 3.6962127685546875, 27.790040969848633, -17.631240844726562, 36.20789337158203, 46.51051712036133, -3.298847198486328, 24.950275421142578, -33.53066635131836, 29.638648986816406, -24.51221466064453, 19.015697479248047, 20.496849060058594, 55.92759704589844, 54.78827667236328, 16.01520347595215, 21.96983528137207, 22.322147369384766, 51.116432189941406, 68.16777038574219, -18.254432678222656, 10.70529556274414, -4.570213317871094, 28.872344970703125, 5.015594482421875, -40.146514892578125, 33.209983825683594, -2.5334548950195312, 24.413196563720703, 51.82312774658203, 10.59173583984375, 6.104248046875, 64.82102966308594, -3.6554946899414062, 3.004535675048828, 61.25859832763672, 45.015106201171875, 4.061840057373047, 63.91707992553711, -52.119529724121094, 56.99711227416992, 38.81060791015625, 56.90327453613281, -0.9611358642578125, 66.12347412109375, -6.907320022583008, -2.4501266479492188, 8.274364471435547, 62.43871307373047, -1.0740280151367188, 47.585960388183594, 13.163732528686523, 33.694419860839844, 45.341896057128906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000355.npy"}
{"epoch": 0.5366591080876795, "step": 356, "batch_size": 64, "mean": 17.235088348388672, "std": 33.77053451538086, "min": -66.85041809082031, "p10": -28.26426773071289, "median": 16.722025871276855, "p90": 60.4280990600586, "max": 76.51460266113281, "pos_frac": 0.75, "sample": [69.89480590820312, 51.69463348388672, 10.672836303710938, -28.655044555664062, 51.799072265625, 4.452659606933594, 1.7752342224121094, 52.92510986328125, 32.715415954589844, 0.9699687957763672, -42.49663162231445, -66.03462982177734, 25.108016967773438, -29.319908142089844, 64.71099853515625, 68.5130844116211, 3.1610031127929688, 7.950355529785156, 60.67449951171875, 0.5244178771972656, 62.466400146484375, -66.85041809082031, 54.53959655761719, 8.042434692382812, 8.697654724121094, 59.85316467285156, -24.685787200927734, -28.904037475585938, 22.549251556396484, 20.177024841308594, 25.911422729492188, 4.281547546386719, -8.68321418762207, 0.6416530609130859, -27.352455139160156, 39.47590637207031, 51.297996520996094, -19.8325138092041, 32.985870361328125, 45.43339920043945, 10.35751724243164, 5.0869293212890625, 47.5898323059082, -5.47894287109375, 28.157669067382812, 29.86307716369629, 35.11835861206055, 8.285530090332031, -14.936792373657227, 24.95287322998047, 0.7462882995605469, 34.03852844238281, 61.51008605957031, 58.18412780761719, -3.94293212890625, 43.47294616699219, 15.216352462768555, -5.310569763183594, 54.74519348144531, -10.749252319335938, -48.76301193237305, 76.51460266113281, 18.227699279785156, 39.0787353515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000356.npy"}
{"epoch": 0.5381708238851096, "step": 357, "batch_size": 64, "mean": 15.968550682067871, "std": 26.978513717651367, "min": -49.792808532714844, "p10": -15.261003875732417, "median": 12.276365280151367, "p90": 52.52053375244141, "max": 72.3768539428711, "pos_frac": 0.765625, "sample": [52.282814025878906, -4.161170959472656, 52.622413635253906, -8.104293823242188, -48.1109619140625, 72.3768539428711, 16.5765380859375, 17.637081146240234, 5.338432312011719, 1.900543212890625, 4.7636871337890625, 35.077293395996094, -10.725311279296875, 8.679496765136719, 64.13536834716797, 26.33392333984375, 57.17072296142578, 20.510711669921875, 37.63678741455078, 2.416339874267578, 36.834205627441406, 11.342758178710938, 20.78982925415039, 8.544427871704102, 5.4331207275390625, 30.3331298828125, -0.21874046325683594, 0.4939460754394531, -24.258392333984375, 24.983905792236328, -17.204872131347656, -6.050363540649414, -9.347087860107422, 25.79303741455078, 23.307090759277344, 67.34364318847656, 49.70677185058594, 16.299545288085938, 33.540283203125, 60.452125549316406, -4.180931091308594, 30.110519409179688, 2.230621337890625, 2.0584793090820312, 46.04156494140625, 16.116714477539062, 2.300018310546875, -8.035903930664062, -18.580669403076172, 6.3555908203125, 13.209972381591797, 44.27418518066406, 40.88230895996094, 15.243335723876953, 44.047767639160156, 11.328689575195312, 8.286943435668945, 6.989963531494141, 62.310218811035156, -33.71369552612305, -22.611160278320312, -49.792808532714844, 44.12804412841797, 0.5118503570556641], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000357.npy"}
{"epoch": 0.5396825396825397, "step": 358, "batch_size": 64, "mean": 21.760364532470703, "std": 28.916423797607422, "min": -34.585147857666016, "p10": -11.829058837890624, "median": 16.55695343017578, "p90": 65.60062866210939, "max": 73.14733123779297, "pos_frac": 0.765625, "sample": [50.47478485107422, 29.067184448242188, 41.31235122680664, 61.51958465576172, -9.220479965209961, 57.727210998535156, 63.6395263671875, -11.899055480957031, 3.0863304138183594, 17.250228881835938, -3.7872676849365234, 43.14614486694336, 14.093765258789062, 35.928436279296875, 67.5409164428711, 8.429458618164062, -28.025314331054688, 15.863677978515625, 44.11100769042969, 14.656526565551758, 19.12493896484375, 6.787334442138672, -15.518157958984375, 56.06151580810547, 47.25205993652344, -0.8791122436523438, 6.504669189453125, -11.665733337402344, -4.1275482177734375, 3.1389694213867188, -29.28192138671875, 13.347190856933594, -11.4459228515625, 2.5054168701171875, -4.654411315917969, 32.378997802734375, 31.915668487548828, 31.906234741210938, 9.518096923828125, -0.8143405914306641, 71.94046020507812, 66.44110107421875, 66.79507446289062, 2.597553253173828, 73.14733123779297, 71.19834899902344, 31.164817810058594, -25.321544647216797, 8.787551879882812, 24.555694580078125, 1.984222412109375, -26.558679580688477, -34.585147857666016, 54.18207931518555, 7.956268310546875, 44.79581069946289, 13.658313751220703, 36.4521484375, 4.432861328125, 34.2834358215332, 27.342092514038086, 50.14679718017578, 17.8834228515625, 72.41433715820312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000358.npy"}
{"epoch": 0.5411942554799698, "step": 359, "batch_size": 64, "mean": 12.320904731750488, "std": 25.921424865722656, "min": -42.04403305053711, "p10": -17.396131896972655, "median": 8.33353042602539, "p90": 47.9200149536133, "max": 73.56399536132812, "pos_frac": 0.65625, "sample": [-7.87945556640625, 28.06481170654297, -5.5186920166015625, 5.8101043701171875, 32.51383972167969, 13.852699279785156, -12.175167083740234, 11.044940948486328, -17.717872619628906, 13.26165771484375, 73.56399536132812, 7.592988967895508, -3.9413299560546875, 24.43280029296875, -2.6616687774658203, 21.43408203125, 59.08439636230469, -18.35698699951172, 5.336479187011719, -42.04403305053711, 4.692905426025391, 18.977405548095703, 7.58245849609375, -5.523406982421875, 4.014213562011719, 52.740318298339844, 21.430618286132812, 12.439002990722656, 7.7369384765625, -37.45978546142578, 49.4549560546875, 73.38134765625, -26.646987915039062, 34.75325012207031, 7.8507232666015625, 8.816337585449219, 11.137397766113281, 28.172489166259766, 4.424110412597656, 10.242729187011719, -17.402389526367188, 33.229888916015625, 63.742393493652344, 35.126625061035156, -3.3859786987304688, 3.54962158203125, -16.787208557128906, -17.38153076171875, -15.280609130859375, 28.642745971679688, 10.778568267822266, -4.1050567626953125, -32.3677978515625, -2.981149673461914, 38.700172424316406, -0.247222900390625, 43.28123474121094, 12.244504928588867, 30.025177001953125, 44.33848571777344, 12.61777114868164, 72.36390686035156, -1.9113655090332031, -2.16748046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000359.npy"}
{"epoch": 0.5427059712773998, "step": 360, "batch_size": 64, "mean": 18.459043502807617, "std": 27.122350692749023, "min": -35.015708923339844, "p10": -8.597284317016602, "median": 9.923961639404297, "p90": 58.71397399902344, "max": 78.37158966064453, "pos_frac": 0.734375, "sample": [-3.523651123046875, 75.39069366455078, 4.2808837890625, -1.07470703125, 22.832901000976562, 43.42797088623047, -2.3411102294921875, 15.914714813232422, 8.14963150024414, 49.16314697265625, -1.1782417297363281, -22.409713745117188, 0.4139862060546875, -25.92236328125, 60.98823165893555, -16.099685668945312, 44.9375, -7.828121185302734, 2.7311134338378906, 58.96684265136719, 15.313831329345703, 1.2654609680175781, -20.082138061523438, 6.058013916015625, 1.6657867431640625, -0.5531501770019531, 51.18743133544922, 8.341743469238281, 31.42955780029297, 3.373565673828125, 12.185890197753906, 5.1686553955078125, 24.015384674072266, -8.926925659179688, 11.506179809570312, 46.039894104003906, -3.933349609375, 28.046175003051758, 54.43601989746094, 0.7297897338867188, 78.37158966064453, 4.2801513671875, 46.73188018798828, -9.474897384643555, 67.6578369140625, 22.681114196777344, 58.12394714355469, 5.512523651123047, 15.016975402832031, 71.87486267089844, 56.41508483886719, 18.639968872070312, 60.06907272338867, -6.719972610473633, -3.4739627838134766, 27.240158081054688, -5.893877029418945, -35.015708923339844, 3.69696044921875, 43.7965087890625, 13.575576782226562, 4.638832092285156, 33.44451904296875, 36.10188293457031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000360.npy"}
{"epoch": 0.54421768707483, "step": 361, "batch_size": 64, "mean": 21.408405303955078, "std": 27.917287826538086, "min": -53.29862976074219, "p10": -7.904567146301269, "median": 20.240575790405273, "p90": 61.84013633728028, "max": 78.95952606201172, "pos_frac": 0.78125, "sample": [31.446632385253906, 29.09368133544922, 2.664052963256836, -8.110910415649414, 65.02407836914062, 9.28858757019043, -12.68874740600586, -30.900131225585938, 21.563655853271484, 20.088512420654297, 18.625564575195312, -23.109233856201172, 2.022125244140625, 65.21150207519531, 47.71373748779297, 30.715850830078125, -7.423099517822266, -3.601757049560547, 21.54803466796875, 8.448654174804688, 11.005706787109375, 34.10358428955078, 6.606834411621094, 66.61984252929688, 50.68926239013672, 40.300437927246094, 40.30868911743164, -53.29862976074219, 63.004119873046875, 22.404979705810547, 28.14244842529297, -0.88092041015625, 50.073490142822266, 78.95952606201172, 62.03236389160156, 20.39263916015625, 55.6103401184082, 35.90399169921875, -7.363777160644531, -3.50018310546875, -0.04978179931640625, 5.716686248779297, 60.892250061035156, 43.73530197143555, 11.275806427001953, 61.391605377197266, 28.04486656188965, -3.2836990356445312, -30.675437927246094, 1.0036697387695312, 30.793289184570312, 13.139892578125, 26.61933135986328, 10.611682891845703, 77.60529327392578, 16.63404655456543, 57.165771484375, 9.805595397949219, -15.151626586914062, 26.826719284057617, 26.733642578125, 18.178491592407227, 3.3438663482666016, 1.0452117919921875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000361.npy"}
{"epoch": 0.54572940287226, "step": 362, "batch_size": 64, "mean": 21.240074157714844, "std": 29.67422103881836, "min": -55.31910705566406, "p10": -13.178260040283197, "median": 19.91415786743164, "p90": 61.413339996337896, "max": 79.0843505859375, "pos_frac": 0.78125, "sample": [-15.60211181640625, 7.174642562866211, 16.417633056640625, 0.5781478881835938, 9.337753295898438, -55.31910705566406, 26.791404724121094, 21.577804565429688, 79.0843505859375, 38.55699157714844, 1.1477584838867188, -3.974658966064453, 22.756752014160156, -3.3597564697265625, 0.2144622802734375, 5.344188690185547, 27.104385375976562, 41.62694549560547, 23.639446258544922, 66.47090148925781, -2.745908737182617, 4.7371826171875, 8.516965866088867, 22.89178466796875, 57.789878845214844, 13.595054626464844, 60.007225036621094, -2.2476043701171875, -0.2819786071777344, 51.20485305786133, 62.015960693359375, 1.8636150360107422, -1.7176055908203125, 58.15815734863281, 9.343696594238281, 59.94171142578125, 33.182273864746094, 7.518707275390625, 26.401222229003906, -7.522605895996094, 3.5816497802734375, 56.838462829589844, 62.435325622558594, 66.14016723632812, 16.303146362304688, 9.1390380859375, 43.794517517089844, 66.08808898925781, 48.570701599121094, 34.48529052734375, -26.233186721801758, 1.829864501953125, 45.330467224121094, -32.54090881347656, 18.250511169433594, 24.7884521484375, 59.832420349121094, -32.041015625, -20.319793701171875, 29.645095825195312, 78.49609375, -28.927444458007812, 38.50071716308594, 23.156539916992188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000362.npy"}
{"epoch": 0.54724111866969, "step": 363, "batch_size": 64, "mean": 21.322845458984375, "std": 28.855497360229492, "min": -49.409324645996094, "p10": -12.816497802734368, "median": 17.258468627929688, "p90": 65.0281463623047, "max": 81.6666259765625, "pos_frac": 0.796875, "sample": [43.53740692138672, -36.13892364501953, 72.25139617919922, 26.65941619873047, 27.125640869140625, 27.969970703125, 3.4258499145507812, 72.24382019042969, 5.44537353515625, -15.786834716796875, -2.614349365234375, 18.073272705078125, 81.6666259765625, 66.892333984375, 5.290611267089844, 78.69969940185547, 15.98563003540039, 13.768610000610352, 9.433708190917969, 53.66041564941406, 2.2174224853515625, 33.45894241333008, 20.074533462524414, 66.02381896972656, 62.70491027832031, 15.660577774047852, 1.9735870361328125, 16.44366455078125, -5.885711669921875, 6.0691986083984375, -19.03684425354004, -49.409324645996094, 27.60832977294922, -29.86715316772461, 36.73337173461914, 38.449859619140625, 10.728277206420898, 41.21382522583008, 12.509481430053711, -0.08713722229003906, 5.204517364501953, 3.7556915283203125, 27.508216857910156, 52.58403015136719, 13.317243576049805, 69.83722686767578, 39.5616569519043, 14.314361572265625, 50.45457458496094, -4.040107727050781, 57.29340362548828, 0.03496551513671875, 34.57093811035156, 32.53318786621094, -20.030946731567383, -26.95361328125, 56.72491455078125, -4.185825347900391, 25.194107055664062, 20.76828384399414, -1.0920581817626953, 22.943374633789062, 30.521240234375, 8.669544219970703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000363.npy"}
{"epoch": 0.5487528344671202, "step": 364, "batch_size": 64, "mean": 23.249065399169922, "std": 34.79979705810547, "min": -64.24613189697266, "p10": -13.012045288085936, "median": 23.27452278137207, "p90": 71.64942855834963, "max": 85.90428161621094, "pos_frac": 0.71875, "sample": [73.8359375, -0.5592899322509766, 2.6325149536132812, 78.1579818725586, 75.84427642822266, 5.1232452392578125, 30.240154266357422, 45.56243896484375, -11.560028076171875, 68.91934204101562, 84.98574829101562, 30.203086853027344, -6.331394195556641, -28.442909240722656, 44.57572937011719, 72.85888671875, 22.356334686279297, 0.24005508422851562, 85.90428161621094, -9.98870849609375, 68.65861511230469, 11.377937316894531, 36.74440002441406, 67.78842163085938, -43.80082702636719, -31.186742782592773, 72.81946563720703, -2.7916259765625, 0.16387176513671875, 45.59773254394531, 58.73640441894531, 29.367576599121094, -0.088775634765625, 28.66289520263672, 15.912033081054688, -0.1993255615234375, 46.10372543334961, 58.6378173828125, -4.682731628417969, 7.86225700378418, 32.972084045410156, 24.192710876464844, -9.263275146484375, -22.734664916992188, 53.843170166015625, 5.090900421142578, -5.447052001953125, 48.43904113769531, 1.7727508544921875, 68.48089599609375, -6.3909454345703125, 44.33332824707031, -38.63798522949219, 35.72589111328125, -64.24613189697266, 10.890914916992188, 34.541358947753906, 31.893142700195312, 0.7655239105224609, 7.840087890625, 2.759918212890625, -13.63433837890625, 67.32295227050781, 47.18907165527344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000364.npy"}
{"epoch": 0.5502645502645502, "step": 365, "batch_size": 64, "mean": 21.135812759399414, "std": 26.35120964050293, "min": -39.303932189941406, "p10": -3.9885482788085933, "median": 13.53954029083252, "p90": 58.367540740966824, "max": 79.02017211914062, "pos_frac": 0.828125, "sample": [12.59669303894043, 13.200571060180664, 43.81468200683594, 3.2477664947509766, 17.221664428710938, 2.5766773223876953, 52.423912048339844, 40.057281494140625, 60.91481018066406, 33.0557861328125, 47.93054962158203, 18.290542602539062, -36.57988739013672, 3.528644561767578, 6.483680725097656, 40.34926223754883, -1.3924369812011719, 5.684497833251953, -6.697395324707031, 0.9333534240722656, 13.878509521484375, 32.04106903076172, 38.97620391845703, 4.105104446411133, 0.2799034118652344, -39.303932189941406, 19.885948181152344, 76.41873168945312, 2.1221084594726562, 42.71751403808594, 10.065887451171875, 11.096704483032227, 42.7411003112793, 11.574230194091797, 45.75819396972656, 25.146053314208984, 65.77305603027344, -2.553630828857422, 13.14889144897461, 37.12419128417969, 61.76310729980469, 38.94175720214844, 30.101089477539062, -4.14288330078125, 21.060867309570312, 65.84050750732422, 2.577016830444336, -10.620048522949219, 70.5782470703125, 9.351394653320312, 43.46437072753906, -1.2053680419921875, 45.94709777832031, 8.013824462890625, 1.3634834289550781, 12.118194580078125, -4.236183166503906, -36.132144927978516, 27.280471801757812, 2.4396591186523438, 79.02017211914062, 49.427330017089844, -3.6284332275390625, 34.731964111328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000365.npy"}
{"epoch": 0.5517762660619804, "step": 366, "batch_size": 64, "mean": 19.541061401367188, "std": 32.21116638183594, "min": -48.62135314941406, "p10": -17.420188331604, "median": 15.790094375610352, "p90": 60.070265197753905, "max": 78.16497802734375, "pos_frac": 0.6875, "sample": [-1.483194351196289, 56.796142578125, 53.30150604248047, 13.994098663330078, -2.2744903564453125, 60.02244567871094, -12.406005859375, -18.897878646850586, -13.972244262695312, -8.937263488769531, -9.634872436523438, 48.795677185058594, 5.571495056152344, 39.853641510009766, 25.128158569335938, 11.601898193359375, -27.59368896484375, 17.586090087890625, 1.6279067993164062, 48.42009735107422, 5.501434326171875, 30.242385864257812, 22.302711486816406, -5.552295684814453, 0.3337230682373047, 43.283843994140625, 28.53787612915039, 58.51906204223633, -48.62135314941406, -1.586160659790039, 78.16497802734375, -1.4251556396484375, -6.019548416137695, 66.32958984375, -35.21381378173828, 0.6818523406982422, -3.0897750854492188, -39.54859161376953, 12.071329116821289, 21.903247833251953, 3.5349788665771484, -7.14277458190918, 1.6782245635986328, 20.478897094726562, -23.711917877197266, 0.955413818359375, 8.068731307983398, 58.24906921386719, 72.87055969238281, 31.037765502929688, 20.762161254882812, 59.864288330078125, 47.56318664550781, -42.96091079711914, 55.53978729248047, -3.336761474609375, 77.05964660644531, 51.90692138671875, 76.18633270263672, 60.09075927734375, 60.409690856933594, 37.827598571777344, 49.408287048339844, 19.973201751708984], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000366.npy"}
{"epoch": 0.5532879818594104, "step": 367, "batch_size": 64, "mean": 18.435543060302734, "std": 29.735109329223633, "min": -67.58599090576172, "p10": -12.890859222412107, "median": 16.96038055419922, "p90": 60.3046226501465, "max": 74.08259582519531, "pos_frac": 0.84375, "sample": [2.0386714935302734, 66.20242309570312, 24.56964874267578, 0.5213775634765625, 26.903526306152344, 1.95294189453125, 61.51750183105469, 70.26644897460938, 33.4774169921875, 0.9521980285644531, 7.5980072021484375, 19.415069580078125, 3.5915145874023438, 6.708213806152344, 74.08259582519531, 6.3075714111328125, 40.065155029296875, 0.01804351806640625, -16.718595504760742, 57.37421417236328, 21.92376708984375, 26.292144775390625, -0.207183837890625, -11.672080993652344, 21.356689453125, -65.20724487304688, -13.413192749023438, 25.753387451171875, 5.637256622314453, 24.97894287109375, 41.46246337890625, 41.40257263183594, 25.844085693359375, 19.33984375, 13.662342071533203, 3.5762863159179688, 20.151046752929688, -22.275802612304688, 6.066457748413086, 72.54609680175781, 1.9921493530273438, -67.58599090576172, 3.882112503051758, 70.48670959472656, 40.17686462402344, -15.611083984375, 32.0466423034668, 8.477153778076172, 32.281097412109375, 7.476154327392578, 5.4682159423828125, 6.706205368041992, -45.08191680908203, 57.474571228027344, 40.40513610839844, 56.15007781982422, 16.752281188964844, -8.06485366821289, 38.92301940917969, 65.70925903320312, 2.848907470703125, 16.44443702697754, 17.168479919433594, 51.287376403808594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000367.npy"}
{"epoch": 0.5547996976568406, "step": 368, "batch_size": 64, "mean": 19.164134979248047, "std": 30.007360458374023, "min": -62.575408935546875, "p10": -19.036473083496094, "median": 18.45807647705078, "p90": 57.8961483001709, "max": 82.10630798339844, "pos_frac": 0.71875, "sample": [-62.575408935546875, 38.351318359375, -2.8276710510253906, 5.591730117797852, 17.95648193359375, 34.63787078857422, 49.694793701171875, 18.22100067138672, -3.8121795654296875, -9.614364624023438, 68.40460205078125, 2.022960662841797, 18.695152282714844, 19.661216735839844, 67.25370788574219, 39.07598114013672, -12.011543273925781, -19.125503540039062, -18.8287353515625, 58.263580322265625, 17.198509216308594, 12.835660934448242, 29.55109405517578, -7.83953857421875, 67.03917694091797, 6.197910308837891, -7.6825714111328125, 47.64087677001953, 82.10630798339844, 43.0426139831543, -0.7920017242431641, 64.28195190429688, 39.238555908203125, 0.36843299865722656, 18.05612564086914, 36.06523513793945, 9.90329360961914, 33.69483947753906, 7.766273498535156, 35.506309509277344, 22.908226013183594, -5.931264877319336, 66.3453369140625, 32.34159851074219, -22.269851684570312, -23.331077575683594, 13.462406158447266, -10.4464111328125, 51.29686737060547, -24.630081176757812, 36.125030517578125, 57.0388069152832, 44.445648193359375, 54.272193908691406, 3.3855533599853516, 23.056838989257812, 22.04839324951172, -22.473426818847656, -14.576929092407227, 43.156494140625, 49.0943603515625, 27.99723243713379, -44.33953857421875, 4.3141021728515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000368.npy"}
{"epoch": 0.5563114134542706, "step": 369, "batch_size": 64, "mean": 17.495262145996094, "std": 29.39101791381836, "min": -47.71269607543945, "p10": -13.80169830322265, "median": 12.015617370605469, "p90": 58.01157341003418, "max": 84.21923828125, "pos_frac": 0.703125, "sample": [25.159759521484375, -28.710731506347656, 28.570837020874023, 30.191680908203125, -33.67759704589844, 45.41436767578125, 39.02114486694336, 0.9142684936523438, 34.34919738769531, 15.84482192993164, 5.8780517578125, 56.63816833496094, 58.3692626953125, 44.86940002441406, 57.74097442626953, 6.251094818115234, 25.069120407104492, 6.5114593505859375, 9.001380920410156, 19.868648529052734, 20.642608642578125, -47.71269607543945, 27.32159423828125, -35.49980545043945, 20.99738311767578, 41.88018798828125, -6.9474334716796875, 33.794063568115234, 35.1247673034668, -5.9663543701171875, 56.39753723144531, -8.674072265625, 8.326091766357422, 62.738868713378906, 27.98789405822754, 58.12754440307617, 80.51766967773438, 1.5033416748046875, -3.9424591064453125, -27.87793731689453, -1.9356231689453125, 5.4376678466796875, -2.71368408203125, 30.867406845092773, -0.6437416076660156, -3.0490951538085938, 84.21923828125, 4.667736053466797, 38.43963623046875, -4.405059814453125, -1.7244338989257812, 8.206565856933594, -1.0749931335449219, 58.50273132324219, 0.7422332763671875, -4.693910598754883, 0.287261962890625, 77.62394714355469, 10.755104064941406, 13.276130676269531, -15.999252319335938, 26.34308624267578, -38.51791000366211, 49.07166290283203], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000369.npy"}
{"epoch": 0.5578231292517006, "step": 370, "batch_size": 64, "mean": 24.15358543395996, "std": 36.13318634033203, "min": -71.82548522949219, "p10": -12.194152450561523, "median": 19.941329956054688, "p90": 74.82325744628906, "max": 88.41333770751953, "pos_frac": 0.765625, "sample": [28.7185001373291, 76.49705505371094, 47.9757080078125, 20.4393310546875, 76.66673278808594, 2.7439327239990234, 83.91232299804688, 5.233489990234375, 88.41333770751953, 6.196338653564453, 2.5666351318359375, 3.3325576782226562, -0.3046417236328125, 49.754066467285156, 69.42152404785156, 28.945629119873047, 9.167465209960938, -23.482688903808594, -18.269973754882812, 27.1845703125, 19.443328857421875, 33.84928894042969, 30.212894439697266, 33.159889221191406, 66.09803771972656, 51.38960266113281, 54.15935516357422, 3.0358924865722656, 1.979736328125, -2.2199859619140625, -1.9953994750976562, -5.000144958496094, 6.257423400878906, -43.31098937988281, -11.025634765625, 48.2861328125, 65.17141723632812, 26.594284057617188, 77.49993896484375, 49.66136169433594, 75.35275268554688, -71.82548522949219, 15.658205032348633, 50.88507080078125, 2.7531871795654297, 75.46195220947266, 47.891845703125, 51.939178466796875, 64.00479888916016, 72.20413208007812, -69.72270202636719, 6.544378280639648, 58.109893798828125, -2.8495655059814453, -12.576118469238281, 6.2441864013671875, 6.326324462890625, -1.6324920654296875, -26.404617309570312, 1.276437759399414, 26.520591735839844, 73.5877685546875, -11.302898406982422, 19.024354934692383], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000370.npy"}
{"epoch": 0.5593348450491308, "step": 371, "batch_size": 64, "mean": 21.980266571044922, "std": 31.851543426513672, "min": -47.630332946777344, "p10": -17.197770309448238, "median": 21.24755859375, "p90": 67.53963546752931, "max": 83.48223876953125, "pos_frac": 0.6875, "sample": [63.4620361328125, -37.12877655029297, 81.98099517822266, 33.217308044433594, 55.368072509765625, -2.53179931640625, 50.01539993286133, -5.356176376342773, -1.9330520629882812, 3.2971668243408203, -0.275238037109375, 36.49253845214844, -18.404560089111328, 82.54876708984375, 0.7056236267089844, 69.37422943115234, -1.1866931915283203, 61.4625244140625, 11.019111633300781, 79.67279052734375, 6.6138763427734375, -25.370849609375, -1.9670581817626953, 3.358814239501953, 33.97721862792969, 55.30771255493164, 51.710113525390625, 83.48223876953125, 63.578086853027344, 27.76898956298828, 35.915283203125, 44.64537811279297, -47.630332946777344, -2.5639572143554688, 23.50969123840332, 11.466110229492188, -0.13410186767578125, -21.453330993652344, -5.762969970703125, 70.33247375488281, -14.381927490234375, 17.507617950439453, 0.39721107482910156, 21.945701599121094, 34.103843688964844, -10.802894592285156, 32.098114013671875, 20.549415588378906, -3.4454574584960938, 25.960418701171875, 34.37248992919922, -7.124366760253906, 1.550567626953125, 47.84980773925781, 50.706024169921875, 0.15480422973632812, 31.604782104492188, 9.448234558105469, 42.620361328125, 27.450172424316406, 69.23744201660156, 49.87382507324219, -23.50872039794922, -20.014114379882812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000371.npy"}
{"epoch": 0.5608465608465608, "step": 372, "batch_size": 64, "mean": 13.676551818847656, "std": 38.018306732177734, "min": -71.709716796875, "p10": -33.682619094848626, "median": 12.510673522949219, "p90": 71.07052650451664, "max": 87.01778411865234, "pos_frac": 0.71875, "sample": [-40.348106384277344, 8.803272247314453, 58.79077911376953, 28.77710723876953, 27.257095336914062, 21.61675262451172, 81.42865753173828, -5.850227355957031, 55.227928161621094, 39.805503845214844, 78.13887023925781, 0.5519027709960938, 47.431060791015625, 87.01778411865234, 14.906776428222656, 34.145851135253906, 51.804168701171875, -71.709716796875, 2.6459426879882812, 20.490814208984375, 5.02787971496582, 15.672517776489258, 17.520339965820312, 62.605892181396484, 7.540699005126953, -0.6440811157226562, -57.26832580566406, 20.037582397460938, 83.94206237792969, -25.34416389465332, 35.42169952392578, 20.3751220703125, -3.846965789794922, -8.347919464111328, -28.789897918701172, 18.342130661010742, -66.33032989501953, 6.949434280395508, 45.82881164550781, 19.982725143432617, 74.69822692871094, 9.141841888427734, 37.725860595703125, 21.104827880859375, -35.449501037597656, 18.903711318969727, -61.503089904785156, 0.5999393463134766, 0.3282890319824219, 6.355995178222656, 1.4272232055664062, -68.13361358642578, 51.82183837890625, 2.488363265991211, -13.40867805480957, -21.783647537231445, -29.559894561767578, 17.74011993408203, 76.32889556884766, 79.11322021484375, -15.29561996459961, -3.129058837890625, 10.114570617675781, 6.0620574951171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000372.npy"}
{"epoch": 0.562358276643991, "step": 373, "batch_size": 64, "mean": 16.784204483032227, "std": 38.13325500488281, "min": -60.74397277832031, "p10": -37.85023498535156, "median": 13.129871368408203, "p90": 69.65883026123048, "max": 92.79007720947266, "pos_frac": 0.703125, "sample": [-39.274566650390625, 39.416297912597656, 69.9508285522461, 21.967742919921875, 68.97750091552734, -34.52679443359375, -14.427726745605469, 63.144500732421875, 24.845321655273438, 18.893739700317383, 34.577945709228516, -46.92662811279297, 13.333572387695312, -28.24853515625, -39.66148376464844, -23.155040740966797, 12.614583969116211, 25.06732940673828, -54.14506530761719, -5.973930358886719, 26.511127471923828, -3.5956497192382812, 46.65814208984375, -34.413795471191406, 58.66310119628906, 70.66822052001953, -1.2195587158203125, 15.862136840820312, 51.309532165527344, 40.07958221435547, 84.11563110351562, 3.0784454345703125, 3.2846031188964844, 64.99656677246094, -31.64971923828125, 1.2554817199707031, -18.16351318359375, 8.7706298828125, -3.6091384887695312, 62.13674545288086, 56.29415512084961, -60.74397277832031, -44.440513610839844, 1.3394966125488281, 7.234039306640625, 35.274635314941406, -40.30493927001953, 72.19076538085938, 2.5463714599609375, 10.394744873046875, -5.9071197509765625, 73.86529541015625, 1.9662322998046875, 92.79007720947266, 15.951353073120117, 28.200458526611328, 30.106552124023438, 40.920928955078125, 11.839164733886719, 86.16690063476562, 19.6530704498291, 6.324916839599609, 12.926170349121094, 68.41217041015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000373.npy"}
{"epoch": 0.563869992441421, "step": 374, "batch_size": 64, "mean": 30.510740280151367, "std": 30.553667068481445, "min": -42.72862243652344, "p10": -4.889854431152342, "median": 26.86196517944336, "p90": 74.20169296264649, "max": 88.49122619628906, "pos_frac": 0.828125, "sample": [81.32247924804688, 54.25830078125, 40.048072814941406, -0.15021896362304688, 21.398700714111328, 70.00167846679688, 75.37376403808594, 70.46674346923828, 72.80754089355469, 22.556928634643555, 45.754920959472656, -0.0769195556640625, 52.4547119140625, 79.9950942993164, 43.43224334716797, -3.45745849609375, 71.69112396240234, 14.632911682128906, 2.371124267578125, 25.737808227539062, 20.521774291992188, 49.66139602661133, 14.805530548095703, -17.504486083984375, 48.98491668701172, 39.24919128417969, 22.393840789794922, 74.79918670654297, -0.7130508422851562, 27.209693908691406, 27.2386474609375, 44.3968505859375, 1.73828125, 3.3141860961914062, 37.78434753417969, 26.514236450195312, 24.510910034179688, -42.72862243652344, 51.518157958984375, 35.9367561340332, 8.48794174194336, -23.67694091796875, 4.305782318115234, 88.49122619628906, 80.78816223144531, 12.689117431640625, -6.548980712890625, 24.630416870117188, 31.789819717407227, 12.760292053222656, 79.21573638916016, 59.29143524169922, 65.20231628417969, 21.379196166992188, 19.151554107666016, 1.0911121368408203, 30.15304946899414, 35.85980224609375, 42.748138427734375, 67.80062866210938, -32.873374938964844, -5.5037384033203125, 11.541580200195312, -6.3382110595703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000374.npy"}
{"epoch": 0.5653817082388511, "step": 375, "batch_size": 64, "mean": 18.084556579589844, "std": 32.42041778564453, "min": -67.36537170410156, "p10": -22.17622833251953, "median": 15.58084487915039, "p90": 62.73844070434571, "max": 85.47229766845703, "pos_frac": 0.71875, "sample": [-14.364631652832031, -34.842506408691406, 9.914619445800781, 28.051490783691406, 28.840049743652344, 4.211143493652344, 78.97439575195312, 2.1826210021972656, 0.7848663330078125, 35.75259017944336, -5.22412109375, 53.98155212402344, -1.0431976318359375, 75.68852233886719, 4.586431503295898, -2.5902061462402344, 18.263032913208008, -33.25721740722656, 32.965476989746094, -3.736845016479492, 15.972251892089844, 55.81111145019531, 75.41975402832031, -12.060760498046875, 29.108184814453125, 74.28845977783203, 0.25864410400390625, -31.5401611328125, 19.49384307861328, 48.966060638427734, -2.5556488037109375, -1.17578125, 0.11972999572753906, -23.875831604003906, 35.39012145996094, 19.363927841186523, 28.51219940185547, 4.7323760986328125, 85.47229766845703, 15.189437866210938, 37.2696533203125, 21.18387222290039, 64.01641845703125, -67.36537170410156, 10.434432983398438, 2.1501312255859375, 59.638702392578125, 6.1453857421875, 50.515472412109375, 28.78396987915039, 71.81301879882812, 28.024002075195312, -18.210487365722656, 50.803489685058594, 6.962879180908203, 18.400489807128906, 31.744529724121094, -32.96487808227539, -8.145477294921875, 59.756492614746094, -4.2446441650390625, 49.295494079589844, 10.919120788574219, -35.54345703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000375.npy"}
{"epoch": 0.5668934240362812, "step": 376, "batch_size": 64, "mean": 27.252052307128906, "std": 37.042118072509766, "min": -58.0533561706543, "p10": -22.616673851013182, "median": 24.54652976989746, "p90": 76.88640518188477, "max": 100.51727294921875, "pos_frac": 0.78125, "sample": [26.615875244140625, 6.293979644775391, 1.0285930633544922, 16.925193786621094, 10.597434997558594, -23.804325103759766, -2.83587646484375, 72.51130676269531, 5.967521667480469, 21.080177307128906, 53.30931091308594, 54.383724212646484, -58.0533561706543, -2.0782737731933594, -19.758331298828125, 6.572242736816406, 83.46184539794922, 36.549530029296875, 31.023174285888672, 100.51727294921875, 23.90605926513672, 66.20768737792969, 1.357452392578125, 63.375396728515625, 86.44539642333984, -5.153114318847656, 75.30020904541016, 68.65805053710938, 60.985755920410156, 57.130393981933594, 41.828914642333984, 16.78164291381836, 82.17220306396484, 31.625808715820312, 25.858306884765625, 76.47685241699219, 19.256988525390625, -36.99477005004883, 79.47103881835938, -39.460174560546875, 27.878849029541016, -24.428377151489258, 54.32799530029297, 37.22290802001953, 29.124488830566406, 72.6064453125, -19.845487594604492, 43.20026397705078, -19.315902709960938, 74.78504180908203, -31.19512176513672, 24.361812591552734, 9.462173461914062, 14.583248138427734, 77.06192779541016, 8.718633651733398, 24.731246948242188, 15.979442596435547, 80.46632385253906, 52.37986755371094, -12.27532958984375, 13.599603652954102, 3.2009410858154297, -28.03656005859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000376.npy"}
{"epoch": 0.5684051398337112, "step": 377, "batch_size": 64, "mean": 20.025833129882812, "std": 32.35493087768555, "min": -52.48878479003906, "p10": -15.9630895614624, "median": 18.90534019470215, "p90": 56.98869018554688, "max": 83.73284912109375, "pos_frac": 0.703125, "sample": [45.4462776184082, -16.478965759277344, 51.456382751464844, 57.06774139404297, 29.176544189453125, 1.158599853515625, -36.084632873535156, -47.40789794921875, 24.30187225341797, 36.03118896484375, 83.73284912109375, -1.9497146606445312, 53.98975372314453, -0.4965057373046875, 54.276695251464844, 49.588340759277344, -32.855613708496094, 52.51831817626953, -25.10310935974121, 10.21063232421875, 60.3205680847168, -1.2577857971191406, 46.690311431884766, 11.79022216796875, 31.421274185180664, -51.70489501953125, 9.556114196777344, 78.44032287597656, 11.67697525024414, 8.154298782348633, 55.037681579589844, -5.972408294677734, 37.7532958984375, 0.4937419891357422, 10.254638671875, -13.267934799194336, 15.20892333984375, 37.76850128173828, 54.54973602294922, 27.902847290039062, -1.5174140930175781, 59.74537658691406, 2.5888214111328125, 30.309616088867188, 18.4317626953125, 73.38795471191406, 19.378917694091797, -14.65719985961914, -4.332950592041016, 31.787450790405273, 53.43464660644531, -0.08111572265625, 5.592260360717773, 83.71749877929688, -0.7703704833984375, -14.759378433227539, 56.804237365722656, 42.73539733886719, -52.48878479003906, 12.235908508300781, 25.342191696166992, -9.964317321777344, 26.343521118164062, 24.99410629272461], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000377.npy"}
{"epoch": 0.5699168556311414, "step": 378, "batch_size": 64, "mean": 18.119808197021484, "std": 32.62609100341797, "min": -75.7938232421875, "p10": -8.15634536743164, "median": 7.60932731628418, "p90": 65.20102882385254, "max": 87.45857238769531, "pos_frac": 0.71875, "sample": [47.46260070800781, 26.76319122314453, 4.422773361206055, 3.312103271484375, 41.28693389892578, 59.79187774658203, -10.546493530273438, -3.690753936767578, 25.104461669921875, 67.42166137695312, -75.7938232421875, 2.8882293701171875, 84.7821044921875, 1.294656753540039, 53.55876159667969, 41.396644592285156, 14.105127334594727, 4.535512924194336, 81.97317504882812, -1.5118064880371094, 51.32366180419922, 28.165969848632812, 7.056888580322266, 19.592758178710938, 63.8557243347168, -3.098419189453125, 17.25476837158203, 23.56005859375, -33.53337097167969, 7.791744232177734, 40.59089660644531, 16.92645263671875, -17.35059356689453, 0.4353504180908203, 41.860260009765625, 3.1669998168945312, -2.5943832397460938, -0.5009880065917969, 53.026580810546875, -3.3211822509765625, 87.45857238769531, 24.466506958007812, 4.5715179443359375, 65.777587890625, 7.426910400390625, 43.86964416503906, -4.7761993408203125, 78.37077331542969, 24.679710388183594, 3.428922653198242, 1.5511283874511719, 3.6106643676757812, 8.64621353149414, -44.5517578125, -2.8895645141601562, 44.362579345703125, -8.251914978027344, -7.933349609375, 1.7454643249511719, 18.082077026367188, -2.1662139892578125, -47.262699127197266, -1.8237628936767578, 78.50879669189453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000378.npy"}
{"epoch": 0.5714285714285714, "step": 379, "batch_size": 64, "mean": 24.281211853027344, "std": 33.18741226196289, "min": -44.73368835449219, "p10": -17.513615798950195, "median": 20.076218605041504, "p90": 72.14760742187501, "max": 93.41732025146484, "pos_frac": 0.796875, "sample": [26.873126983642578, -25.075836181640625, 71.37986755371094, -2.8134613037109375, 22.322158813476562, 21.059722900390625, 35.58074188232422, -44.73368835449219, 25.048717498779297, 4.677253723144531, 3.4415664672851562, 8.0650634765625, -22.622222900390625, 40.88386535644531, 3.7929534912109375, 91.17471313476562, 38.51966857910156, -12.554458618164062, 52.67210388183594, 59.498565673828125, 87.25186157226562, 35.8443603515625, 72.47663879394531, 39.16114807128906, 5.278350830078125, -21.7584228515625, 32.32548522949219, -25.718002319335938, 39.026893615722656, -24.87713623046875, 93.41732025146484, 26.721120834350586, 5.484594345092773, 11.744464874267578, -15.219680786132812, 51.126312255859375, 74.82147979736328, 1.67315673828125, 4.678680419921875, -18.49673080444336, -9.718681335449219, 11.028877258300781, 16.54192352294922, 85.69314575195312, 69.33979034423828, 1.6628971099853516, 83.940185546875, 64.98709106445312, 2.484691619873047, -0.5421600341796875, 63.72343444824219, -0.4067230224609375, 3.478790283203125, 19.092714309692383, 61.09027099609375, 0.7107696533203125, 31.137779235839844, 34.7496337890625, 12.04720687866211, 1.3055419921875, 49.118247985839844, 41.510772705078125, 36.893638610839844, 1.9753570556640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000379.npy"}
{"epoch": 0.5729402872260015, "step": 380, "batch_size": 64, "mean": 27.52959442138672, "std": 36.09291458129883, "min": -65.27616882324219, "p10": -16.355300712585446, "median": 28.54096221923828, "p90": 72.43125762939454, "max": 87.08832550048828, "pos_frac": 0.8125, "sample": [14.641311645507812, 78.27120971679688, 22.356399536132812, 7.7220306396484375, -39.079933166503906, -40.296897888183594, 64.25987243652344, 70.510009765625, -18.356441497802734, 46.80284118652344, 35.74668884277344, 4.686595916748047, -2.7214527130126953, 44.21895980834961, 15.91635513305664, -32.67742919921875, 0.4964866638183594, 20.912914276123047, 66.44319915771484, 76.65412902832031, 51.29920959472656, 29.402481079101562, -23.334632873535156, -65.27616882324219, 72.93148803710938, 9.411201477050781, 47.39373016357422, 40.831626892089844, -11.685972213745117, 12.422765731811523, 43.38726806640625, 50.854278564453125, 24.988693237304688, 19.482318878173828, 82.33367919921875, 3.7034530639648438, 59.9718017578125, 48.78205871582031, 7.9339141845703125, 14.054588317871094, 46.32078552246094, -61.36113739013672, 67.7240982055664, 85.19480895996094, 17.077192306518555, 59.16046905517578, 29.07854461669922, 42.056556701660156, -4.6053466796875, 57.398468017578125, 3.5329818725585938, 4.139432907104492, -8.411468505859375, 87.08832550048828, 66.32081604003906, 71.26405334472656, 1.8712024688720703, 68.67926025390625, 42.78727340698242, 73.47874450683594, 28.003379821777344, 31.683069229125977, 5.748439788818359, -7.7305145263671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000380.npy"}
{"epoch": 0.5744520030234316, "step": 381, "batch_size": 64, "mean": 11.643007278442383, "std": 33.85009002685547, "min": -86.72993469238281, "p10": -25.469785118103026, "median": 4.502658843994141, "p90": 59.99015350341797, "max": 85.90594482421875, "pos_frac": 0.65625, "sample": [-5.7908935546875, -0.5846405029296875, 2.749753952026367, -24.985673904418945, -86.72993469238281, 3.3830909729003906, 2.2031402587890625, 7.983612060546875, 85.25617218017578, 20.99858856201172, 11.337364196777344, -10.774799346923828, 1.026702880859375, 13.022785186767578, -8.05833625793457, 83.66567993164062, 38.99113464355469, 0.8274555206298828, 16.38079833984375, -0.5624408721923828, 0.5783977508544922, -32.078041076660156, 6.486175537109375, 60.254791259765625, -40.71215057373047, 44.25605010986328, -23.647674560546875, -25.677261352539062, 47.06396484375, -0.8974399566650391, 20.759777069091797, 59.37266540527344, 11.617286682128906, 7.453948974609375, -8.811954498291016, 60.70551300048828, 9.437950134277344, -9.951404571533203, -43.31114196777344, 10.385498046875, 68.6044921875, -8.252426147460938, 17.025726318359375, 3.775236129760742, 20.79625701904297, 85.90594482421875, 53.50944519042969, 28.54247283935547, -1.2122020721435547, 3.1301651000976562, 2.1603012084960938, 51.60359191894531, 55.952728271484375, -11.327804565429688, 10.71636962890625, 5.15118408203125, 54.619384765625, -46.212860107421875, -16.26659393310547, 17.541288375854492, 3.8541336059570312, -29.306625366210938, -4.257682800292969, 75.4754409790039], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000381.npy"}
{"epoch": 0.5759637188208617, "step": 382, "batch_size": 64, "mean": 18.832473754882812, "std": 29.944677352905273, "min": -59.676841735839844, "p10": -14.35667152404785, "median": 15.287446975708008, "p90": 65.07368850708008, "max": 89.59468841552734, "pos_frac": 0.75, "sample": [1.0015869140625, -11.022726058959961, 77.61553955078125, 24.569486618041992, 65.7657241821289, -4.474998474121094, 1.544403076171875, 10.540390014648438, 46.05342102050781, 30.987516403198242, 22.92303466796875, 3.804168701171875, -16.794525146484375, 13.763275146484375, 59.47234344482422, 3.7653732299804688, -16.92559814453125, -8.97625732421875, -59.676841735839844, 42.03643798828125, -1.34197998046875, 23.966278076171875, 20.328201293945312, 0.11834716796875, -17.470321655273438, 13.075042724609375, 31.19124412536621, 11.711074829101562, -19.74327850341797, 87.51998901367188, 22.808937072753906, 6.659824371337891, -14.766838073730469, -21.20599937438965, 74.53169250488281, 1.3144607543945312, -8.735599517822266, -11.425796508789062, 5.837577819824219, 63.45893859863281, 23.43444061279297, 89.59468841552734, 4.1116180419921875, 26.139053344726562, 9.670787811279297, 48.23268127441406, -7.480430603027344, -10.15384292602539, 0.5054779052734375, 34.60143280029297, 25.90460205078125, 47.596031188964844, -13.399616241455078, 16.81161880493164, 35.80142593383789, 1.7644615173339844, 40.95977783203125, 17.095354080200195, 69.11067199707031, 77.54547119140625, 25.397533416748047, 37.229949951171875, 21.754016876220703, 29.24767303466797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000382.npy"}
{"epoch": 0.5774754346182918, "step": 383, "batch_size": 64, "mean": 24.513856887817383, "std": 36.77443313598633, "min": -53.25482177734375, "p10": -21.198365592956538, "median": 20.562920570373535, "p90": 76.61189727783204, "max": 99.70230102539062, "pos_frac": 0.75, "sample": [0.5612316131591797, 59.59593200683594, 41.317718505859375, 2.541717529296875, 28.922645568847656, -15.288843154907227, 35.236392974853516, 21.219600677490234, 6.309700012207031, -26.579973220825195, 6.215396881103516, -30.109786987304688, 24.816219329833984, 20.863176345825195, 48.41963195800781, 15.6256103515625, 50.26068115234375, -26.698516845703125, 77.09286499023438, 56.1844367980957, 22.167396545410156, -2.6185150146484375, 70.68067932128906, 10.481048583984375, 73.86128234863281, 26.666961669921875, 23.325820922851562, -10.826446533203125, 10.217924118041992, 83.74505615234375, 77.81149291992188, -14.916975021362305, 14.081459045410156, 99.70230102539062, -3.3049278259277344, 31.443588256835938, 73.52388763427734, -53.25482177734375, 8.091262817382812, 31.65526580810547, -25.798439025878906, 13.649429321289062, 20.262664794921875, 35.27690124511719, 79.52532958984375, -13.536556243896484, 2.7186203002929688, 11.991363525390625, -23.73101806640625, 81.85596466064453, 3.7920970916748047, 92.83818817138672, 70.95576477050781, 2.1430206298828125, -1.1924667358398438, 70.9518814086914, 70.40655517578125, 8.236480712890625, -1.7476043701171875, -50.089508056640625, 60.06596755981445, -13.702590942382812, 75.48963928222656, 29.48564338684082], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000383.npy"}
{"epoch": 0.5789871504157218, "step": 384, "batch_size": 64, "mean": 22.22673225402832, "std": 31.73046875, "min": -56.462310791015625, "p10": -17.753263854980467, "median": 19.35252571105957, "p90": 69.27762069702149, "max": 91.02783203125, "pos_frac": 0.765625, "sample": [-16.515243530273438, 69.46232604980469, 18.115306854248047, 33.73292541503906, 0.9049034118652344, 47.62620544433594, 49.644432067871094, 40.83606719970703, 69.95401000976562, -9.025760650634766, 12.9354248046875, 38.327545166015625, 7.85302734375, 2.9637680053710938, -9.696617126464844, 7.7982635498046875, -0.0930938720703125, 85.1575698852539, -2.162078857421875, 77.61869812011719, -21.50318145751953, 11.817474365234375, 20.767105102539062, -18.409515380859375, -22.122589111328125, 36.053070068359375, 85.22229766845703, 20.589744567871094, -38.47266387939453, 36.64512634277344, 80.57666015625, 40.1546630859375, 1.520467758178711, 14.26655387878418, 24.830047607421875, -3.4991455078125, 91.02783203125, 4.788246154785156, -56.462310791015625, 65.55380249023438, 32.28607177734375, 25.198135375976562, 60.572410583496094, -18.283843994140625, 68.84664154052734, 53.77520751953125, 25.70181655883789, 3.481365203857422, 9.171672821044922, 34.687896728515625, 31.150848388671875, 2.889739990234375, 28.05999183654785, 9.342361450195312, 3.3876953125, -0.3783302307128906, -23.713890075683594, 33.793487548828125, -2.8076858520507812, 7.949489593505859, 43.222808837890625, 45.621360778808594, 40.141021728515625, 9.63323974609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000384.npy"}
{"epoch": 0.5804988662131519, "step": 385, "batch_size": 64, "mean": 26.842784881591797, "std": 31.59510040283203, "min": -39.29203796386719, "p10": -8.70073013305664, "median": 26.325321197509766, "p90": 73.51552429199221, "max": 91.01882934570312, "pos_frac": 0.765625, "sample": [-30.088302612304688, 1.0137138366699219, 75.06886291503906, 23.2537841796875, 26.43639373779297, 13.261886596679688, 10.891250610351562, 35.08636474609375, 7.0804290771484375, 35.593536376953125, -15.794002532958984, 42.4962158203125, 32.92535400390625, 65.53433227539062, 25.778465270996094, 56.51795959472656, 76.58553314208984, 58.39396667480469, -2.763338088989258, 27.13378143310547, -8.350799560546875, -29.280868530273438, 10.110824584960938, 78.09028625488281, 76.45457458496094, 64.29774475097656, 26.686901092529297, 30.724014282226562, 30.53899383544922, 39.32242202758789, 29.01433563232422, -23.059898376464844, 14.048454284667969, -0.3721771240234375, 38.241085052490234, 77.78733825683594, 63.57267761230469, 46.306007385253906, 24.96702003479004, 26.000049591064453, -9.670299530029297, -8.850700378417969, 33.119140625, 26.214248657226562, -2.5293121337890625, 68.37174987792969, 68.63571166992188, -39.29203796386719, 4.560890197753906, -3.784820556640625, -2.66253662109375, 19.6722412109375, -1.62420654296875, 2.6973876953125, 34.43684387207031, 38.75151062011719, 2.4073257446289062, 69.89106750488281, -5.1219635009765625, 91.01882934570312, 0.4510765075683594, 3.56573486328125, 78.7998046875, 69.37535095214844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000385.npy"}
{"epoch": 0.582010582010582, "step": 386, "batch_size": 64, "mean": 16.507038116455078, "std": 31.102447509765625, "min": -74.4840087890625, "p10": -15.954823303222655, "median": 15.331016540527344, "p90": 55.64936523437501, "max": 89.78619384765625, "pos_frac": 0.671875, "sample": [44.63421630859375, -28.337976455688477, -3.8180274963378906, 47.581398010253906, 47.6168212890625, -8.012451171875, -1.7001800537109375, 21.83881378173828, 19.32235336303711, 34.34899139404297, 3.1838226318359375, 4.8497314453125, -48.06877136230469, 3.9836273193359375, -0.7879104614257812, 62.979774475097656, 36.35546875, 66.4232177734375, 56.48260498046875, 25.2095947265625, 15.212480545043945, 49.248085021972656, 15.919700622558594, -17.521785736083984, -8.779876708984375, 32.29372024536133, 73.44535827636719, -13.48443603515625, 39.633636474609375, 30.124267578125, 4.070793151855469, 23.345413208007812, -8.973533630371094, -15.7403564453125, -12.694198608398438, -14.203788757324219, 8.266807556152344, 15.347061157226562, 28.517234802246094, 67.43000030517578, -0.8964996337890625, -14.259193420410156, 36.849952697753906, 22.825225830078125, 65.01038360595703, -17.014209747314453, -3.955291748046875, 15.034156799316406, -0.7258186340332031, 89.78619384765625, 2.033519744873047, -16.046737670898438, -44.66307830810547, 41.11170959472656, 53.70513916015625, 33.91950225830078, 19.517913818359375, 26.245277404785156, 52.41596984863281, 46.366905212402344, 15.314971923828125, 7.112827301025391, 5.703948974609375, -74.4840087890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000386.npy"}
|
||||
{"epoch": 0.5835222978080121, "step": 387, "batch_size": 64, "mean": 25.30979347229004, "std": 31.611083984375, "min": -46.13487243652344, "p10": -6.919490814208984, "median": 17.854053497314453, "p90": 77.21457901000977, "max": 96.52418518066406, "pos_frac": 0.796875, "sample": [15.6373291015625, 2.8805198669433594, 54.57105255126953, -4.090229034423828, 4.077796936035156, 78.46833801269531, 24.162811279296875, 1.8569374084472656, 22.68910026550293, 96.52418518066406, 25.49631690979004, -1.9418621063232422, 5.8177490234375, 11.322700500488281, 77.01986694335938, 2.6413955688476562, 17.747451782226562, 56.01087951660156, 61.58127212524414, -3.2679443359375, 31.367340087890625, 76.216796875, 36.01786804199219, -6.766963958740234, -11.466163635253906, 49.16242980957031, 6.846946716308594, 6.139806747436523, 76.78665161132812, 30.495336532592773, -1.3644371032714844, -7.3004302978515625, 13.075370788574219, 78.94808959960938, -10.826812744140625, 47.18584442138672, 32.48466873168945, 5.149881362915039, 5.892345428466797, -24.991928100585938, 1.7422657012939453, 23.686660766601562, 47.08189392089844, -46.13487243652344, -14.057479858398438, 25.556041717529297, 44.896705627441406, 90.86214447021484, 5.17884635925293, 77.29802703857422, -6.984859466552734, 35.542457580566406, 9.33404541015625, 32.60783386230469, 22.178939819335938, 13.656595230102539, -2.1874923706054688, 7.304468154907227, 30.707015991210938, 79.12879943847656, 94.80828857421875, 17.01302146911621, 30.388463973999023, 17.960655212402344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000387.npy"}
|
||||
{"epoch": 0.5850340136054422, "step": 388, "batch_size": 64, "mean": 17.26071548461914, "std": 34.616798400878906, "min": -63.622230529785156, "p10": -22.66728096008301, "median": 13.3529052734375, "p90": 65.16118927001953, "max": 90.43931579589844, "pos_frac": 0.6875, "sample": [65.36361694335938, 41.660675048828125, 53.55345916748047, 90.43931579589844, 14.816734313964844, 32.73345947265625, 26.116790771484375, 5.9825286865234375, -2.4904098510742188, -13.323204040527344, 19.3387451171875, 59.61723327636719, -4.618831634521484, 33.12571716308594, -12.510520935058594, 83.81289672851562, -5.795520782470703, 78.37428283691406, 6.870216369628906, 37.29542922973633, 51.53619384765625, 2.6967620849609375, 54.66621398925781, 0.07161712646484375, 60.26393127441406, -63.622230529785156, 12.89678955078125, 32.960670471191406, 8.32042121887207, 5.770364761352539, -1.6445236206054688, -6.355430603027344, -4.2699127197265625, 64.60108947753906, -8.107587814331055, 13.80902099609375, 19.062061309814453, -22.336135864257812, 27.207456588745117, -41.468505859375, 35.45885467529297, 70.33430480957031, -40.564544677734375, 64.68885803222656, 59.40906524658203, 18.627578735351562, -8.057258605957031, 5.2232208251953125, 24.420974731445312, 71.7721176147461, 4.019264221191406, 74.82939147949219, -12.24774169921875, 0.50689697265625, -48.852691650390625, -22.809200286865234, 14.097793579101562, 19.157257080078125, -8.04519271850586, 0.07086563110351562, 28.92713165283203, -29.182876586914062, -37.575469970703125, 4.056278228759766], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000388.npy"}
|
||||
{"epoch": 0.5865457294028723, "step": 389, "batch_size": 64, "mean": 25.965023040771484, "std": 36.54664993286133, "min": -70.24067687988281, "p10": -10.163470458984374, "median": 27.124448776245117, "p90": 72.33959350585938, "max": 90.22259521484375, "pos_frac": 0.6875, "sample": [-23.014373779296875, -8.284177780151367, 45.53279113769531, 39.55964660644531, 43.07155990600586, 31.847686767578125, 55.78835678100586, 66.3194351196289, 90.22259521484375, 45.27777099609375, -25.651885986328125, -15.590423583984375, 42.00779724121094, 42.70269775390625, 15.824905395507812, 35.34516143798828, 32.08451843261719, 68.47696685791016, 22.29193115234375, 22.40121078491211, 71.5142822265625, 48.07411193847656, -0.14917564392089844, 9.560379028320312, 65.71690368652344, -3.5094528198242188, 60.680152893066406, -3.8211517333984375, 8.974105834960938, 32.7485466003418, -1.7884979248046875, 56.24348831176758, -8.98175048828125, 70.07028198242188, 81.2572250366211, 8.147430419921875, 89.14692687988281, 49.38269805908203, -2.219341278076172, -70.24067687988281, 72.69329833984375, -6.942386627197266, 7.377511978149414, -5.233257293701172, 39.167266845703125, 1.58209228515625, 67.2294921875, -30.615779876708984, -65.04119873046875, 8.38916015625, -10.669921875, 4.803152084350586, 56.124298095703125, -2.5311717987060547, -8.512351989746094, 80.3470458984375, 83.25907897949219, 45.87567138671875, 17.212875366210938, 78.78201293945312, -1.2238712310791016, 46.32220458984375, 2.9167213439941406, -6.571117401123047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000389.npy"}
|
||||
{"epoch": 0.5880574452003023, "step": 390, "batch_size": 64, "mean": 22.029890060424805, "std": 32.344696044921875, "min": -56.48690414428711, "p10": -16.12719383239746, "median": 16.96172332763672, "p90": 65.78188095092774, "max": 94.36267852783203, "pos_frac": 0.78125, "sample": [-31.99273681640625, 24.135093688964844, 27.33757781982422, 7.7671051025390625, 65.1716079711914, 22.771522521972656, 54.361263275146484, 59.27693176269531, -18.466856002807617, 38.312171936035156, -1.3997325897216797, -29.9229736328125, 55.76213073730469, 3.3358211517333984, -2.1589279174804688, 94.36267852783203, 60.11927795410156, -17.16427230834961, 36.02960205078125, -13.707344055175781, 56.416015625, 3.2741165161132812, 9.137428283691406, 79.18824005126953, 52.2244873046875, 62.79815673828125, 7.105865478515625, -33.572853088378906, 6.743770599365234, 67.83959197998047, -2.6396636962890625, 6.234652519226074, 0.11151313781738281, 19.44464874267578, -5.646938323974609, -7.573265075683594, -56.48690414428711, 50.682891845703125, 71.66758728027344, 0.4382953643798828, 77.58625030517578, 6.599313735961914, 10.166034698486328, 24.054094314575195, 1.5885696411132812, 27.80690574645996, 9.598663330078125, 14.089622497558594, 66.04342651367188, 21.381004333496094, 51.767860412597656, 30.736621856689453, 2.6263275146484375, 29.557613372802734, 2.5510005950927734, 26.80577850341797, 3.7138824462890625, -24.79864501953125, 14.478797912597656, -8.109189987182617, 31.278724670410156, 38.120018005371094, 42.974205017089844, 87.978515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000390.npy"}
|
||||
{"epoch": 0.5895691609977324, "step": 391, "batch_size": 64, "mean": 25.871173858642578, "std": 38.908851623535156, "min": -66.37063598632812, "p10": -20.685305023193358, "median": 26.67224884033203, "p90": 79.9113983154297, "max": 99.316162109375, "pos_frac": 0.765625, "sample": [5.5965728759765625, -59.73561096191406, 82.83857727050781, 27.20843505859375, 32.861083984375, 71.41919708251953, 5.039375305175781, 75.55986022949219, 81.44889831542969, 97.00775146484375, 20.76398468017578, -66.37063598632812, 64.58940124511719, -1.7102508544921875, 46.234073638916016, 82.47743225097656, -45.04589080810547, 16.37787628173828, -18.879653930664062, 10.502983093261719, 1.6404037475585938, 52.15647888183594, 27.5386962890625, 40.70825958251953, 10.518608093261719, -7.902135848999023, -53.0283203125, -7.438072204589844, 45.3929443359375, 4.6677398681640625, 99.316162109375, 50.625762939453125, 33.072105407714844, -8.028587341308594, 29.76776123046875, -1.3946533203125, 20.260530471801758, 37.001678466796875, 39.97935485839844, 54.944252014160156, 31.00958251953125, 40.37659454345703, -32.60478973388672, -23.15760040283203, 26.136062622070312, 19.00125503540039, 79.85802459716797, 1.230682373046875, 70.42388153076172, 79.93427276611328, -19.756103515625, -14.546073913574219, 75.02033996582031, 77.83612823486328, 84.26116180419922, 39.26932144165039, 46.40833282470703, 17.14944839477539, -21.083534240722656, 3.2845706939697266, 7.79510498046875, 21.460880279541016, 34.84562683105469, 13.61956787109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000391.npy"}
|
||||
{"epoch": 0.5910808767951625, "step": 392, "batch_size": 64, "mean": 28.844318389892578, "std": 39.07586669921875, "min": -58.337677001953125, "p10": -19.98342742919921, "median": 20.743019104003906, "p90": 82.81034851074219, "max": 112.00421142578125, "pos_frac": 0.796875, "sample": [-55.64136505126953, 0.9448318481445312, 22.74396514892578, 21.9925537109375, 4.100685119628906, -1.1175460815429688, 83.29086303710938, -11.6859130859375, 56.650634765625, 69.18216705322266, 25.599700927734375, -58.337677001953125, 51.47157287597656, 72.43806457519531, 89.12317657470703, 18.936729431152344, 7.965972900390625, 64.89167785644531, 81.68914794921875, 16.168472290039062, 9.058006286621094, 5.13592529296875, 35.49897003173828, -36.80757141113281, 66.80116271972656, 63.53290557861328, 87.158935546875, 22.9442138671875, 44.80047607421875, -27.803009033203125, 19.493484497070312, 80.81251525878906, 54.60929870605469, 10.799942016601562, -27.363525390625, 33.02250671386719, 12.562150955200195, 55.95907211303711, 34.399940490722656, 69.66007232666016, 95.96630096435547, 91.94384002685547, -29.190683364868164, 60.484771728515625, 0.33058738708496094, 1.9086017608642578, -0.015375137329101562, 1.3276214599609375, 91.31828308105469, 18.791610717773438, -4.499504089355469, 112.00421142578125, -0.19163131713867188, 14.91961669921875, -23.539505004882812, 5.202724456787109, 66.69549560546875, -6.162933349609375, 5.512908935546875, 68.59762573242188, 8.496393203735352, 45.13958740234375, 9.451194763183594, 36.8614501953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000392.npy"}
|
||||
{"epoch": 0.5925925925925926, "step": 393, "batch_size": 64, "mean": 27.956777572631836, "std": 33.344459533691406, "min": -37.20973205566406, "p10": -7.239076805114745, "median": 20.076274871826172, "p90": 77.62594375610352, "max": 93.59549713134766, "pos_frac": 0.78125, "sample": [40.23286437988281, -0.8208236694335938, 69.53974151611328, 81.37886047363281, 83.4129867553711, 43.86304473876953, 17.741291046142578, 21.175865173339844, 21.15277099609375, 16.304668426513672, -4.884553909301758, 1.9577255249023438, 85.37372589111328, 58.27484130859375, 24.189586639404297, -13.799293518066406, 6.310997009277344, 63.48921203613281, -1.856597900390625, 10.366031646728516, -0.4465484619140625, 2.182554244995117, -19.199615478515625, 93.59549713134766, 30.405044555664062, 75.01939392089844, -7.54283332824707, 75.40543365478516, 60.507415771484375, 67.50021362304688, 35.59174346923828, -21.740482330322266, 65.42561340332031, 78.57759094238281, 2.7018356323242188, 24.51811981201172, -37.20973205566406, 82.09136962890625, 3.239521026611328, 0.07374763488769531, 29.002307891845703, 9.262405395507812, 33.68476867675781, -6.334095001220703, 18.999778747558594, 36.8213996887207, 68.12933349609375, 13.642349243164062, 13.395183563232422, 29.960601806640625, 53.94377136230469, 91.43468475341797, 48.72001647949219, -6.530311584472656, 7.851100921630859, 46.23188018798828, 7.6548919677734375, 16.2919921875, -34.16865539550781, -9.425521850585938, 12.041170120239258, 62.395362854003906, -3.6893768310546875, 15.819839477539062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000393.npy"}
|
||||
{"epoch": 0.5941043083900227, "step": 394, "batch_size": 64, "mean": 22.846038818359375, "std": 30.406597137451172, "min": -51.61506271362305, "p10": -12.73932666778564, "median": 19.817657470703125, "p90": 65.56610870361328, "max": 92.1004409790039, "pos_frac": 0.828125, "sample": [-1.8507232666015625, 88.76853942871094, 72.4521484375, 9.934547424316406, 6.770477294921875, 14.52008056640625, 2.02215576171875, 14.683456420898438, 69.22138214111328, 28.82716941833496, 66.01318359375, 34.71946334838867, 3.4033241271972656, 8.891780853271484, -22.289936065673828, 32.0496826171875, 2.999980926513672, -9.392797470092773, -14.173553466796875, 9.685916900634766, 49.81290817260742, -21.36138153076172, 25.724834442138672, 44.46644592285156, 40.33197021484375, 84.32664489746094, 43.86756896972656, 57.451690673828125, 28.646902084350586, 1.7147216796875, -32.38521194458008, 8.655899047851562, 81.97572326660156, 42.62205505371094, 16.666776657104492, 25.464691162109375, -0.6549034118652344, 19.21636199951172, 22.872100830078125, 7.881290435791016, 50.79307174682617, -15.574291229248047, 0.910552978515625, 24.92053985595703, 42.24018859863281, 92.1004409790039, -51.61506271362305, 64.52293395996094, -8.670211791992188, 28.72624969482422, -39.32257843017578, 44.156898498535156, 16.13317108154297, 18.71917724609375, 21.584075927734375, 10.083452224731445, 29.021150588989258, 42.759681701660156, 3.757526397705078, 6.7900543212890625, 3.5713348388671875, 58.14849853515625, 20.41895294189453, 33.41730499267578], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000394.npy"}
|
||||
{"epoch": 0.5956160241874527, "step": 395, "batch_size": 64, "mean": 19.264225006103516, "std": 33.05228042602539, "min": -52.744712829589844, "p10": -15.198721885681149, "median": 9.768329620361328, "p90": 62.65423049926758, "max": 88.44842529296875, "pos_frac": 0.71875, "sample": [44.591064453125, 60.429046630859375, 32.932586669921875, 1.6265678405761719, 62.142181396484375, 16.755706787109375, -1.569997787475586, 37.106544494628906, 84.3527603149414, -11.036142349243164, 1.8218917846679688, 37.439117431640625, 56.078643798828125, 57.675254821777344, 6.8187103271484375, 86.75759887695312, -3.3866043090820312, 41.36963653564453, -37.090057373046875, 6.8890228271484375, -3.2352752685546875, 8.95086669921875, 6.334564208984375, 17.917613983154297, 48.98368453979492, -48.036643981933594, 14.883544921875, 62.873680114746094, 19.96407699584961, -0.6773910522460938, 21.077028274536133, 88.44842529296875, 10.482437133789062, 9.054222106933594, -21.889022827148438, 18.106033325195312, -29.828731536865234, 1.9336814880371094, 61.09734344482422, -12.493860244750977, -3.7685928344726562, -0.012117385864257812, 86.1593246459961, 7.798549652099609, 17.93771743774414, 47.87855529785156, 7.2518768310546875, 63.98637390136719, -16.357948303222656, -5.022336959838867, 2.5949039459228516, 85.31733703613281, -11.111129760742188, 1.7876396179199219, 15.320587158203125, 58.57862854003906, -20.02163314819336, 13.43603515625, 25.394323348999023, -52.744712829589844, 6.858583450317383, -2.9459152221679688, 3.1107864379882812, 45.833824157714844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000395.npy"}
|
||||
{"epoch": 0.5971277399848829, "step": 396, "batch_size": 64, "mean": 24.426685333251953, "std": 32.99622344970703, "min": -64.1927719116211, "p10": -13.027926635742185, "median": 17.265892028808594, "p90": 70.01120147705079, "max": 102.75651550292969, "pos_frac": 0.78125, "sample": [89.7459487915039, 102.75651550292969, 63.44207763671875, 85.58421325683594, 66.21498107910156, 51.91663360595703, 5.329944610595703, 63.554500579833984, 19.516738891601562, 15.322906494140625, 34.912132263183594, 10.566192626953125, 46.428550720214844, -13.972309112548828, 18.21379852294922, -0.4645862579345703, 16.316802978515625, 48.85832977294922, -0.7777309417724609, 19.049074172973633, 25.348892211914062, -15.926994323730469, 91.5434799194336, 27.257247924804688, 16.31798553466797, 23.18767547607422, 36.52981185913086, -64.1927719116211, 57.23150634765625, 29.825786590576172, 5.763269424438477, -23.023700714111328, 1.6232662200927734, 63.03692626953125, 2.662750244140625, 8.773704528808594, 26.257736206054688, 2.080841064453125, 85.29468536376953, 71.63815307617188, -4.900966644287109, 15.012447357177734, -0.5915470123291016, 32.96672058105469, 14.492881774902344, 32.82160949707031, 26.475975036621094, 0.20975494384765625, 37.89888000488281, -14.339977264404297, 13.103652954101562, -10.82436752319336, 10.069303512573242, 3.26123046875, -7.450916290283203, 58.9605598449707, 30.200624465942383, 62.09129333496094, 3.3367137908935547, 11.38691520690918, -24.400543212890625, -16.853496551513672, -2.416168212890625, 79.05227661132812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000396.npy"}
|
||||
{"epoch": 0.5986394557823129, "step": 397, "batch_size": 64, "mean": 12.380790710449219, "std": 38.75839614868164, "min": -81.94073486328125, "p10": -33.14506301879882, "median": 10.97615909576416, "p90": 60.79943847656251, "max": 93.66293334960938, "pos_frac": 0.65625, "sample": [1.4486331939697266, 7.170402526855469, -49.15773010253906, 18.478302001953125, -27.308929443359375, 25.279800415039062, 40.41005325317383, 25.788124084472656, 58.45054626464844, 6.5450592041015625, 3.2996883392333984, 33.50418472290039, -0.44794464111328125, 55.617225646972656, 51.274505615234375, 26.738746643066406, -81.94073486328125, -47.5194091796875, 12.784717559814453, -9.345726013183594, 51.69927978515625, -13.277206420898438, 23.245187759399414, 61.80610656738281, 9.296539306640625, -77.68165588378906, -26.509536743164062, 5.036224365234375, 62.25243377685547, 48.568206787109375, 23.88330078125, 13.683298110961914, 55.438194274902344, 11.459716796875, 67.9005126953125, 2.7586669921875, 11.93570327758789, 46.444488525390625, 44.36859893798828, -0.9484386444091797, -3.38677978515625, 71.00608825683594, 65.4717788696289, 25.082672119140625, -81.94050598144531, 12.519264221191406, 92.40103149414062, -9.191314697265625, -15.190750122070312, -1.2673263549804688, 3.203022003173828, 25.526100158691406, -0.568145751953125, 4.714641571044922, -0.12015151977539062, -9.760368347167969, 20.119888305664062, 10.49260139465332, -22.070594787597656, 52.75798797607422, -75.85942077636719, -35.646263122558594, -2.014965057373047, 93.66293334960938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000397.npy"}
|
||||
{"epoch": 0.600151171579743, "step": 398, "batch_size": 64, "mean": 27.88081932067871, "std": 33.572776794433594, "min": -33.78147888183594, "p10": -4.809406280517578, "median": 20.670032501220703, "p90": 79.58779907226564, "max": 105.484375, "pos_frac": 0.765625, "sample": [42.45927429199219, 41.58437728881836, 56.1008415222168, 28.55780029296875, 70.83118438720703, 3.9605865478515625, 92.21731567382812, 16.959228515625, -4.378150939941406, 86.21420288085938, 56.63677978515625, 16.653656005859375, -7.7248382568359375, -1.5640945434570312, 57.977333068847656, -16.633743286132812, 47.530616760253906, -4.734710693359375, 13.340789794921875, -18.97408676147461, 37.515174865722656, 3.845365524291992, 90.39877319335938, 91.09219360351562, -1.3025856018066406, 75.30716705322266, 81.42235565185547, 34.05870056152344, 22.30802345275879, 8.800498962402344, -4.8103790283203125, 21.13677978515625, 8.19598388671875, 61.14324951171875, 3.6339492797851562, -11.106796264648438, 9.660125732421875, 65.00914764404297, 1.9677734375, 23.743637084960938, -17.504928588867188, 0.6357879638671875, 64.27485656738281, 15.284271240234375, 34.20492172241211, 58.81597137451172, 8.58741569519043, 2.5067672729492188, -0.15653610229492188, 30.335765838623047, -4.375801086425781, 105.484375, 21.303375244140625, -4.807136535644531, 30.98870849609375, -4.782356262207031, 71.1719970703125, -33.78147888183594, 20.203285217285156, 1.8130321502685547, 22.26517105102539, 66.86332702636719, 85.28964233398438, 10.718538284301758], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000398.npy"}
|
||||
{"epoch": 0.6016628873771731, "step": 399, "batch_size": 64, "mean": 31.081968307495117, "std": 36.7212028503418, "min": -43.61944580078125, "p10": -15.225567626953122, "median": 26.056434631347656, "p90": 85.19503402709961, "max": 120.1729736328125, "pos_frac": 0.765625, "sample": [36.89927673339844, 18.85546112060547, 50.39179229736328, 56.162506103515625, -17.544189453125, 30.810546875, 58.29600524902344, 12.93880844116211, -1.4015846252441406, -7.072486877441406, -19.155075073242188, 70.23016357421875, 37.657203674316406, -39.43033981323242, 25.41693115234375, 26.695938110351562, 27.636734008789062, 17.951324462890625, 16.93321990966797, 85.14352416992188, -16.730209350585938, 2.2015228271484375, -43.61944580078125, 45.77638626098633, 91.72219848632812, 32.054962158203125, 13.660648345947266, 120.1729736328125, 13.701873779296875, 8.096675872802734, 6.867042541503906, -1.6832275390625, 19.023040771484375, 4.785671234130859, 13.5870361328125, -18.245361328125, 28.910316467285156, 71.84461975097656, 28.56922149658203, 81.41789245605469, 37.237342834472656, -0.4479942321777344, 86.60295104980469, -2.3328857421875, 76.25033569335938, 39.98558807373047, 29.65520477294922, 25.297306060791016, -11.714736938476562, -18.303665161132812, 78.09259796142578, 89.3450927734375, 85.21710968017578, 58.68700408935547, 68.73146057128906, 61.901519775390625, 48.10957336425781, 102.7174301147461, 14.656234741210938, -6.428722381591797, 94.0176773071289, -1.9768733978271484, 21.456214904785156, 22.960594177246094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000399.npy"}
|
||||
{"epoch": 0.6031746031746031, "step": 400, "batch_size": 64, "mean": 19.171371459960938, "std": 36.207576751708984, "min": -58.58769989013672, "p10": -17.318622779846187, "median": 10.35848331451416, "p90": 74.13057632446291, "max": 95.82096862792969, "pos_frac": 0.703125, "sample": [5.712802886962891, 27.394073486328125, 7.3906402587890625, -12.835250854492188, 37.965728759765625, 63.928279876708984, 5.019981384277344, 53.65498352050781, 51.434356689453125, 78.83283233642578, 12.722835540771484, -18.4152889251709, 17.671817779541016, 1.5334892272949219, 0.45790863037109375, 17.595474243164062, -14.759735107421875, 76.3831558227539, 51.590736389160156, 51.0723991394043, -22.525833129882812, 10.895557403564453, 36.10711669921875, -9.161520004272461, -7.6566162109375, 33.93939208984375, 9.821409225463867, 16.530242919921875, 4.984073638916016, -2.0550670623779297, -49.23451232910156, -2.1343612670898438, 39.44798278808594, 7.141548156738281, 92.95338439941406, 9.130870819091797, -50.7691650390625, 1.4400749206542969, 86.46205139160156, 43.815208435058594, 68.87455749511719, 53.827293395996094, -58.58769989013672, 44.021087646484375, 0.07909393310546875, 13.116775512695312, 1.06500244140625, -0.5758514404296875, -0.27083587646484375, 95.82096862792969, 14.941051483154297, 56.57044219970703, -12.548637390136719, 19.839988708496094, -10.20709228515625, 94.01018524169922, 91.02215576171875, 31.995651245117188, 5.807308197021484, -22.65411376953125, -43.08301544189453, -3.622325897216797, -10.65625, 34.69883728027344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000400.npy"}
|
||||
{"epoch": 0.6046863189720333, "step": 401, "batch_size": 64, "mean": 30.528560638427734, "std": 34.10566329956055, "min": -24.828372955322266, "p10": -7.652694892883298, "median": 24.269437789916992, "p90": 84.40006408691407, "max": 99.70740509033203, "pos_frac": 0.8125, "sample": [-2.7219161987304688, 38.53919982910156, 63.269264221191406, 41.45574951171875, -4.887353897094727, 0.39104461669921875, 6.6620025634765625, 19.175434112548828, -10.932453155517578, -3.5555191040039062, 82.88277435302734, 2.0150604248046875, -24.828372955322266, 0.11527252197265625, 40.73334503173828, -11.165657043457031, 42.80438232421875, 73.65069580078125, 60.010650634765625, 85.05033111572266, 27.21277618408203, 25.596782684326172, 5.436347961425781, 12.063907623291016, -3.306671142578125, -17.60779571533203, 22.942092895507812, 35.285438537597656, 17.828102111816406, 21.4449462890625, 56.35176086425781, 14.400138854980469, 10.593538284301758, 71.64080047607422, -2.2231979370117188, 95.68695831298828, 1.1430492401123047, 50.31989288330078, 7.553956985473633, 60.42840576171875, 2.4891128540039062, 89.88597106933594, 27.280975341796875, -16.816125869750977, 79.01637268066406, 9.982769012451172, 44.786834716796875, 99.70740509033203, 65.06558227539062, 61.2657470703125, 27.461326599121094, 3.4317169189453125, 18.10464859008789, 6.5153350830078125, -11.001697540283203, 37.299034118652344, 31.69803237915039, 12.848176956176758, 94.43830108642578, 99.3958511352539, 25.82692527770996, -8.837841033935547, 43.3027229309082, 99.2254638671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000401.npy"}
|
||||
{"epoch": 0.6061980347694633, "step": 402, "batch_size": 64, "mean": 23.725364685058594, "std": 36.519081115722656, "min": -53.75556182861328, "p10": -25.437621498107905, "median": 17.760149002075195, "p90": 78.54328842163086, "max": 100.03407287597656, "pos_frac": 0.78125, "sample": [38.34520721435547, 80.90386199951172, 50.5964241027832, 68.20185089111328, 41.46931457519531, 50.33350372314453, 100.03407287597656, 6.945526123046875, 38.297332763671875, 39.736305236816406, 6.8875732421875, 93.2721939086914, 14.312475204467773, -30.087615966796875, 18.256637573242188, 22.612403869628906, 0.27759552001953125, 17.263660430908203, 11.792129516601562, 12.75823974609375, 71.97938537597656, 0.6809272766113281, 5.219970703125, -53.75556182861328, 77.09921264648438, -1.5722846984863281, 28.417503356933594, 53.37684631347656, 22.502887725830078, 7.0170745849609375, 82.30088806152344, 82.22101593017578, -3.26690673828125, 16.027023315429688, 1.48681640625, 4.378107070922852, 79.16217803955078, 66.94668579101562, 15.9259033203125, 20.621166229248047, -27.829580307006836, -31.111156463623047, -38.7977294921875, 45.112342834472656, 8.592529296875, 7.366315841674805, 24.091310501098633, -12.902816772460938, -1.343109130859375, -3.843585968017578, 49.959205627441406, 66.36953735351562, 57.775516510009766, 84.15696716308594, 0.345855712890625, -1.3992290496826172, 1.5296707153320312, -43.620086669921875, -45.5911865234375, 35.92332458496094, 19.781543731689453, 62.867408752441406, -19.85638427734375, 21.869163513183594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000402.npy"}
|
||||
{"epoch": 0.6077097505668935, "step": 403, "batch_size": 64, "mean": 30.217971801757812, "std": 39.43573760986328, "min": -53.46183776855469, "p10": -18.096949768066406, "median": 20.023587226867676, "p90": 85.92483139038086, "max": 114.68426513671875, "pos_frac": 0.8125, "sample": [-1.7908554077148438, -24.375, 90.55636596679688, 10.89556884765625, 85.7194595336914, 10.535812377929688, 102.82452392578125, 52.28999710083008, 15.397705078125, 16.139053344726562, 50.32231140136719, -9.363525390625, 73.05107116699219, 2.6957740783691406, 84.11077880859375, 33.82714080810547, 20.86833381652832, 4.484354019165039, 8.986083984375, 3.669788360595703, -8.81782341003418, -23.067058563232422, 114.68426513671875, 83.9615707397461, 2.5508575439453125, 56.70912551879883, 79.86121368408203, 2.485136032104492, 85.17118835449219, 15.248878479003906, -0.7631263732910156, 21.94499969482422, 21.339584350585938, 13.409835815429688, 41.839996337890625, -19.880977630615234, 13.896230697631836, 90.67427062988281, 11.107492446899414, 11.000152587890625, 14.330169677734375, 0.7292766571044922, 91.30419921875, 28.003536224365234, 56.959259033203125, -19.020172119140625, 55.93998336791992, 61.695594787597656, 31.663188934326172, 0.37347412109375, 54.09043884277344, 87.14990234375, -52.435447692871094, -24.255786895751953, 59.3875732421875, -53.46183776855469, 79.47567749023438, 40.07582092285156, 27.104839324951172, 86.01284790039062, -15.942764282226562, 2.8121871948242188, 19.17884063720703, 58.57884216308594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000403.npy"}
|
||||
{"epoch": 0.6092214663643235, "step": 404, "batch_size": 64, "mean": 29.694786071777344, "std": 39.489845275878906, "min": -55.38184356689453, "p10": -14.008938980102538, "median": 20.418899536132812, "p90": 83.80830993652344, "max": 108.517333984375, "pos_frac": 0.765625, "sample": [20.43701171875, 19.56890106201172, 46.73767852783203, 10.954833984375, 17.785614013671875, -4.661651611328125, 53.259925842285156, 4.020185470581055, -8.432947158813477, 16.737350463867188, 83.25591278076172, 4.6508636474609375, -31.67055892944336, -55.38184356689453, -0.6304244995117188, 77.07403564453125, -14.570693969726562, -6.716037750244141, 87.05439758300781, 73.51899719238281, 53.50407028198242, 65.1767807006836, 32.20100402832031, 101.97124481201172, -26.569122314453125, 55.72443389892578, -12.698177337646484, 67.88156127929688, 57.72748947143555, 69.77545166015625, 4.644046783447266, 62.28571701049805, -5.554162979125977, 57.33440399169922, -7.854301452636719, 9.594825744628906, 18.603492736816406, 67.85527801513672, 48.43108367919922, 7.371807098388672, -4.3387298583984375, 10.789627075195312, 43.46205520629883, 22.34259033203125, 84.04505157470703, 18.568164825439453, 53.461883544921875, 84.24475860595703, 98.60238647460938, 16.644691467285156, 3.2213783264160156, 108.517333984375, 99.32449340820312, 20.400787353515625, 51.47126388549805, 48.590049743652344, 72.59423828125, -40.419151306152344, -27.479598999023438, -52.65910339355469, 23.40239715576172, 36.791751861572266, 3.1964874267578125, 5.292945861816406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000404.npy"}
|
||||
{"epoch": 0.6107331821617535, "step": 405, "batch_size": 64, "mean": 22.857553482055664, "std": 36.1323127746582, "min": -43.48518371582031, "p10": -17.738286972045895, "median": 13.33829116821289, "p90": 79.89716567993165, "max": 96.83306884765625, "pos_frac": 0.703125, "sample": [71.1552734375, -11.039051055908203, -19.93100357055664, 78.57329559326172, 13.057228088378906, -0.6775283813476562, 73.3888168334961, -18.970123291015625, 1.8273811340332031, -31.627090454101562, 2.125396728515625, 4.124473571777344, 0.7131443023681641, -10.017786026000977, 37.18830871582031, 6.1057281494140625, 89.54875183105469, -5.269006729125977, -5.2147979736328125, 55.01637268066406, 80.46453857421875, 57.76092529296875, 11.199745178222656, 36.09501647949219, -2.4712562561035156, 20.999347686767578, 53.43505859375, 90.35740661621094, -14.864002227783203, 47.581260681152344, 9.330459594726562, -3.013416290283203, 51.409759521484375, 52.90892791748047, -10.364395141601562, 54.07270050048828, 8.961650848388672, -43.48518371582031, 26.409954071044922, 22.501129150390625, 9.990547180175781, -12.707986831665039, 30.324722290039062, 18.410554885864258, 1.3705978393554688, 12.715803146362305, 16.8555908203125, 21.84259033203125, 94.211669921875, 57.2647705078125, -38.27132797241211, 13.619354248046875, 43.10064697265625, 11.650566101074219, 85.51013946533203, 96.83306884765625, -36.20759582519531, 25.543384552001953, 48.646026611328125, -14.24169921875, -3.3952255249023438, 82.1956787109375, 43.32112121582031, -25.06700897216797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000405.npy"}
|
||||
{"epoch": 0.6122448979591837, "step": 406, "batch_size": 64, "mean": 29.43549156188965, "std": 39.95695495605469, "min": -79.00159454345703, "p10": -10.283313941955564, "median": 23.756725311279297, "p90": 85.58768692016604, "max": 100.86085510253906, "pos_frac": 0.75, "sample": [73.59349822998047, 9.235313415527344, -11.001089096069336, -7.472419738769531, 46.96192932128906, 65.43604278564453, 87.12055206298828, 66.27432250976562, -19.468107223510742, 93.31153869628906, 48.514892578125, 64.03886413574219, 6.334789276123047, -0.2883338928222656, 68.4947280883789, 71.00331115722656, 75.72488403320312, 11.594623565673828, 22.223060607910156, 82.01100158691406, 53.33033752441406, 13.095451354980469, 46.965274810791016, 54.74341583251953, 14.536773681640625, 37.91471862792969, -7.874460220336914, 39.52978515625, -0.27631187438964844, 18.001781463623047, -1.082021713256836, -0.2651405334472656, -55.788578033447266, -6.15667724609375, 34.18507385253906, 76.66444396972656, 14.600168228149414, 89.62476348876953, -33.5795783996582, 93.63008880615234, 48.7662353515625, 16.64929962158203, 10.208244323730469, 100.86085510253906, 2.697052001953125, 91.35072326660156, -46.59871292114258, 5.523714065551758, 41.85746765136719, 93.29956817626953, 18.296125411987305, -8.608505249023438, 57.71342468261719, -33.275604248046875, 4.688362121582031, -79.00159454345703, 26.754364013671875, 46.82762145996094, -7.511802673339844, 6.692619323730469, 48.81504821777344, 17.731597900390625, 25.290390014648438, 59.40217590332031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000406.npy"}
{"epoch": 0.6137566137566137, "step": 407, "batch_size": 64, "mean": 22.641036987304688, "std": 36.098472595214844, "min": -48.8172607421875, "p10": -17.84391059875488, "median": 17.465855598449707, "p90": 72.06331939697266, "max": 103.86136627197266, "pos_frac": 0.671875, "sample": [3.0894546508789062, 16.36980628967285, -20.349288940429688, 55.06108856201172, -4.294944763183594, 77.77130126953125, 83.37550354003906, -12.218147277832031, -43.43314743041992, 51.96349334716797, 21.960861206054688, 8.04336929321289, -2.4541702270507812, -1.834259033203125, -39.857940673828125, 4.501708984375, 22.887428283691406, 53.98023223876953, 92.15182495117188, 75.90968322753906, 18.561904907226562, 3.864725112915039, 35.411563873291016, 33.95704650878906, 8.049392700195312, 36.28167724609375, 0.7104110717773438, -9.455169677734375, 52.9154052734375, 103.86136627197266, 58.14043426513672, 1.2289047241210938, 71.4981689453125, 69.0452880859375, 42.532501220703125, -0.19336700439453125, -18.540618896484375, 42.653961181640625, -40.20806884765625, 22.75127410888672, -2.5789337158203125, 55.148529052734375, 65.56153869628906, 39.66569519042969, 81.77516174316406, 72.30552673339844, 59.625816345214844, -2.2330856323242188, 28.873046875, -0.69158935546875, 10.690567016601562, -35.487579345703125, -48.8172607421875, 6.06500244140625, -12.433637619018555, 25.97597885131836, -2.696168899536133, 63.53499984741211, -16.218257904052734, -2.8155250549316406, -4.069725036621094, 33.0057373046875, 47.68932342529297, 11.460556030273438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000407.npy"}
{"epoch": 0.6152683295540439, "step": 408, "batch_size": 64, "mean": 34.35175323486328, "std": 37.42862319946289, "min": -61.7763671875, "p10": -7.418622398376464, "median": 31.408493995666504, "p90": 87.46909484863282, "max": 98.23672485351562, "pos_frac": 0.8125, "sample": [13.419939041137695, 85.55473327636719, 75.01629638671875, 36.64046859741211, -6.724880218505859, -29.771543502807617, -61.7763671875, 5.469234466552734, 66.13381958007812, 26.1765193939209, 37.52684783935547, 88.28953552246094, -0.9832515716552734, 51.47334289550781, 55.936859130859375, 76.48629760742188, 12.627391815185547, -28.362083435058594, 52.64150619506836, 11.0367431640625, 45.155120849609375, 2.4661788940429688, -4.479372024536133, 49.32652282714844, 95.20712280273438, -25.956558227539062, 22.681434631347656, 91.84664916992188, 62.414344787597656, 50.5113525390625, 92.95138549804688, -0.2797584533691406, 57.30906677246094, 4.832643508911133, 13.872024536132812, 83.84927368164062, 22.868160247802734, -7.715940475463867, 93.76547241210938, 10.849929809570312, 22.98663330078125, 73.95196533203125, 84.62799072265625, 98.23672485351562, 71.63642883300781, 9.093847274780273, 70.19536590576172, -8.929025650024414, 11.22463607788086, 16.656784057617188, 55.01368713378906, 49.95607376098633, 91.20442962646484, -34.58512878417969, 38.31694793701172, 43.56242370605469, 20.045902252197266, -2.5586605072021484, 23.752857208251953, 12.053661346435547, 47.371009826660156, 22.87704086303711, 7.6280975341796875, 45.906158447265625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000408.npy"}
{"epoch": 0.6167800453514739, "step": 409, "batch_size": 64, "mean": 18.17670440673828, "std": 38.221378326416016, "min": -76.01158142089844, "p10": -20.56653461456299, "median": 7.466184616088867, "p90": 75.1461235046387, "max": 99.8231201171875, "pos_frac": 0.640625, "sample": [1.3333206176757812, 5.155893325805664, -17.474037170410156, -3.6762847900390625, 2.8700828552246094, -7.964237213134766, 19.603227615356445, -15.92679214477539, 8.172077178955078, -17.483253479003906, -3.7480850219726562, 93.39836883544922, 3.070920944213867, 99.8231201171875, 52.70083236694336, 6.31036376953125, 45.4921875, -4.642118453979492, -18.096229553222656, 92.1522445678711, 66.92296600341797, -24.145397186279297, 62.94694519042969, 60.383026123046875, 6.613800048828125, -20.625396728515625, 3.8342437744140625, 23.524063110351562, 44.67329406738281, -2.659576416015625, -21.817108154296875, 26.041275024414062, 39.72058868408203, -15.902435302734375, 47.746131896972656, -4.75520133972168, 77.36122131347656, 69.9775619506836, 19.181381225585938, 10.489309310913086, -27.52292251586914, -10.069816589355469, 14.83465576171875, -29.040653228759766, 6.577033996582031, -13.316177368164062, -1.321563720703125, 22.604267120361328, -44.90116882324219, 96.55111694335938, 14.686115264892578, 41.04311752319336, -0.31453704833984375, 96.20785522460938, -76.01158142089844, 6.760292053222656, 15.825775146484375, 13.015026092529297, 51.64573669433594, 55.569732666015625, 99.77963256835938, -20.429189682006836, 26.255462646484375, 14.298542022705078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000409.npy"}
{"epoch": 0.618291761148904, "step": 410, "batch_size": 64, "mean": 27.560718536376953, "std": 30.61851692199707, "min": -34.29191970825195, "p10": -5.2386924743652346, "median": 22.542949676513672, "p90": 73.36785812377931, "max": 104.17962646484375, "pos_frac": 0.8125, "sample": [19.213226318359375, 18.61614418029785, 12.722064971923828, 2.649188995361328, 92.68939208984375, -1.5309906005859375, 44.696781158447266, 25.38857650756836, -5.278900146484375, -34.29191970825195, 50.196311950683594, 44.59931945800781, 1.5373821258544922, -17.061981201171875, 58.281192779541016, 32.30607604980469, 51.87558364868164, -0.0500946044921875, 28.542226791381836, 78.48943328857422, 93.32350158691406, 26.09192657470703, -5.144874572753906, 51.07976531982422, 6.0048065185546875, 44.34236145019531, 3.7192840576171875, 46.63893127441406, 25.686439514160156, -4.638786315917969, 53.36180114746094, -7.337654113769531, 13.38467788696289, 80.09725952148438, 25.105384826660156, 104.17962646484375, 43.04878234863281, 6.088855743408203, 11.353837966918945, 56.75413513183594, -9.345298767089844, -17.029830932617188, 17.425588607788086, 14.808822631835938, 58.980003356933594, 50.47645568847656, 69.68282318115234, 0.82220458984375, 9.15066909790039, 19.980514526367188, 12.78265380859375, 27.538238525390625, 92.08172607421875, 34.20111083984375, 3.9491729736328125, 27.70020294189453, -0.4237556457519531, 6.773916244506836, 4.0207977294921875, 43.512630462646484, 74.94715881347656, 42.14775848388672, -10.531173706054688, 13.504539489746094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000410.npy"}
{"epoch": 0.6198034769463341, "step": 411, "batch_size": 64, "mean": 29.52808380126953, "std": 41.14615249633789, "min": -87.99475860595703, "p10": -12.025259399414063, "median": 26.567468643188477, "p90": 89.4001434326172, "max": 112.8638916015625, "pos_frac": 0.6875, "sample": [-13.690452575683594, 82.23423767089844, 73.44208526611328, 81.783935546875, 53.734100341796875, 4.0967254638671875, -11.663894653320312, -87.99475860595703, 38.960548400878906, -1.3566703796386719, 34.57872009277344, 93.1356430053711, 53.035072326660156, 49.14381790161133, 93.8116455078125, 41.113670349121094, 38.53664016723633, -12.180130004882812, 30.047569274902344, 25.543731689453125, 26.18213653564453, 87.28865051269531, 80.92129516601562, 6.44764518737793, -8.735061645507812, 44.09765625, 1.8843460083007812, 24.790088653564453, -8.87313461303711, -29.3370361328125, -0.43310546875, -11.032587051391602, -20.49793243408203, 28.217247009277344, 26.952800750732422, 15.406139373779297, -2.6924610137939453, 95.6357421875, -10.586898803710938, 112.8638916015625, 62.266021728515625, 19.56316375732422, -10.460464477539062, 82.99583435058594, 68.15103912353516, 91.02098083496094, -28.121109008789062, 19.415481567382812, -1.4522514343261719, -21.84747314453125, 17.89821434020996, -6.5117950439453125, 90.30506896972656, 68.88386535644531, 37.2978515625, 64.62004852294922, 40.30217742919922, 14.560949325561523, 112.27352905273438, 33.70939636230469, -11.521690368652344, 2.5179290771484375, -9.139373779296875, 28.258291244506836], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000411.npy"}
{"epoch": 0.6213151927437641, "step": 412, "batch_size": 64, "mean": 27.005107879638672, "std": 37.09235382080078, "min": -32.005088806152344, "p10": -14.75986213684082, "median": 18.248035430908203, "p90": 79.26767578125, "max": 111.42864990234375, "pos_frac": 0.703125, "sample": [-32.005088806152344, 76.38184356689453, 33.92632293701172, 102.49783325195312, 30.733741760253906, 43.354957580566406, 8.693954467773438, 72.83029174804688, -15.968074798583984, 3.8437843322753906, 49.674583435058594, 8.679862976074219, 27.21453857421875, 98.8636703491211, -10.561088562011719, -10.515327453613281, 65.544921875, -22.442012786865234, 9.734199523925781, 0.3395538330078125, 78.69439697265625, 38.02293395996094, 18.279396057128906, 31.2078857421875, 18.2166748046875, -15.099212646484375, 29.56646728515625, 4.342891693115234, 35.72763442993164, 9.155303955078125, 9.789865493774414, 89.88264465332031, 4.619327545166016, 111.42864990234375, -11.460792541503906, -0.8807449340820312, -6.433837890625, -11.093097686767578, 64.15674591064453, -20.330543518066406, -6.636026382446289, -2.1660919189453125, 37.06935119628906, -22.08206558227539, 5.898159027099609, 79.6790771484375, 79.51336669921875, 63.099754333496094, 101.12911987304688, -13.96804428100586, 31.721725463867188, 42.4881477355957, 40.25986099243164, -2.5415573120117188, 28.770641326904297, 76.9039306640625, -2.8460330963134766, 17.60556983947754, -17.707530975341797, 37.43830871582031, 72.78939819335938, 3.611478805541992, 68.82457733154297, -9.143218994140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000412.npy"}
{"epoch": 0.6228269085411943, "step": 413, "batch_size": 64, "mean": 23.077537536621094, "std": 43.0744514465332, "min": -91.82453918457031, "p10": -29.11774387359619, "median": 22.428038597106934, "p90": 77.1014778137207, "max": 111.20567321777344, "pos_frac": 0.71875, "sample": [-0.07841300964355469, -47.258636474609375, 36.399574279785156, 16.380146026611328, -6.145179748535156, -4.239662170410156, 8.032058715820312, 100.24942016601562, -29.34963035583496, 75.33349609375, -73.25888061523438, 66.46212005615234, 43.13568878173828, -23.220352172851562, 4.606407165527344, 7.265541076660156, -31.073535919189453, 89.66305541992188, -0.07483482360839844, 24.109582901000977, 108.14219665527344, 41.93596649169922, 19.55560302734375, 38.23991775512695, 43.78797912597656, 18.869726181030273, 111.20567321777344, 40.66607666015625, 9.918739318847656, 35.183738708496094, 37.17554473876953, 56.80900573730469, 38.065975189208984, 25.007781982421875, 101.89393615722656, -35.15643310546875, 27.045791625976562, -8.152053833007812, 38.903724670410156, 1.86138916015625, -57.782196044921875, 74.94503021240234, 77.85918426513672, -28.576675415039062, 31.142730712890625, -24.697877883911133, -91.82453918457031, 6.645866394042969, 21.846078872680664, 8.800148010253906, 1.6830978393554688, -9.66888427734375, 54.557891845703125, -5.854038238525391, 12.248825073242188, 55.9705810546875, 23.009998321533203, 71.20111083984375, 14.569568634033203, 50.502403259277344, -25.569725036621094, 53.68665313720703, 88.60025024414062, 65.76873779296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000413.npy"}
{"epoch": 0.6243386243386243, "step": 414, "batch_size": 64, "mean": 30.46070098876953, "std": 38.90776824951172, "min": -81.7701416015625, "p10": -9.62547454833984, "median": 27.569947242736816, "p90": 86.93080368041994, "max": 107.53397369384766, "pos_frac": 0.734375, "sample": [66.75557708740234, -11.02203369140625, 60.52496337890625, 24.448028564453125, -0.25247955322265625, 23.1435489654541, -14.66650390625, 69.50814819335938, 33.97416687011719, 5.2240142822265625, 21.9013671875, 58.4483528137207, 76.7475357055664, -21.95306396484375, 81.08960723876953, 39.0762825012207, -38.19403076171875, -17.49999237060547, 5.6636199951171875, 32.634376525878906, 28.826784133911133, 28.168764114379883, 107.53397369384766, 20.846458435058594, 95.07228088378906, -0.2473297119140625, 43.06736373901367, 20.521385192871094, 52.01582336425781, 65.61600494384766, 89.43417358398438, -2.668792724609375, 52.26002502441406, -6.3668365478515625, 91.86940002441406, 39.283843994140625, 73.77234649658203, 42.90260314941406, 45.088592529296875, 79.75233459472656, 29.1134033203125, -5.543701171875, 6.800235748291016, 0.3563709259033203, 40.207542419433594, -2.009857177734375, -0.17351531982421875, 50.97955322265625, -81.7701416015625, 16.935935974121094, 26.97113037109375, 0.166259765625, -4.929250717163086, 21.77248764038086, 102.71626281738281, 103.51248931884766, 15.051803588867188, 5.04150390625, 50.7743034362793, 43.07775115966797, -29.339675903320312, -4.101408004760742, -2.6712303161621094, 104.245849609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000414.npy"}
{"epoch": 0.6258503401360545, "step": 415, "batch_size": 64, "mean": 27.21916961669922, "std": 41.258460998535156, "min": -76.81722259521484, "p10": -18.345240783691406, "median": 21.75989532470703, "p90": 85.88291320800785, "max": 108.61653900146484, "pos_frac": 0.75, "sample": [-4.632789611816406, 60.6718635559082, 28.834054946899414, 43.181278228759766, 18.263713836669922, 12.787406921386719, 68.99421691894531, 98.7953872680664, -16.423049926757812, 69.83843231201172, 34.19884490966797, -37.388755798339844, 64.18704223632812, 75.74134063720703, 3.734771728515625, 95.80615234375, 108.61653900146484, 21.265296936035156, 65.10316467285156, -4.7317047119140625, -7.4555511474609375, 40.54328918457031, 25.214431762695312, -1.2004585266113281, 20.22991943359375, 99.07617950439453, -64.36566162109375, 16.321022033691406, 45.67747497558594, 48.490081787109375, -11.59320068359375, 55.69783020019531, 37.48841094970703, -6.5267486572265625, -67.30896759033203, 100.47943115234375, 55.78350830078125, 68.3968734741211, 67.3671875, -19.169036865234375, 40.414085388183594, -23.471389770507812, -23.94416046142578, -76.81722259521484, 15.51763916015625, 43.562232971191406, 1.990966796875, 4.951942443847656, 11.914802551269531, 2.2120532989501953, 2.299459457397461, 35.982505798339844, 22.254493713378906, 6.594432830810547, 12.415946960449219, 28.850528717041016, -1.6047897338867188, 50.61870574951172, 50.484588623046875, -0.41834259033203125, 20.97742462158203, 14.270195007324219, 102.75231170654297, 90.22930145263672], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000415.npy"}
{"epoch": 0.6273620559334845, "step": 416, "batch_size": 64, "mean": 21.66620445251465, "std": 41.64846420288086, "min": -70.4676284790039, "p10": -24.327703857421874, "median": 14.617523193359375, "p90": 83.9215026855469, "max": 123.64187622070312, "pos_frac": 0.703125, "sample": [3.8149566650390625, -70.4676284790039, -11.534626007080078, 21.916610717773438, 12.963211059570312, 91.58958435058594, -3.4597320556640625, -23.814987182617188, 77.03399658203125, 105.66189575195312, 10.969192504882812, 86.873291015625, 123.64187622070312, 5.283210754394531, -1.1814308166503906, 38.165679931640625, 4.414209365844727, 54.12290954589844, -63.226104736328125, -35.781646728515625, 16.943960189819336, 47.510658264160156, 22.970247268676758, 75.74115753173828, -0.8842449188232422, 41.18487548828125, 14.108795166015625, 106.47482299804688, 104.36115264892578, 52.33293914794922, 15.126251220703125, 36.96075439453125, 9.450248718261719, -45.97956848144531, 6.746965408325195, -6.420989990234375, -24.547439575195312, -43.3902587890625, 7.230634689331055, 4.790668487548828, -21.053058624267578, 21.333717346191406, -14.357879638671875, 2.6897125244140625, 16.727210998535156, -11.115001678466797, 63.14752960205078, -27.148406982421875, 1.1278724670410156, 47.60884094238281, 26.796905517578125, 6.049127578735352, -10.573333740234375, 54.850345611572266, -0.24950408935546875, 17.572818756103516, -6.219276428222656, 23.448532104492188, 42.668968200683594, 23.10926055908203, 64.58634948730469, 67.0021743774414, 93.5487060546875, 37.38939666748047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000416.npy"}
{"epoch": 0.6288737717309146, "step": 417, "batch_size": 64, "mean": 31.503847122192383, "std": 41.83521270751953, "min": -94.23138427734375, "p10": -15.26795139312744, "median": 34.73992729187012, "p90": 86.02593612670898, "max": 110.39329528808594, "pos_frac": 0.765625, "sample": [15.612876892089844, -1.6769218444824219, -3.8891372680664062, -86.0322036743164, 87.93338775634766, 69.796630859375, -94.23138427734375, 91.74982452392578, 27.323101043701172, 52.63566589355469, 110.39329528808594, 10.096359252929688, 14.496482849121094, 44.30370330810547, 53.02849578857422, -50.31095886230469, 86.29737854003906, 35.47991180419922, 4.7318115234375, 49.96455383300781, 60.950355529785156, 16.44525146484375, 41.67261505126953, 27.0162353515625, 30.330604553222656, -7.6254730224609375, 72.4803466796875, 20.94708251953125, 18.436893463134766, 47.02220916748047, -16.0173397064209, 80.45675659179688, 97.44375610351562, -21.881515502929688, 52.127288818359375, -9.33026123046875, 24.87872314453125, -0.21783447265625, 78.69843292236328, 50.636802673339844, 85.39257049560547, 11.499153137207031, 86.9422607421875, 41.287513732910156, -11.023651123046875, 12.173171997070312, 19.47043800354004, 35.094383239746094, -17.911773681640625, 44.36390686035156, 18.569717407226562, 61.523193359375, 6.7539520263671875, -13.519378662109375, 55.297637939453125, -21.71552276611328, 43.28269958496094, 39.41264724731445, 42.24415588378906, -2.7233448028564453, 74.73329162597656, 34.38547134399414, 80.90361022949219, 107.63623809814453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000417.npy"}
{"epoch": 0.6303854875283447, "step": 418, "batch_size": 64, "mean": 21.837738037109375, "std": 40.5361328125, "min": -45.01373291015625, "p10": -22.524560546875, "median": 16.58911895751953, "p90": 84.18213424682618, "max": 128.3236846923828, "pos_frac": 0.640625, "sample": [28.879196166992188, 65.50946044921875, 68.65485382080078, 22.3372802734375, 16.3560791015625, 67.5459213256836, 15.739349365234375, 52.96855926513672, 101.09913635253906, -33.46790313720703, 81.0720443725586, 89.23918151855469, -22.647415161132812, 50.88580322265625, 32.45275115966797, -37.20812225341797, 65.7962875366211, -0.8282089233398438, 85.51502990722656, -16.207359313964844, 1.6088142395019531, 128.3236846923828, 31.644554138183594, 10.30712890625, 86.66561126708984, 24.305038452148438, -9.857391357421875, 5.3214569091796875, 10.523021697998047, -3.2559356689453125, -37.82940673828125, -17.853363037109375, 91.10038757324219, 16.822158813476562, 96.97162628173828, -3.3577423095703125, -2.3998546600341797, -9.160964965820312, 22.014930725097656, 3.2025794982910156, -17.61302947998047, -20.32392120361328, -43.14407730102539, -12.044870376586914, 32.824737548828125, -16.1588134765625, 7.391593933105469, 24.54358673095703, -38.052101135253906, -3.7049179077148438, 17.924781799316406, -1.293060302734375, 69.32872772216797, 67.02903747558594, 27.382827758789062, -7.608245849609375, -45.01373291015625, 39.615455627441406, 30.41877555847168, 64.54518127441406, 41.36188507080078, -22.237899780273438, 18.677932739257812, 4.977043151855469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000418.npy"}
{"epoch": 0.6318972033257747, "step": 419, "batch_size": 64, "mean": 34.83051300048828, "std": 43.132728576660156, "min": -53.259620666503906, "p10": -12.778517150878901, "median": 22.236085891723633, "p90": 96.08396835327149, "max": 119.76362609863281, "pos_frac": 0.84375, "sample": [1.6699867248535156, 112.09367370605469, 24.65697479248047, 17.02539825439453, -20.97504425048828, 20.35749053955078, 103.4475326538086, 18.01001739501953, -6.601402282714844, 6.049581527709961, -53.259620666503906, 81.73883056640625, 75.23231506347656, 59.54432678222656, -21.809967041015625, 5.8609466552734375, 61.57982635498047, 10.590850830078125, 94.11515045166016, 13.867988586425781, 24.883148193359375, 49.40514373779297, -14.477813720703125, 32.8240966796875, 6.566246032714844, -2.0156478881835938, 92.71546936035156, -8.813491821289062, 91.62535095214844, 63.257225036621094, 1.0248222351074219, -33.65818786621094, -23.400035858154297, 94.59188079833984, 119.76362609863281, 38.468807220458984, 71.87176513671875, 13.715923309326172, 12.421119689941406, 42.731597900390625, 7.941705703735352, 100.42037963867188, 4.380069732666016, 0.320098876953125, 96.72343444824219, 0.6514129638671875, 50.197845458984375, 23.765365600585938, 20.706806182861328, 101.89466857910156, 88.2396240234375, 0.4351787567138672, 20.668643951416016, 7.806327819824219, 67.16783905029297, 2.9299488067626953, 65.83345794677734, 56.51074981689453, 107.00608825683594, 3.501922607421875, 38.67538070678711, 49.252593994140625, -52.069984436035156, 89.49740600585938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000419.npy"}
{"epoch": 0.6334089191232048, "step": 420, "batch_size": 64, "mean": 17.547388076782227, "std": 44.800933837890625, "min": -92.45637512207031, "p10": -35.4492202758789, "median": 10.859122276306152, "p90": 84.86853408813477, "max": 115.82193756103516, "pos_frac": 0.6875, "sample": [3.08343505859375, -25.680988311767578, 76.4305648803711, -0.33249664306640625, 38.11469268798828, -62.944881439208984, 6.4210968017578125, 7.883144378662109, 10.084831237792969, -92.45637512207031, -8.34735107421875, 30.230064392089844, 8.627845764160156, 73.72068786621094, -3.6424484252929688, 74.32492065429688, 104.13232421875, 63.47731399536133, 104.70535278320312, 83.84204864501953, 11.633413314819336, -9.33932876586914, 8.223543167114258, 34.45238494873047, 40.241615295410156, 89.26756286621094, -36.76331329345703, -41.35089111328125, 15.400888442993164, 8.358255386352539, -53.5126953125, 11.941017150878906, -5.5066680908203125, 14.592296600341797, 1.5429496765136719, -32.23322296142578, 6.838615417480469, -15.383743286132812, 70.55741882324219, 19.041366577148438, -11.002128601074219, -80.28179931640625, -31.70996856689453, 0.2191162109375, 85.30845642089844, 21.84731101989746, 25.159648895263672, 85.9776840209961, 4.2774658203125, 15.65203857421875, 85.82254791259766, 28.43411636352539, 64.54922485351562, -7.254682540893555, 33.90681838989258, 51.55104064941406, 42.705535888671875, -37.386146545410156, 19.98482894897461, 13.011688232421875, 8.773181915283203, 115.82193756103516, -32.38300323486328, -9.62740707397461], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000420.npy"}
{"epoch": 0.6349206349206349, "step": 421, "batch_size": 64, "mean": 24.51070213317871, "std": 47.97883605957031, "min": -99.38909912109375, "p10": -33.81535530090331, "median": 20.506985664367676, "p90": 90.76916732788087, "max": 124.01443481445312, "pos_frac": 0.703125, "sample": [-69.48806762695312, 59.32135009765625, -1.8202705383300781, 72.07101440429688, 62.88713073730469, 37.57666015625, 30.789871215820312, -14.912635803222656, 1.9809646606445312, 124.01443481445312, 110.17140197753906, -45.403541564941406, 90.68863677978516, 27.979286193847656, -25.168834686279297, 11.303337097167969, 10.5638427734375, -20.072357177734375, 85.46553802490234, 48.19929504394531, 66.10147094726562, 56.95068359375, -0.4261608123779297, -0.7041664123535156, 28.81903648376465, -8.28152847290039, 44.06175994873047, -99.38909912109375, -69.45304107666016, -29.172832489013672, 22.478219985961914, 99.4948959350586, 74.63634490966797, 3.699209213256836, 18.535751342773438, 98.1235580444336, 71.2183837890625, 75.54407501220703, 3.7762069702148438, 7.415493011474609, 96.32725524902344, -7.899150848388672, 32.166297912597656, -48.45710754394531, -35.80500793457031, 55.55773162841797, 3.0753021240234375, 27.733367919921875, -49.171875, 8.221481323242188, 90.80368041992188, 93.99791717529297, 4.212715148925781, 11.401199340820312, 7.3270416259765625, -15.598800659179688, 27.28270721435547, 35.625, 36.26869583129883, -27.919431686401367, 76.8852310180664, 71.85490417480469, 17.684814453125, -2.464405059814453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000421.npy"}
{"epoch": 0.636432350718065, "step": 422, "batch_size": 64, "mean": 31.305706024169922, "std": 39.63995361328125, "min": -52.56110382080078, "p10": -15.740359878540037, "median": 31.70711326599121, "p90": 84.85302047729493, "max": 109.830078125, "pos_frac": 0.75, "sample": [80.71160125732422, 94.26092529296875, -44.08953094482422, -1.3441295623779297, 75.03054809570312, -9.782623291015625, 89.45649719238281, -21.973190307617188, 26.158653259277344, 106.4380111694336, 1.720672607421875, 21.824716567993164, -1.5283279418945312, 36.77522277832031, 61.78880310058594, 21.039859771728516, 79.91912841796875, 86.62791442871094, -33.63865661621094, 62.560035705566406, 43.316009521484375, -3.810302734375, -16.46267318725586, 16.142559051513672, 37.26585388183594, 91.09877014160156, 69.37753295898438, 66.85880279541016, 105.558837890625, 33.69898986816406, 35.411277770996094, 52.410003662109375, 32.742000579833984, -10.399927139282227, 10.809944152832031, 45.35725784301758, 65.50413513183594, 24.49869728088379, 79.43216705322266, 8.536933898925781, -3.5523605346679688, -32.046695709228516, 54.815765380859375, 57.99785232543945, 10.310958862304688, -28.66827392578125, -14.054962158203125, 59.12323760986328, 2.1424407958984375, 34.474056243896484, 8.09830093383789, 109.830078125, 8.086944580078125, -3.0354537963867188, 1.8633232116699219, 30.672225952148438, 6.153234481811523, -52.56110382080078, 15.546403884887695, 34.196983337402344, 73.80644226074219, 54.69878387451172, -1.6363983154296875, 58.0003662109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000422.npy"}
{"epoch": 0.6379440665154951, "step": 423, "batch_size": 64, "mean": 33.78448486328125, "std": 42.65328598022461, "min": -73.19898223876953, "p10": -10.015956115722654, "median": 19.291141510009766, "p90": 97.8634407043457, "max": 114.79203033447266, "pos_frac": 0.8125, "sample": [21.330787658691406, -5.9508514404296875, 72.49555969238281, -14.120218276977539, 114.78167724609375, 5.055341720581055, 26.284351348876953, -32.06824493408203, 102.43096923828125, 20.713157653808594, -8.311874389648438, 85.65225219726562, 3.424591064453125, -12.270210266113281, 5.3938446044921875, 2.8650283813476562, 1.2235107421875, -1.8662872314453125, 103.66249084472656, -31.025840759277344, 29.810047149658203, 51.820350646972656, -15.909942626953125, -0.6883258819580078, 49.20792007446289, 85.98313903808594, 85.18653106689453, 1.7063026428222656, -73.19898223876953, 31.32574462890625, 82.08255004882812, 83.09132385253906, 45.099159240722656, 31.5458984375, 62.172332763671875, 17.023117065429688, 12.7650146484375, 67.41361999511719, 114.79203033447266, 103.04878234863281, 11.06622314453125, 11.810466766357422, 3.8387985229492188, 49.81278991699219, 70.76920318603516, 96.3725814819336, 94.79291534423828, 16.275033950805664, 4.254077911376953, 33.10786437988281, 110.80165100097656, 55.31633758544922, 2.093690872192383, 15.560083389282227, 55.69139862060547, 98.50238037109375, 9.160743713378906, 2.2297210693359375, -10.74627685546875, 17.869125366210938, 71.9019775390625, 1.4981842041015625, 17.468425750732422, -1.2169933319091797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000423.npy"}
{"epoch": 0.6394557823129252, "step": 424, "batch_size": 64, "mean": 34.29045867919922, "std": 43.29853439331055, "min": -50.398048400878906, "p10": -15.050854682922362, "median": 25.240528106689453, "p90": 100.64809112548828, "max": 112.78732299804688, "pos_frac": 0.78125, "sample": [-14.008514404296875, 104.7407455444336, 13.419754028320312, 74.02671813964844, -25.920730590820312, 69.2180404663086, 112.78732299804688, 67.41680145263672, -47.49968719482422, -28.718475341796875, -9.412395477294922, 20.24427604675293, 20.811723709106445, -50.398048400878906, 58.984535217285156, 8.389739990234375, 79.69390869140625, -10.889602661132812, 32.95124816894531, 4.293403625488281, -15.49757194519043, 52.639068603515625, 81.86841583251953, 25.30536651611328, -10.90463638305664, 80.90019989013672, 100.67843627929688, 17.208465576171875, 111.44046020507812, 20.30437660217285, 48.450836181640625, 43.21295928955078, 25.15540313720703, 26.039020538330078, 21.008941650390625, 25.175689697265625, 61.703704833984375, 100.57728576660156, 61.216758728027344, 56.748146057128906, 29.509252548217773, 15.261848449707031, 11.030590057373047, 110.2955093383789, 14.317649841308594, 109.07881164550781, 58.41600036621094, 0.5375709533691406, -32.850425720214844, 5.219144821166992, 99.6737060546875, 39.197601318359375, 65.5794677734375, 100.77632904052734, -2.5498275756835938, -2.1735153198242188, 36.34443664550781, -45.53680419921875, 61.5343017578125, 81.27790832519531, 0.4641876220703125, 13.469871520996094, -4.6985321044921875, 17.052215576171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000424.npy"}
{"epoch": 0.6409674981103552, "step": 425, "batch_size": 64, "mean": 29.15235137939453, "std": 41.834285736083984, "min": -51.29176330566406, "p10": -26.53899936676025, "median": 23.81252670288086, "p90": 88.59442520141603, "max": 108.15209197998047, "pos_frac": 0.78125, "sample": [79.64485931396484, 60.36150360107422, 25.75731658935547, 9.830551147460938, 84.10962677001953, 1.0089683532714844, 108.15209197998047, -4.149173736572266, 35.271629333496094, 32.05801010131836, 32.81488037109375, -27.42344856262207, 35.08512496948242, -2.8492584228515625, -39.98435592651367, 67.23876953125, 8.252113342285156, 97.05036926269531, 3.29315185546875, 39.43446731567383, -30.97846221923828, 29.879257202148438, 1.7746849060058594, 34.8271484375, 51.21821594238281, -1.5182418823242188, 4.449592590332031, 106.12261962890625, 1.3474349975585938, 104.7431640625, 98.226318359375, -20.198619842529297, 51.6591796875, 9.102367401123047, 16.4769287109375, 21.00933074951172, 19.904823303222656, 107.35662078857422, 27.276927947998047, 27.026912689208984, 86.2822494506836, -51.29176330566406, 30.874618530273438, 64.23661804199219, 30.316879272460938, -43.651039123535156, 21.205169677734375, 28.505123138427734, 21.86773681640625, -4.4019317626953125, 84.92205810546875, 6.367763519287109, -30.17091941833496, 89.58535766601562, 9.406028747558594, 0.039394378662109375, 85.69866943359375, 17.747634887695312, -11.842567443847656, 85.32019805908203, -24.475284576416016, 10.075847625732422, 82.93165588378906, -28.462310791015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000425.npy"}
{"epoch": 0.6424792139077853, "step": 426, "batch_size": 64, "mean": 17.234432220458984, "std": 46.249542236328125, "min": -100.52891540527344, "p10": -33.827545928955075, "median": 10.595134735107422, "p90": 75.08426742553712, "max": 110.97921752929688, "pos_frac": 0.65625, "sample": [13.568462371826172, -1.7928314208984375, 64.47628784179688, 91.35331726074219, 39.95845031738281, 72.67111206054688, -7.069183349609375, 102.92379760742188, 10.100044250488281, 19.487014770507812, -6.143135070800781, -11.725120544433594, 27.861465454101562, -59.13159942626953, 11.090225219726562, 21.97986602783203, 42.554222106933594, 7.9085693359375, 28.455488204956055, -11.921165466308594, -90.80810546875, -2.168975830078125, 9.760665893554688, 16.888275146484375, -19.22832489013672, 0.961761474609375, 42.10484313964844, -100.52891540527344, 54.502262115478516, 14.462181091308594, 61.250244140625, 68.9574203491211, 65.84359741210938, -17.866453170776367, -92.08445739746094, 28.660137176513672, -7.3818206787109375, 4.744138717651367, -2.215423583984375, 96.24169921875, 96.17967224121094, 86.87486267089844, -54.88170623779297, -52.88835144042969, 0.3656005859375, 110.97921752929688, -0.1417236328125, 0.8143157958984375, -1.4284172058105469, -27.452377319335938, 37.047210693359375, 76.11847686767578, 2.1923675537109375, 69.41343688964844, 22.283180236816406, -36.55976104736328, 6.234149932861328, 55.437705993652344, 6.218048095703125, -17.189971923828125, 47.187164306640625, 30.67119598388672, 71.63890075683594, -14.809638977050781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000426.npy"}
{"epoch": 0.6439909297052154, "step": 427, "batch_size": 64, "mean": 26.24768829345703, "std": 46.931068420410156, "min": -76.24617767333984, "p10": -36.07154998779296, "median": 17.556819915771484, "p90": 94.69425125122072, "max": 132.46963500976562, "pos_frac": 0.734375, "sample": [39.7730598449707, 12.823812484741211, -2.995311737060547, -46.19557189941406, -9.478225708007812, 76.9095458984375, 102.73580169677734, 11.549640655517578, 7.942169189453125, -33.660621643066406, 86.60684967041016, -12.984294891357422, -37.10480499267578, -39.391693115234375, 31.330108642578125, 65.26254272460938, 36.697715759277344, 33.43212890625, 6.837196350097656, -4.884521484375, 0.16771507263183594, 36.29653549194336, 66.89215087890625, 59.111480712890625, 23.809600830078125, 87.6985855102539, 47.97704315185547, 65.72188568115234, 10.243022918701172, 23.78998565673828, 108.0679931640625, 13.811477661132812, 13.549530029296875, 11.028091430664062, 16.66783905029297, 96.10653686523438, 1.314788818359375, -76.24617767333984, 75.18750762939453, 91.39891815185547, 2.4630050659179688, -2.1039276123046875, 18.44580078125, -0.49553680419921875, -57.742427825927734, -47.871437072753906, 83.88005065917969, 46.532257080078125, 3.965728759765625, -75.96841430664062, 103.90616607666016, 96.47610473632812, -3.6245269775390625, 39.87047576904297, 42.45862579345703, 2.1013031005859375, 48.09501647949219, 34.619564056396484, 5.4693145751953125, -6.784759521484375, 33.58831024169922, -23.752273559570312, 132.46963500976562, 106.05404663085938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000427.npy"}
{"epoch": 0.6455026455026455, "step": 428, "batch_size": 64, "mean": 32.557342529296875, "std": 46.869415283203125, "min": -71.77183532714844, "p10": -20.559135627746578, "median": 20.961528778076172, "p90": 99.45496673583985, "max": 136.479736328125, "pos_frac": 0.765625, "sample": [-55.36058044433594, 54.880958557128906, 100.11968994140625, -3.284902572631836, 1.5850639343261719, 11.514549255371094, 3.3401527404785156, -48.59101867675781, -38.86170196533203, 12.653892517089844, 111.45649719238281, -18.164337158203125, 52.60862731933594, -28.481849670410156, 83.0519790649414, 17.086883544921875, 97.84017944335938, 11.31173324584961, -4.0447845458984375, 13.344337463378906, 77.23249816894531, 7.325740814208984, 1.7398757934570312, 10.616127014160156, 97.90394592285156, 36.82223129272461, 31.41357421875, -71.77183532714844, 25.472213745117188, -7.266693115234375, 38.40386962890625, -5.992683410644531, 8.306228637695312, 38.8111686706543, 39.74297332763672, 112.93146514892578, 96.25334930419922, 72.66676330566406, 42.41150665283203, -21.585477828979492, 76.82281494140625, 136.479736328125, 50.675262451171875, 94.8908462524414, 2.0531158447265625, -0.2908935546875, 62.54216003417969, 24.83617401123047, 3.6098079681396484, -22.48797607421875, 5.966583251953125, 13.197135925292969, -0.06382369995117188, 0.48104095458984375, 102.39234924316406, 106.7613525390625, 108.2109375, 57.28157043457031, 67.4555892944336, 80.50767517089844, 42.111083984375, 73.65800476074219, 1.7420883178710938, -10.604869842529297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000428.npy"}
{"epoch": 0.6470143613000756, "step": 429, "batch_size": 64, "mean": 25.24394989013672, "std": 46.16117858886719, "min": -78.32830047607422, "p10": -36.37867622375488, "median": 18.2674617767334, "p90": 86.46268234252932, "max": 107.87014770507812, "pos_frac": 0.78125, "sample": [31.593948364257812, -59.923797607421875, 1.1817550659179688, 12.446876525878906, 80.6478042602539, 52.09894561767578, 28.431072235107422, 14.723884582519531, -8.563045501708984, 4.7921295166015625, 3.59674072265625, -48.27464294433594, 24.31771469116211, 21.980987548828125, 4.685981750488281, 68.55076599121094, -78.32830047607422, 6.29193115234375, -7.0230712890625, 61.44953918457031, 72.49351501464844, -16.399089813232422, 59.49584197998047, 77.29287719726562, 104.34771728515625, 107.57454681396484, 100.29163360595703, 3.0046005249023438, 34.56352996826172, -36.6605110168457, -35.72106170654297, -31.62215805053711, 3.2855911254882812, -14.284019470214844, 80.52080535888672, 8.44024658203125, -44.8167724609375, 54.84043884277344, 2.4972000122070312, 1.0851287841796875, 90.39669799804688, 78.17981719970703, 25.80413055419922, 79.18196868896484, 88.95477294921875, -70.83953094482422, -46.066864013671875, 17.57613754272461, 9.138046264648438, 62.13544464111328, 9.444541931152344, 47.614402770996094, 10.497825622558594, -25.449779510498047, 18.958786010742188, 36.184478759765625, 107.87014770507812, 34.71717071533203, 103.96745300292969, 73.94178009033203, 3.57525634765625, 54.22028732299805, 13.936630249023438, 46.765907287597656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000429.npy"}
{"epoch": 0.6485260770975056, "step": 430, "batch_size": 64, "mean": 36.47758483886719, "std": 44.86935806274414, "min": -95.33340454101562, "p10": -10.942835235595698, "median": 31.397541046142578, "p90": 101.73092803955079, "max": 116.68439483642578, "pos_frac": 0.84375, "sample": [34.490875244140625, 13.398290634155273, 0.5573272705078125, 9.751136779785156, -5.867958068847656, 94.01251220703125, 0.1944866180419922, 111.65846252441406, 18.587987899780273, 36.00385284423828, -35.90290069580078, 67.92810821533203, 103.9900131225586, 27.64463996887207, 102.96194458007812, 43.772666931152344, 47.521583557128906, 0.0261077880859375, 112.07997131347656, -17.287382125854492, 8.814498901367188, 116.68439483642578, 64.28778839111328, 27.414310455322266, -0.6919040679931641, 106.18256378173828, 13.925193786621094, 85.0659408569336, 37.402435302734375, 80.79129028320312, 48.30088806152344, -13.117782592773438, 42.25471496582031, 59.054595947265625, -24.75290298461914, 87.53211975097656, 100.77391815185547, 50.397071838378906, -43.58876419067383, 80.642333984375, 9.090911865234375, 28.30420684814453, 14.30111312866211, 19.513931274414062, 10.982101440429688, 65.33782958984375, 1.2844009399414062, 2.1803951263427734, -95.33340454101562, 1.9065170288085938, 1.6864013671875, 57.38351821899414, -0.40006256103515625, 92.57785034179688, 7.4688262939453125, 102.14107513427734, 87.40467834472656, 55.425331115722656, 71.38616943359375, 50.30628967285156, 61.477630615234375, -42.952457427978516, 20.669448852539062, 19.528121948242188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000430.npy"}
{"epoch": 0.6500377928949358, "step": 431, "batch_size": 64, "mean": 26.87633514404297, "std": 42.60953903198242, "min": -75.18095397949219, "p10": -14.842457580566405, "median": 18.275300979614258, "p90": 86.9301986694336, "max": 118.41065979003906, "pos_frac": 0.78125, "sample": [-75.18095397949219, 60.0755615234375, 32.05315017700195, 25.72058868408203, 3.020732879638672, 69.16516876220703, 28.39093017578125, 69.27765655517578, 42.20793151855469, -4.540063858032227, 0.31445884704589844, 112.11749267578125, 68.79327392578125, -8.741592407226562, 32.55097198486328, 13.529333114624023, 112.56684112548828, -5.989892959594727, 85.443603515625, 13.95068359375, 28.97095489501953, -65.85124969482422, 13.737876892089844, 71.37838745117188, -26.77337646484375, 103.0936279296875, 47.36181640625, -55.893516540527344, 42.98652648925781, -5.1297607421875, 13.63621711730957, 10.712501525878906, 54.91681671142578, 89.85531616210938, -47.13762664794922, 7.794151306152344, 97.09314727783203, 23.037246704101562, -13.249267578125, -28.230079650878906, -2.9917144775390625, 3.1216278076171875, 1.9656295776367188, 75.17373657226562, 16.71977996826172, 13.350898742675781, 68.02561950683594, 17.501739501953125, 17.631332397460938, 118.41065979003906, 3.6654052734375, 23.413585662841797, 4.23486328125, 18.919269561767578, 45.296875, 10.83908462524414, 23.221786499023438, -9.580814361572266, 9.97049331665039, 87.08020782470703, 86.5801773071289, 38.828033447265625, 27.19671630859375, -15.525253295898438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000431.npy"}
{"epoch": 0.6515495086923658, "step": 432, "batch_size": 64, "mean": 28.123023986816406, "std": 47.285789489746094, "min": -103.05404663085938, "p10": -23.363822174072265, "median": 21.912867546081543, "p90": 98.03246078491212, "max": 110.21038818359375, "pos_frac": 0.703125, "sample": [-1.0470504760742188, -26.45219612121582, -29.165451049804688, -103.05404663085938, 2.2584571838378906, 2.431163787841797, -1.3335113525390625, 19.698394775390625, -23.570884704589844, 102.91496276855469, 31.467632293701172, -59.80432891845703, 10.3189697265625, 28.92548370361328, 98.73295593261719, -70.57298278808594, 44.270469665527344, -3.8691558837890625, 100.28224182128906, 46.918487548828125, -6.128318786621094, 65.34725189208984, 74.39510345458984, 3.4188385009765625, -2.625049591064453, 96.3979721069336, 12.750038146972656, 96.23745727539062, 82.42536163330078, -4.068012237548828, -22.88067626953125, 24.827327728271484, 3.6678085327148438, -33.84770202636719, 26.943695068359375, 24.12734031677246, 100.38639831542969, 51.64006805419922, 25.034683227539062, 80.40641021728516, -14.908649444580078, 30.933696746826172, 73.59657287597656, 4.272422790527344, 82.34744262695312, -14.781173706054688, 6.350751876831055, 100.98188018798828, 3.9404258728027344, 110.21038818359375, 67.61383819580078, 59.31874465942383, 57.422035217285156, 9.037307739257812, 19.17010498046875, 105.95899200439453, -11.114509582519531, 74.78024291992188, 41.51885986328125, 1.747344970703125, -11.595033645629883, 87.40901184082031, -1.136810302734375, 48.994075775146484], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000432.npy"}
{"epoch": 0.6530612244897959, "step": 433, "batch_size": 64, "mean": 37.677452087402344, "std": 46.655696868896484, "min": -84.00720977783203, "p10": -15.91332778930664, "median": 34.806081771850586, "p90": 102.3922607421875, "max": 112.2794189453125, "pos_frac": 0.796875, "sample": [17.137447357177734, 70.4139175415039, -2.1278724670410156, 40.59818649291992, 14.088859558105469, 11.439254760742188, 0.09079360961914062, 72.97265625, -29.71792984008789, 12.7982177734375, 18.631439208984375, 36.31304931640625, 45.4609375, 11.777053833007812, 102.98216247558594, -69.3337631225586, 34.87153244018555, 84.95365905761719, 75.52035522460938, -27.13034439086914, 40.00941467285156, 34.740631103515625, 15.902641296386719, 81.32327270507812, -84.00720977783203, 22.57107162475586, 86.80162048339844, 6.946067810058594, 75.13182067871094, 81.8607177734375, 14.46066665649414, 75.50748443603516, -19.096834182739258, 2.4693832397460938, 51.55509567260742, -3.6163482666015625, -12.618209838867188, 48.08082580566406, 0.9368896484375, 111.0928955078125, 112.2794189453125, 105.56705474853516, 109.67960357666016, 18.124679565429688, 68.93263244628906, -41.940696716308594, 19.01936149597168, 61.25336456298828, -6.372840881347656, 95.81016540527344, -12.490409851074219, 41.71344757080078, 68.0654296875, -16.255210876464844, 22.56507110595703, 101.01582336425781, 73.27404022216797, 25.346237182617188, 98.44902801513672, 89.45150756835938, 109.65970611572266, 24.54296875, 106.99063873291016, -15.1156005859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000433.npy"}
{"epoch": 0.654572940287226, "step": 434, "batch_size": 64, "mean": 36.25547790527344, "std": 50.05290222167969, "min": -100.52101135253906, "p10": -24.921791839599603, "median": 43.36564064025879, "p90": 99.05086975097657, "max": 113.3265380859375, "pos_frac": 0.78125, "sample": [-87.4930419921875, 51.364959716796875, 70.57173156738281, -7.5896759033203125, 71.85499572753906, 97.77549743652344, -35.65961837768555, 83.1821517944336, -27.458908081054688, 21.238815307617188, -3.774534225463867, -30.350906372070312, -100.52101135253906, 38.94712829589844, 58.22241973876953, -19.001853942871094, 58.957584381103516, 53.1555290222168, 85.45062255859375, 113.3265380859375, 36.259803771972656, 4.49901008605957, 111.90888977050781, 2.5505752563476562, -49.748199462890625, 22.667251586914062, 102.76080322265625, 51.962005615234375, 3.577728271484375, 61.17138671875, 10.236797332763672, 75.50111389160156, 55.145751953125, 109.08979797363281, 113.08268737792969, 93.00761413574219, -10.73513412475586, 48.76921844482422, 94.51834106445312, 0.47327423095703125, 92.63623046875, -0.4055328369140625, 21.657032012939453, 61.45280838012695, 1.5247421264648438, 24.641769409179688, 9.500185012817383, 58.727989196777344, 38.57958984375, 70.57413482666016, 1.8820648193359375, 102.6358871459961, -71.99063110351562, -3.2043609619140625, 11.212638854980469, 59.35735321044922, 83.5283203125, 99.59745788574219, 47.78415298461914, -16.39679718017578, 21.698772430419922, 65.85709381103516, 83.41655731201172, 27.18604850769043], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000434.npy"}
{"epoch": 0.656084656084656, "step": 435, "batch_size": 64, "mean": 31.113502502441406, "std": 45.69175338745117, "min": -77.46009063720703, "p10": -21.130824661254877, "median": 26.02593231201172, "p90": 95.17499847412111, "max": 115.69003295898438, "pos_frac": 0.75, "sample": [64.56156158447266, 25.209850311279297, 2.887096405029297, 11.988250732421875, -41.372894287109375, 3.0969467163085938, 11.212631225585938, 1.1810874938964844, 82.87853240966797, -9.217689514160156, 66.80619812011719, 98.19957733154297, -12.715667724609375, 55.130889892578125, 91.79483032226562, 84.79872131347656, 4.269172668457031, -41.31120300292969, -9.151451110839844, 90.3696060180664, 48.650535583496094, 3.520824432373047, 74.60903930664062, 115.69003295898438, 5.482696533203125, 29.3817138671875, -1.5836029052734375, 20.787033081054688, 99.84703063964844, 83.28411102294922, -0.5501556396484375, 33.782806396484375, -56.08649444580078, 58.236446380615234, 38.75458908081055, 57.388484954833984, -4.020072937011719, -5.45941162109375, 57.671302795410156, 87.09889221191406, 85.42803955078125, 108.67385864257812, 103.53300476074219, 43.784751892089844, 1.34149169921875, 36.43225860595703, 35.34146499633789, -14.64442253112793, -14.116649627685547, 98.6754150390625, -30.631576538085938, 12.387649536132812, -23.91071128845215, 43.52198791503906, 26.84201431274414, 45.76308059692383, 96.62364196777344, -46.2491455078125, -77.46009063720703, 14.023696899414062, 0.6072540283203125, 12.922149658203125, 81.71488189697266, 23.558364868164062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000435.npy"}
{"epoch": 0.6575963718820862, "step": 436, "batch_size": 64, "mean": 27.47412872314453, "std": 46.44340133666992, "min": -82.15584564208984, "p10": -26.220392799377436, "median": 26.520708084106445, "p90": 98.21717071533205, "max": 104.25985717773438, "pos_frac": 0.6875, "sample": [49.15118408203125, -4.235904693603516, 101.647216796875, 0.05218315124511719, 44.65625762939453, 3.164752960205078, 27.885948181152344, 85.15005493164062, -31.894187927246094, 20.367774963378906, 30.547996520996094, 101.62136840820312, 5.929634094238281, 49.2684326171875, -0.3806495666503906, 99.8394546508789, 96.30967712402344, 76.17204284667969, -58.85432434082031, 57.351661682128906, 34.7213249206543, 34.77427673339844, -7.617591857910156, -9.401969909667969, 43.01640319824219, -2.1513595581054688, 101.54925537109375, -5.641683578491211, 78.40263366699219, 18.903030395507812, 33.35622787475586, -15.294921875, -82.15584564208984, 26.951934814453125, -9.25440788269043, 22.614551544189453, 67.729736328125, -28.0454158782959, -11.418266296386719, -58.35273742675781, 74.42091369628906, 65.18734741210938, 26.089481353759766, 5.363616943359375, -53.73016357421875, 40.47783279418945, -21.962005615234375, 104.25985717773438, 37.520050048828125, 93.3359375, 66.45276641845703, 0.973358154296875, 92.17011260986328, 55.45463562011719, -13.09903335571289, 13.650890350341797, 99.03466796875, -20.85863494873047, 6.526695251464844, 0.4537010192871094, -20.434234619140625, 101.25877380371094, -29.222347259521484, 48.584205627441406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000436.npy"}
{"epoch": 0.6591080876795162, "step": 437, "batch_size": 64, "mean": 27.10342788696289, "std": 50.160011291503906, "min": -109.25558471679688, "p10": -38.41168441772461, "median": 27.27667999267578, "p90": 97.39526290893555, "max": 124.1346435546875, "pos_frac": 0.734375, "sample": [109.55960083007812, 98.5096435546875, 93.9879379272461, 33.97174072265625, 27.147300720214844, -27.274959564208984, 73.903564453125, 48.76087951660156, 124.1346435546875, 24.399311065673828, 83.28195190429688, 1.7647705078125, -109.25558471679688, 45.267181396484375, -49.692108154296875, -40.51896667480469, 2.5863189697265625, 44.50356674194336, 67.15901184082031, -51.044097900390625, 112.64722442626953, 98.11563873291016, 92.63092041015625, -14.14187240600586, -24.464313507080078, 52.867401123046875, 76.10955810546875, -13.919689178466797, -54.019805908203125, -40.010719299316406, -4.911399841308594, -25.087522506713867, 27.40605926513672, -15.663688659667969, 1.3915634155273438, 106.85812377929688, 35.58857727050781, -16.67639923095703, 40.12696075439453, 21.850543975830078, 15.950363159179688, 2.008941650390625, -18.866703033447266, 78.95942687988281, 11.065353393554688, 95.71438598632812, 22.94121551513672, 31.89126968383789, 32.548553466796875, -62.606117248535156, 46.74467468261719, 60.68801498413086, 68.98108673095703, 77.40391540527344, 16.571884155273438, 15.719825744628906, 33.66754913330078, 38.172393798828125, -34.68060302734375, 6.092220306396484, 0.03720664978027344, 31.994781494140625, 3.18731689453125, 102.58358764648438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000437.npy"}
{"epoch": 0.6606198034769464, "step": 438, "batch_size": 64, "mean": 38.58998107910156, "std": 46.43825149536133, "min": -56.358238220214844, "p10": -8.968727493286131, "median": 24.060943603515625, "p90": 99.81189727783203, "max": 130.4671630859375, "pos_frac": 0.765625, "sample": [116.98374938964844, -6.17633056640625, 130.4671630859375, -33.21028518676758, 0.2970409393310547, -17.570587158203125, -0.7999382019042969, 71.17068481445312, 3.243074417114258, 83.44992065429688, -1.2421226501464844, 120.13982391357422, -5.058311462402344, 0.38661956787109375, 90.8117904663086, 0.4928474426269531, 87.2072525024414, -56.358238220214844, 73.88143157958984, 55.52320861816406, 88.36321258544922, 81.71585083007812, -9.429344177246094, 20.03814697265625, 32.667667388916016, 88.93635559082031, 24.2374267578125, 19.588855743408203, 104.33036804199219, 80.18592834472656, 46.173431396484375, 1.72705078125, 100.46794891357422, 1.4431743621826172, -2.3780250549316406, 91.32831573486328, 66.22077941894531, -9.877119064331055, 43.03225326538086, 48.166961669921875, 94.96348571777344, -27.7725830078125, 23.88446044921875, 116.39218139648438, 92.67475891113281, 20.1732177734375, 72.65021514892578, 15.696090698242188, 2.7868270874023438, 7.097190856933594, 4.876598358154297, -7.893955230712891, 17.31605339050293, 75.59088134765625, 70.11267852783203, -43.875450134277344, 98.9465103149414, -3.5869293212890625, -6.162193298339844, 100.18277740478516, 6.050312042236328, 22.11900520324707, 46.30927658081055, 40.64921569824219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000438.npy"}
{"epoch": 0.6621315192743764, "step": 439, "batch_size": 64, "mean": 32.17955017089844, "std": 42.92832946777344, "min": -95.52371215820312, "p10": -9.284405326843258, "median": 20.420616149902344, "p90": 94.08294296264648, "max": 120.0318603515625, "pos_frac": 0.796875, "sample": [53.834632873535156, -0.9008674621582031, 9.899824142456055, -29.875579833984375, 30.453493118286133, 85.893798828125, 45.20433807373047, 26.98779296875, -2.6524581909179688, 95.77056884765625, 120.0318603515625, -2.724111557006836, 94.8313980102539, 19.161651611328125, 7.32305908203125, 92.3365478515625, 83.56953430175781, 3.1613388061523438, 80.17092895507812, 2.85479736328125, 6.834300994873047, 60.50469207763672, 32.377052307128906, -5.637727737426758, 90.40414428710938, 7.811866760253906, 86.69572448730469, -38.37007522583008, 2.870553970336914, 112.20741271972656, -12.820974349975586, 100.59909057617188, -22.777137756347656, 5.607025146484375, 50.182044982910156, 15.443328857421875, 3.90521240234375, 6.170452117919922, -95.52371215820312, 2.135059356689453, -4.547271728515625, 21.495193481445312, 80.68798828125, 35.72515869140625, 37.7581672668457, 82.09043884277344, 49.343048095703125, 110.92013549804688, 19.346038818359375, 32.964168548583984, 16.534671783447266, 48.84489440917969, 11.418586730957031, 58.01825714111328, 49.67796325683594, 107.92826843261719, 15.99786376953125, 4.579498291015625, -10.847267150878906, -17.4222412109375, 8.486927032470703, 50.01422119140625, 26.941429138183594, -0.41593170166015625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000439.npy"}
{"epoch": 0.6636432350718064, "step": 440, "batch_size": 64, "mean": 35.21943283081055, "std": 49.26496505737305, "min": -74.67306518554688, "p10": -25.966851806640623, "median": 30.293145179748535, "p90": 107.18481063842773, "max": 120.81549072265625, "pos_frac": 0.765625, "sample": [-10.358474731445312, 17.254470825195312, 50.39616394042969, 120.81549072265625, 110.89916229248047, 89.33792114257812, 111.33660888671875, -3.7863731384277344, 91.23334503173828, 2.570281982421875, 29.30974578857422, 103.97752380371094, 32.24754333496094, 79.26841735839844, 47.31650161743164, 6.5307159423828125, 88.36558532714844, -33.85089874267578, -62.94676971435547, 21.467653274536133, 43.55004119873047, -7.853208541870117, 1.9641189575195312, 66.91389465332031, 29.541091918945312, 68.026123046875, 44.35304260253906, 82.46963500976562, 6.419042587280273, 7.770957946777344, 7.212488174438477, 47.46820831298828, 68.92054748535156, 1.2218132019042969, 1.1676750183105469, 33.313018798828125, 16.59114646911621, -74.67306518554688, 31.045198440551758, 107.8631362915039, 77.61045837402344, 101.58981323242188, -7.519317626953125, 105.60205078125, -43.919742584228516, 21.972618103027344, 108.75025939941406, 24.363319396972656, 64.42282104492188, -6.919059753417969, -16.15726089477539, -24.171117782592773, 40.72239685058594, -28.300369262695312, 40.685035705566406, 22.714656829833984, -14.07962417602539, -26.736452102661133, 90.42770385742188, -56.526275634765625, 15.014144897460938, 112.30961608886719, 112.59381866455078, 64.92467498779297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000440.npy"}
{"epoch": 0.6651549508692366, "step": 441, "batch_size": 64, "mean": 35.738895416259766, "std": 40.04811096191406, "min": -36.381103515625, "p10": -6.696646881103516, "median": 27.22576904296875, "p90": 97.37419586181642, "max": 112.65447235107422, "pos_frac": 0.8125, "sample": [60.582908630371094, 36.790077209472656, 100.347900390625, 93.4080810546875, -0.8533306121826172, 57.886451721191406, 73.87873077392578, 0.9623489379882812, 37.485992431640625, -1.3794746398925781, -22.35614013671875, 91.95883178710938, -2.1991043090820312, 101.2595443725586, 6.905933380126953, -6.769744873046875, 25.8179931640625, 13.070693969726562, -33.68818283081055, 105.2257080078125, 28.633544921875, 35.126190185546875, 41.623687744140625, 93.03768920898438, 14.215667724609375, 111.4375991821289, 8.911441802978516, 22.86595916748047, 109.19795227050781, -1.695068359375, 8.55572509765625, 24.413604736328125, -12.983957290649414, 76.37614440917969, 64.48994445800781, 0.987152099609375, -29.902145385742188, 89.19416809082031, 57.63652801513672, 35.00678253173828, 22.723846435546875, 99.07395935058594, 6.951881408691406, 21.368541717529297, 59.77461242675781, 12.248687744140625, 40.429134368896484, 9.259132385253906, 46.50019836425781, -6.526084899902344, 70.29055786132812, 13.45985221862793, 80.56244659423828, 112.65447235107422, -24.876737594604492, 29.013080596923828, 21.673843383789062, 7.171150207519531, 8.953056335449219, 48.119850158691406, -36.381103515625, 37.02882385253906, 78.24882507324219, 14.103464126586914], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000441.npy"}
{"epoch": 0.6666666666666666, "step": 442, "batch_size": 64, "mean": 34.23713684082031, "std": 46.671478271484375, "min": -77.45043182373047, "p10": -20.411750030517574, "median": 25.220094680786133, "p90": 95.02160034179687, "max": 120.36077880859375, "pos_frac": 0.78125, "sample": [24.82318878173828, 67.7425537109375, 27.784347534179688, -16.50604248046875, 24.4399356842041, 120.36077880859375, 96.7762451171875, 102.23042297363281, 52.3809928894043, 85.79650115966797, 43.46062088012695, 79.82035827636719, 88.26956176757812, -73.84466552734375, 61.55834197998047, 48.43190002441406, -0.256561279296875, 17.75696563720703, 5.7914276123046875, -16.26456069946289, -35.09765625, 94.95230102539062, 51.25560760498047, 0.097869873046875, 82.04719543457031, 15.996772766113281, -22.08562469482422, 62.6072998046875, 16.13777732849121, 89.85160827636719, 11.516403198242188, 26.7396240234375, 73.14147186279297, 91.12947845458984, 95.45693969726562, 38.63361358642578, -16.405445098876953, 4.728565216064453, 108.7970962524414, 60.41028594970703, 23.961912155151367, -5.056129455566406, 5.321577072143555, 25.617000579833984, 1.214181900024414, -77.45043182373047, -45.160308837890625, 112.65353393554688, -28.487060546875, -10.776016235351562, 92.71409606933594, 81.93343353271484, -27.430419921875, 22.901268005371094, 15.66046142578125, -12.609212875366211, 54.917877197265625, 13.538284301757812, 29.936447143554688, 87.70073699951172, 17.819229125976562, 95.05130004882812, 12.760566711425781, 13.98097038269043], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000442.npy"}
{"epoch": 0.6681783824640968, "step": 443, "batch_size": 64, "mean": 40.92947006225586, "std": 52.03473663330078, "min": -93.31681823730469, "p10": -12.940199089050289, "median": 30.264580726623535, "p90": 108.64215774536133, "max": 127.36004638671875, "pos_frac": 0.796875, "sample": [3.8147926330566406, 88.34452056884766, 27.207015991210938, 96.33627319335938, 22.408775329589844, 16.75750732421875, 42.147064208984375, -46.81685256958008, 17.78851318359375, 55.54663848876953, -15.541374206542969, -14.773675918579102, 20.127269744873047, 91.33395385742188, 28.951709747314453, 108.1789321899414, 3.717264175415039, 52.74620819091797, 43.5048828125, 78.69384002685547, 127.36004638671875, -92.3662338256836, -17.33521270751953, 92.0683822631836, -1.1139354705810547, 116.75048065185547, 86.02116394042969, 30.700349807739258, 113.0975570678711, 89.97750854492188, 85.25222778320312, 46.06867599487305, 71.15657806396484, 82.5473861694336, 29.828811645507812, 102.4697036743164, 90.45743560791016, 115.09074401855469, 79.91813659667969, 97.23118591308594, 7.093162536621094, 91.14013671875, 8.38905143737793, 45.682395935058594, 108.84068298339844, 94.44804382324219, -8.662086486816406, 9.770248413085938, 8.350914001464844, 2.6069278717041016, -2.032102584838867, -54.867958068847656, -3.41961669921875, 6.5307769775390625, 9.288448333740234, 19.583850860595703, -93.31681823730469, -6.0253753662109375, 110.8973388671875, 18.27997398376465, 0.0017910003662109375, -1.5700302124023438, 56.73741149902344, 126.08470153808594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000443.npy"}
{"epoch": 0.6696900982615268, "step": 444, "batch_size": 64, "mean": 13.219983100891113, "std": 45.569801330566406, "min": -97.98210144042969, "p10": -33.69317779541016, "median": 6.112514495849609, "p90": 76.57862701416016, "max": 103.97340393066406, "pos_frac": 0.625, "sample": [76.30933380126953, -0.20725631713867188, -23.57103729248047, 4.128335952758789, 13.248275756835938, 19.395618438720703, 33.518760681152344, -3.7054290771484375, -14.515995025634766, 77.58710479736328, 5.870353698730469, 48.556907653808594, -66.60546875, -84.9405517578125, -2.9137725830078125, 10.777740478515625, 53.35273361206055, 23.26215934753418, 22.36355972290039, -33.595733642578125, 65.84597778320312, 63.74071502685547, 2.301982879638672, -29.81426239013672, -7.96112060546875, 8.267250061035156, 3.906383514404297, -2.6992244720458984, -30.310638427734375, -31.5245361328125, 54.197357177734375, 6.420337677001953, 89.93041229248047, 21.413467407226562, 5.270355224609375, 76.69403839111328, 9.771148681640625, 75.45939636230469, -5.411598205566406, -97.98210144042969, 6.35467529296875, 97.76708984375, -76.83988952636719, -22.55854034423828, -7.373899459838867, 92.188720703125, 54.48516845703125, 31.737060546875, 103.97340393066406, 37.63811492919922, 25.452592849731445, 50.734619140625, 5.185047149658203, 4.414085388183594, 73.84789276123047, -10.772981643676758, -50.8007698059082, 16.2106876373291, 5.1455230712890625, -33.73493957519531, -71.35071563720703, 87.34333801269531, -3.2737045288085938, -5.524629592895508], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000444.npy"}
{"epoch": 0.671201814058957, "step": 445, "batch_size": 64, "mean": 37.59710693359375, "std": 48.77501678466797, "min": -72.18122863769531, "p10": -18.125324440002437, "median": 34.97845458984375, "p90": 104.63645324707032, "max": 117.42996215820312, "pos_frac": 0.78125, "sample": [42.535396575927734, 117.42996215820312, 5.795875549316406, 4.60107421875, -8.619140625, 17.0474853515625, 100.86692810058594, 65.90354919433594, 20.744049072265625, 32.08389663696289, -2.6134490966796875, -41.43044662475586, -19.200218200683594, 10.947784423828125, -50.26460266113281, 104.80327606201172, -54.309661865234375, 2.1873931884765625, 4.1156005859375, 30.304054260253906, 1.9098968505859375, 99.50165557861328, -72.18122863769531, 6.0253448486328125, -22.161277770996094, 94.32554626464844, 18.001365661621094, 100.1283187866211, 10.046638488769531, -5.643890380859375, -31.937694549560547, -9.201581954956055, 45.370426177978516, 114.51895141601562, 104.24720001220703, 37.73748779296875, 105.72166442871094, 108.56489562988281, 22.566322326660156, -8.225120544433594, 103.49834442138672, 35.85254669189453, 49.8637580871582, 108.34721374511719, 87.21636199951172, 94.30178833007812, 98.88153076171875, 4.194803237915039, 69.1878662109375, -15.617238998413086, 48.028690338134766, 3.0305709838867188, -6.562896728515625, 40.31993865966797, 34.10436248779297, 56.409027099609375, 95.75650787353516, 83.26197814941406, 39.88226318359375, 40.26276397705078, 66.54764556884766, 51.55127716064453, 107.9188232421875, 7.733360290527344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000445.npy"}
{"epoch": 0.672713529856387, "step": 446, "batch_size": 64, "mean": 42.1799201965332, "std": 43.6795539855957, "min": -31.384647369384766, "p10": -4.171293258666992, "median": 33.65931510925293, "p90": 106.40922775268555, "max": 129.56387329101562, "pos_frac": 0.828125, "sample": [1.8143634796142578, 1.3521499633789062, -31.384647369384766, 67.92941284179688, 21.578353881835938, 28.389816284179688, 20.879749298095703, -11.593849182128906, 129.56387329101562, 55.880615234375, 11.81027603149414, 80.33924865722656, 72.94789123535156, 88.27157592773438, 64.74759674072266, 107.01410675048828, 23.54973602294922, 55.66484832763672, 47.444580078125, 11.604972839355469, 3.6062049865722656, 67.89806365966797, -1.529571533203125, 94.17904663085938, 111.2735595703125, 38.24925231933594, -0.5234184265136719, 96.93708038330078, 104.74617004394531, -22.74895477294922, 75.9186782836914, 2.1204586029052734, -15.157417297363281, -3.7777328491210938, -4.339962005615234, 1.510671615600586, 4.433265686035156, -26.08877944946289, 40.165428161621094, 68.96221160888672, 23.871170043945312, 118.60020446777344, 106.67326354980469, 84.92985534667969, 116.41006469726562, 0.5337009429931641, 100.77520751953125, 92.96987915039062, -7.789514541625977, 35.39101791381836, 34.851646423339844, 19.386016845703125, 16.053863525390625, 73.58206939697266, 105.79314422607422, 114.10189819335938, 11.581470489501953, 19.680770874023438, 8.344047546386719, 44.177459716796875, -2.3216114044189453, 32.466983795166016, 3.223236083984375, 62.590065002441406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000446.npy"}
{"epoch": 0.674225245653817, "step": 447, "batch_size": 64, "mean": 44.52255630493164, "std": 52.02357482910156, "min": -104.64789581298828, "p10": -32.736821746826166, "median": 47.07750129699707, "p90": 109.29614944458008, "max": 116.56884765625, "pos_frac": 0.828125, "sample": [112.96524810791016, 101.18255615234375, 76.64341735839844, 15.296409606933594, 13.230224609375, -3.12713623046875, 113.30438232421875, -6.606536865234375, -67.93230438232422, 6.5029144287109375, 38.127967834472656, 76.78250885009766, 10.174217224121094, 63.162933349609375, 96.2434310913086, 116.56884765625, 29.14473533630371, 57.85258865356445, 31.089557647705078, 39.5384407043457, 19.320293426513672, -29.620643615722656, 28.78839111328125, -42.09546661376953, 6.9970550537109375, 108.46353149414062, 101.84555053710938, 92.83592224121094, 69.62069702148438, -34.07232666015625, 49.012939453125, 76.51868438720703, 81.00743103027344, 72.781005859375, 45.14206314086914, 88.60856628417969, 113.0423812866211, 52.61842346191406, 110.8132095336914, 1.7171707153320312, 39.415740966796875, 33.732337951660156, 99.70034790039062, -43.659385681152344, 9.906543731689453, -63.95233154296875, 21.940589904785156, 113.83877563476562, 72.93173217773438, 68.19010162353516, -0.24074554443359375, 94.23402404785156, -35.14985275268555, 100.50579833984375, 81.68040466308594, 35.284332275390625, 58.77455139160156, -104.64789581298828, 36.23350524902344, 109.43397521972656, 108.97455596923828, 1.500457763671875, 17.487640380859375, 59.83922576904297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000447.npy"}
{"epoch": 0.6757369614512472, "step": 448, "batch_size": 64, "mean": 34.15762710571289, "std": 47.890045166015625, "min": -105.66983032226562, "p10": -15.248419952392574, "median": 27.049049377441406, "p90": 99.79743423461915, "max": 120.14049530029297, "pos_frac": 0.796875, "sample": [92.03800201416016, -72.13350677490234, 98.92158508300781, 27.209739685058594, 43.031410217285156, 45.16265106201172, -2.423137664794922, 63.88523864746094, 28.41545867919922, -4.942848205566406, 54.809783935546875, 71.68253326416016, 13.141632080078125, -10.512252807617188, 78.15511322021484, 104.66816711425781, -8.568973541259766, 114.73311614990234, 25.088218688964844, 8.147891998291016, 1.87066650390625, 106.7371826171875, 26.88835906982422, 37.49378967285156, 60.948394775390625, 1.4426612854003906, 95.44486999511719, 38.660072326660156, 8.527626037597656, -21.999588012695312, -31.208852767944336, 7.4430999755859375, 50.053199768066406, 4.8034820556640625, 6.292304992675781, 55.35647964477539, -23.8350830078125, 20.565858840942383, 113.03239440917969, -6.94154167175293, 8.116552352905273, -105.66983032226562, 63.12504577636719, 6.4643707275390625, 20.336563110351562, 80.70999145507812, 46.62054443359375, 100.17279815673828, 34.713035583496094, 16.370275497436523, -17.27820587158203, 42.224822998046875, 24.88641357421875, 78.89524841308594, 0.7828559875488281, 95.24917602539062, -5.904975891113281, -60.45000457763672, 92.69068908691406, 103.64198303222656, 92.02973937988281, 11.170112609863281, 14.965354919433594, 120.14049530029297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000448.npy"}
{"epoch": 0.6772486772486772, "step": 449, "batch_size": 64, "mean": 17.956323623657227, "std": 40.67932891845703, "min": -60.334190368652344, "p10": -34.232668685913076, "median": 9.644489288330078, "p90": 67.27529983520509, "max": 118.0196533203125, "pos_frac": 0.65625, "sample": [0.12338066101074219, 8.673248291015625, -47.343505859375, 29.773056030273438, -3.1837539672851562, -4.321525573730469, -1.854095458984375, 10.615730285644531, 8.209075927734375, 104.2522201538086, 60.006568908691406, 35.770652770996094, -3.0162582397460938, -57.27622985839844, -2.8829269409179688, 27.828826904296875, 118.0196533203125, 6.916572570800781, 3.2331695556640625, 34.30472946166992, -12.163898468017578, 39.05854797363281, 112.17594146728516, 95.70645141601562, -59.00190353393555, 24.291004180908203, 61.64344024658203, -3.1262664794921875, 63.140235900878906, 16.97303009033203, 17.131973266601562, -13.662269592285156, 44.70819091796875, -37.36091232299805, 59.322959899902344, 7.573295593261719, 80.93681335449219, -2.6242637634277344, 19.997310638427734, 31.799118041992188, 3.322286605834961, -10.744598388671875, -17.73958969116211, 19.396728515625, -37.863548278808594, 54.539302825927734, -8.037269592285156, 88.18862915039062, 12.31863784790039, 34.29779815673828, 59.998992919921875, 0.75506591796875, 7.907402038574219, 23.822311401367188, -15.01416015625, 43.663665771484375, 47.03667449951172, -60.334190368652344, 69.04747009277344, -26.933433532714844, 45.70887756347656, -53.824981689453125, 2.6266822814941406, -7.301399230957031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000449.npy"}
{"epoch": 0.6787603930461074, "step": 450, "batch_size": 64, "mean": 25.42965316772461, "std": 47.18435287475586, "min": -69.27059173583984, "p10": -31.38733482360839, "median": 14.98003101348877, "p90": 100.45774841308595, "max": 120.7703857421875, "pos_frac": 0.71875, "sample": [0.7375888824462891, -16.040847778320312, -19.199546813964844, -22.169166564941406, 105.42971801757812, 118.78594207763672, 38.557098388671875, 11.799591064453125, 88.54692077636719, 115.48401641845703, 36.751075744628906, 10.54896354675293, 40.45752716064453, 19.869163513183594, -2.2937088012695312, 79.53743743896484, 3.1797103881835938, 33.58251953125, 88.37303924560547, 120.7703857421875, -5.417572021484375, 43.691532135009766, 1.7877960205078125, -13.073379516601562, 41.24273681640625, -33.718387603759766, 23.33879852294922, 79.0418701171875, 23.69048309326172, -17.4858455657959, 10.765846252441406, 73.50018310546875, 9.94677734375, 54.02992248535156, -69.27059173583984, 102.54400634765625, 67.9627914428711, 0.25457000732421875, 95.58981323242188, 28.988460540771484, 8.99978256225586, -0.7508621215820312, 28.26828956604004, -54.79169464111328, 26.683120727539062, 8.536108016967773, 44.57841491699219, 5.155242919921875, 29.272674560546875, -24.55526351928711, 14.646699905395508, -59.111419677734375, 13.13755989074707, 54.96892547607422, 112.78753662109375, 10.46978759765625, -47.44230651855469, 68.201171875, -37.419898986816406, 109.3638916015625, -25.948211669921875, 15.313362121582031, -34.60015869140625, -8.382179260253906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000450.npy"}
{"epoch": 0.6802721088435374, "step": 451, "batch_size": 64, "mean": 26.37521743774414, "std": 47.30452346801758, "min": -77.0894546508789, "p10": -24.515937805175774, "median": 20.121299743652344, "p90": 96.47067947387697, "max": 127.51852416992188, "pos_frac": 0.71875, "sample": [70.16854858398438, -7.135467529296875, 19.991928100585938, 22.361923217773438, 71.21232604980469, 20.25067138671875, -77.0894546508789, 42.3819580078125, 7.870233535766602, 97.9244384765625, 45.28657531738281, 25.163124084472656, 79.799072265625, -0.033966064453125, 86.88276672363281, 24.038497924804688, 38.70800018310547, 23.031898498535156, 10.491720199584961, 52.450401306152344, 0.9789104461669922, -72.28011322021484, -42.39021682739258, 127.51852416992188, 78.37533569335938, -15.405441284179688, 26.278289794921875, -63.06689453125, -18.062789916992188, -0.056049346923828125, 43.176170349121094, 99.99288940429688, -30.52438735961914, 9.03108024597168, 63.50117492675781, 50.990814208984375, 105.19708251953125, 63.23948669433594, -6.762935638427734, 8.005531311035156, 3.1295204162597656, 83.55000305175781, 1.1444225311279297, -3.5181427001953125, 51.94358825683594, 0.4056396484375, 8.879467010498047, 99.806640625, 114.13349914550781, 50.513824462890625, 1.60107421875, 62.313087463378906, 8.173782348632812, -27.20843505859375, -3.83935546875, -18.233444213867188, 93.07857513427734, 0.8777427673339844, -3.188539505004883, 30.83694839477539, 118.24930572509766, -9.887104034423828, -59.20063781738281, 2.9608154296875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000451.npy"}
{"epoch": 0.6817838246409675, "step": 452, "batch_size": 64, "mean": 29.629127502441406, "std": 50.39764404296875, "min": -104.42945861816406, "p10": -30.63205337524414, "median": 30.767955780029297, "p90": 91.01188201904297, "max": 117.47651672363281, "pos_frac": 0.765625, "sample": [20.872299194335938, -7.776466369628906, 12.33814811706543, 65.87977600097656, 22.41064453125, 42.578277587890625, 13.610809326171875, 90.73341369628906, 85.95140075683594, 97.5759506225586, 10.22872543334961, 111.23051452636719, -98.09051513671875, 47.97784423828125, 87.11827850341797, 59.99811553955078, 117.47651672363281, 39.006744384765625, 65.3304443359375, 31.110198974609375, 31.79936981201172, -24.874526977539062, 113.05831909179688, 58.58595275878906, 6.238750457763672, -9.363435745239258, 75.34593200683594, 15.348007202148438, 13.856544494628906, 5.303230285644531, 6.586101531982422, 116.31645202636719, -39.98096466064453, 91.1312255859375, -30.18309783935547, 55.891929626464844, 16.295921325683594, 33.47779846191406, -70.01811981201172, 16.646873474121094, 57.503868103027344, -14.773605346679688, 44.57758331298828, 40.294471740722656, 67.04794311523438, -64.01301574707031, 110.95346069335938, 69.0941390991211, 41.13102722167969, 0.22088241577148438, 34.52709197998047, -8.163494110107422, 12.91731071472168, 75.12471008300781, -0.3646202087402344, -4.42144775390625, -30.824462890625, 85.79420471191406, 85.38020324707031, -48.73612976074219, 0.507354736328125, 30.42571258544922, -104.42945861816406, 19.467018127441406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000452.npy"}
{"epoch": 0.6832955404383976, "step": 453, "batch_size": 64, "mean": 34.10637664794922, "std": 51.964778900146484, "min": -65.01292419433594, "p10": -29.560384750366204, "median": 22.468541145324707, "p90": 110.18956069946289, "max": 122.50564575195312, "pos_frac": 0.71875, "sample": [64.88460540771484, 28.298730850219727, -20.640533447265625, 3.0880355834960938, 99.73814392089844, 7.6186676025390625, 6.008735656738281, 91.30255126953125, 74.07276916503906, -12.675201416015625, -65.01292419433594, 2.222076416015625, 32.40440368652344, -7.7123870849609375, 91.4697036743164, 97.6474609375, 109.44432067871094, 49.8651123046875, 112.2169418334961, 25.673809051513672, 4.964498519897461, -14.487899780273438, 15.744369506835938, 22.81134033203125, -1.6253204345703125, -62.78508758544922, -42.157127380371094, 110.98467254638672, -31.95807647705078, 93.58171844482422, 122.50564575195312, 19.926403045654297, 70.58274841308594, -55.84284973144531, -2.130908966064453, 22.125741958618164, 102.70325469970703, 76.92562866210938, -32.579322814941406, 112.20974731445312, 19.67993927001953, 112.83541870117188, -48.01266098022461, 2.464366912841797, -6.411651611328125, 28.992599487304688, -19.860740661621094, 20.301876068115234, 85.62181854248047, 59.82330322265625, 4.324970245361328, -13.197467803955078, 35.6324462890625, 53.03434753417969, 48.42688751220703, -23.965770721435547, -0.4675331115722656, 75.63804626464844, 99.45303344726562, 7.648773193359375, 118.0943832397461, 82.30980682373047, 110.50894927978516, 8.518722534179688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000453.npy"}
{"epoch": 0.6848072562358276, "step": 454, "batch_size": 64, "mean": 34.84968566894531, "std": 49.449974060058594, "min": -117.23151397705078, "p10": -11.409585571289062, "median": 26.418017387390137, "p90": 102.74977035522461, "max": 124.11770629882812, "pos_frac": 0.8125, "sample": [99.36923217773438, -26.914398193359375, -11.891571044921875, 83.6296157836914, -117.23151397705078, -10.83871078491211, 79.52949523925781, 6.701416015625, 12.08255386352539, 11.791419982910156, 60.909542083740234, 105.93365478515625, 25.2261962890625, -89.40007019042969, 100.23116302490234, 42.04097366333008, 109.941162109375, 54.33464050292969, 84.33374786376953, 8.624832153320312, -11.300605773925781, -64.624755859375, -1.1494731903076172, 90.8698501586914, 103.75537872314453, 112.916259765625, -7.023262023925781, 24.140213012695312, 0.5509109497070312, 6.952049255371094, 11.740524291992188, 39.005157470703125, 55.2757568359375, 8.231285095214844, 34.22755432128906, 13.449043273925781, 124.11770629882812, -24.727787017822266, 116.08392333984375, 27.609838485717773, 44.09285354614258, 43.40758514404297, -4.099418640136719, 7.387133598327637, 7.6342926025390625, 17.461082458496094, 105.05763244628906, 63.613258361816406, 6.88701057434082, 51.33759307861328, 58.31129455566406, 63.06512451171875, 8.781414031982422, 100.40335083007812, 85.70567321777344, 14.042449951171875, 42.630584716796875, 48.05097961425781, 10.446014404296875, 80.17218780517578, 84.3785171508789, 14.417308807373047, 0.1493358612060547, -11.456291198730469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000454.npy"}
{"epoch": 0.6863189720332578, "step": 455, "batch_size": 64, "mean": 35.63090515136719, "std": 46.761226654052734, "min": -53.02845001220703, "p10": -12.169264984130859, "median": 26.41750144958496, "p90": 106.5253189086914, "max": 135.81961059570312, "pos_frac": 0.734375, "sample": [60.30622863769531, -8.999710083007812, 43.36943817138672, 107.18356323242188, 29.51988983154297, -2.904956817626953, 106.93528747558594, 110.26673889160156, -10.831626892089844, 3.9113998413085938, 40.965091705322266, -53.02845001220703, 51.390480041503906, 38.9544563293457, 54.48976135253906, 135.81961059570312, -4.516777038574219, -7.860260009765625, 11.810562133789062, -35.577415466308594, 105.5687255859375, 1.1238842010498047, 88.5525894165039, -0.8675823211669922, -4.926395416259766, 95.9625244140625, 26.702865600585938, 58.75665283203125, 97.88333129882812, 27.16199493408203, 10.73211669921875, 13.81920051574707, 104.7861557006836, 0.8223342895507812, 52.517921447753906, -0.13344573974609375, -48.42067337036133, 66.39159393310547, 84.06336212158203, 95.34478759765625, -29.089336395263672, -1.9206771850585938, 46.767822265625, -32.568328857421875, 102.14970397949219, 109.23283386230469, 35.02098846435547, 25.99341583251953, 11.581146240234375, 0.9010009765625, 117.61285400390625, 42.62474822998047, -1.2341041564941406, 85.32975769042969, 0.4609832763671875, 22.729827880859375, 26.132137298583984, -14.560707092285156, -12.742538452148438, 22.953758239746094, 16.569047927856445, 27.840274810791016, 15.512798309326172, 116.0352783203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000455.npy"}
{"epoch": 0.6878306878306878, "step": 456, "batch_size": 64, "mean": 30.673297882080078, "std": 46.76139450073242, "min": -75.58602905273438, "p10": -12.72520999908447, "median": 24.87005615234375, "p90": 98.93759689331056, "max": 131.3521728515625, "pos_frac": 0.71875, "sample": [59.9696044921875, -3.4791336059570312, -0.7817611694335938, 46.436988830566406, 131.3521728515625, 0.6214942932128906, -7.702251434326172, 31.950153350830078, -0.4609489440917969, 84.85658264160156, 43.21232604980469, 34.438167572021484, 95.0218505859375, 107.10863494873047, 21.89666748046875, 4.274688720703125, 44.99201965332031, 31.695396423339844, 13.156255722045898, 95.42835235595703, -50.13108825683594, 7.867645263671875, 43.02362823486328, -0.9467086791992188, 78.72642517089844, -4.860780715942383, -59.91498565673828, -75.58602905273438, 74.45586395263672, -10.48480224609375, 8.352367401123047, 28.60302734375, 13.631776809692383, 54.12908935546875, 8.550529479980469, 53.80664825439453, 25.035110473632812, -3.2501373291015625, 65.37013244628906, 9.62240219116211, -13.685384750366211, 26.195178985595703, 21.156923294067383, 37.82829284667969, -47.84919738769531, -58.15760040283203, 86.60667419433594, 7.0788421630859375, -8.781730651855469, -6.841327667236328, 72.92367553710938, 118.49862670898438, -0.6555023193359375, 111.10588073730469, 17.840805053710938, 93.10771179199219, 100.44155883789062, 16.291114807128906, 36.58226776123047, 114.97197723388672, 32.76898193359375, -23.754295349121094, 24.705001831054688, 104.72526550292969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000456.npy"}
{"epoch": 0.6893424036281179, "step": 457, "batch_size": 64, "mean": 31.06912612915039, "std": 40.555824279785156, "min": -57.87493896484375, "p10": -11.801387786865234, "median": 23.21545124053955, "p90": 90.22802124023438, "max": 115.35081481933594, "pos_frac": 0.75, "sample": [3.1016616821289062, -1.443450927734375, -1.5132904052734375, 51.627811431884766, 39.89952087402344, -36.26740646362305, 22.95523452758789, 11.950347900390625, -11.992149353027344, 1.6359100341796875, 23.47566795349121, 61.8167724609375, 82.76708984375, 44.34899139404297, -7.02540397644043, -43.486785888671875, 91.16189575195312, 9.401512145996094, -0.07421493530273438, 75.43763732910156, 105.11934661865234, 68.66566467285156, 115.35081481933594, 21.97021484375, -3.1222267150878906, 46.40202331542969, -8.859025955200195, 81.25729370117188, -23.35425567626953, 7.045223236083984, -30.023468017578125, -57.87493896484375, 3.5978851318359375, 63.187355041503906, 26.30276870727539, -13.269882202148438, -11.356277465820312, 36.731536865234375, 37.319915771484375, 20.682220458984375, 71.83647918701172, -0.5511245727539062, 96.40727996826172, 88.04898071289062, -1.0931682586669922, 11.269248962402344, 56.48085021972656, 4.226106643676758, 20.73723602294922, 101.23465728759766, 69.14130401611328, 29.57830238342285, 56.5498046875, 34.66191864013672, 75.04502868652344, 114.48530578613281, 97.39832305908203, 26.48553466796875, 17.187158584594727, 2.9099998474121094, 52.147003173828125, 6.6573638916015625, 38.73490905761719, 15.295866012573242], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000457.npy"}
{"epoch": 0.690854119425548, "step": 458, "batch_size": 64, "mean": 33.504337310791016, "std": 48.777469635009766, "min": -111.90975952148438, "p10": -15.871083831787107, "median": 22.932926177978516, "p90": 107.42239990234378, "max": 119.54728698730469, "pos_frac": 0.78125, "sample": [55.62358093261719, 77.49633026123047, 2.6339073181152344, 56.12091064453125, 28.619916915893555, 29.37375259399414, 56.12654113769531, -111.90975952148438, -0.13245582580566406, 9.796012878417969, -13.784034729003906, -16.765533447265625, -13.083099365234375, 83.73454284667969, 98.36497497558594, 17.100187301635742, 40.077171325683594, 95.70301818847656, 19.248443603515625, 23.034957885742188, 8.543428421020508, 6.80615234375, 22.830894470214844, -4.2866973876953125, 75.48619079589844, 2.805023193359375, 92.31581115722656, 60.532249450683594, 16.21971893310547, 85.56468963623047, 111.40621948242188, 31.170682907104492, 19.630409240722656, -38.45757293701172, 16.3016357421875, 0.04181861877441406, 1.9002227783203125, 17.510345458984375, 35.212646484375, 111.69932556152344, 58.450809478759766, 34.85343551635742, -80.1809310913086, -0.18585586547851562, 87.16644287109375, 3.3169288635253906, 58.89160919189453, -1.6717033386230469, 113.61029052734375, 119.54728698730469, -18.00415802001953, 111.30415344238281, 81.79736328125, 41.861724853515625, -32.03227996826172, 116.38764190673828, 27.19780731201172, 92.4422836303711, -36.22174072265625, -4.7829132080078125, 11.152080535888672, 12.986770629882812, 113.95355224609375, 21.824462890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000458.npy"}
{"epoch": 0.6923658352229781, "step": 459, "batch_size": 64, "mean": 39.97838592529297, "std": 48.90543746948242, "min": -77.18949890136719, "p10": -21.41968116760254, "median": 32.93705368041992, "p90": 102.9183891296387, "max": 120.41717529296875, "pos_frac": 0.765625, "sample": [36.292625427246094, 70.27143859863281, -22.926834106445312, 67.85263061523438, -1.0725784301757812, -55.2630615234375, 7.06822395324707, 75.554931640625, 77.00588989257812, 84.27767944335938, 95.64913940429688, -21.561527252197266, 52.67842483520508, -21.088706970214844, 117.07374572753906, 111.84754943847656, 20.803970336914062, 29.58148193359375, 14.714073181152344, 72.4330825805664, 75.0189437866211, 71.84062194824219, 29.518962860107422, 17.60527801513672, 9.550973892211914, 88.02178192138672, 11.205604553222656, 90.79315948486328, 44.07204818725586, 83.92550659179688, 113.8311767578125, -3.3628768920898438, -15.544309616088867, 13.233739852905273, -1.73626708984375, 13.905054092407227, 62.60712432861328, -77.18949890136719, 96.17295837402344, -10.89858627319336, 2.059743881225586, 39.19486999511719, 87.44744873046875, 4.71905517578125, 119.24650573730469, -22.247596740722656, 64.10051727294922, 1.3164443969726562, -43.34759521484375, 87.52204132080078, 2.472381591796875, 9.647674560546875, -9.790407180786133, 92.47486877441406, 53.49283218383789, -21.810470581054688, 82.82833862304688, 29.046966552734375, 119.52256774902344, 92.79961395263672, 120.41717529296875, -1.0569190979003906, 105.80928802490234, 18.98767852783203], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000459.npy"}
{"epoch": 0.6938775510204082, "step": 460, "batch_size": 64, "mean": 41.11860656738281, "std": 47.214515686035156, "min": -45.91023254394531, "p10": -14.03334426879882, "median": 30.285507202148438, "p90": 105.31318817138673, "max": 130.16380310058594, "pos_frac": 0.796875, "sample": [101.95040893554688, 59.89015579223633, 113.65472412109375, 6.5248870849609375, 32.40174865722656, 95.6097640991211, 98.37908935546875, 38.65327453613281, 130.16380310058594, -0.34796905517578125, -20.559112548828125, 4.0091094970703125, -1.1716690063476562, 11.183700561523438, 3.217010498046875, 8.769485473632812, 14.393186569213867, 87.9527816772461, -18.076988220214844, 113.83473205566406, 9.733434677124023, 114.47160339355469, 24.36646270751953, 88.44332885742188, 106.75437927246094, 68.69075775146484, 14.321086883544922, 36.426422119140625, 96.6451187133789, 49.577186584472656, 41.441802978515625, 95.73399353027344, 81.42750549316406, 49.48899841308594, 7.197858810424805, 48.136474609375, 9.018299102783203, 21.098342895507812, 0.514251708984375, 1.8191509246826172, 75.8988037109375, 92.76841735839844, -5.843971252441406, -0.4066791534423828, 79.13998413085938, 118.46912384033203, -17.543075561523438, -26.686691284179688, 82.36212158203125, -41.30677795410156, -1.3528423309326172, 13.911956787109375, 101.38140869140625, 11.311681747436523, 28.169265747070312, 87.50035095214844, 18.960548400878906, 90.74008178710938, -39.87313461303711, -45.91023254394531, 46.083099365234375, 8.916748046875, -0.7207565307617188, 109.88294982910156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000460.npy"}
{"epoch": 0.6953892668178382, "step": 461, "batch_size": 64, "mean": 41.18246078491211, "std": 48.115299224853516, "min": -46.1744270324707, "p10": -18.619982147216795, "median": 38.094478607177734, "p90": 101.9984832763672, "max": 120.65724182128906, "pos_frac": 0.765625, "sample": [101.47236633300781, 48.72554016113281, 38.088714599609375, 31.603038787841797, 102.188720703125, 98.19841003417969, 25.00177764892578, 89.14816284179688, 94.52912139892578, -19.003414154052734, -38.09306335449219, 37.958282470703125, 51.78544616699219, 4.425416946411133, 61.78030014038086, -9.616897583007812, 0.219970703125, 114.81021118164062, 2.1248779296875, 11.894689559936523, 62.83123779296875, 91.7413330078125, 78.6332015991211, 11.820098876953125, 15.076272964477539, 76.53564453125, -24.682212829589844, -3.7675132751464844, -13.689674377441406, 43.418418884277344, 88.7700424194336, -26.369483947753906, -25.89537239074707, 38.100242614746094, 63.09296798706055, 100.65716552734375, 99.16098022460938, 57.49627685546875, 16.571640014648438, 120.65724182128906, 85.57711791992188, 102.77079010009766, 85.22039794921875, 42.663612365722656, 3.667112350463867, -12.090568542480469, 25.066234588623047, -1.712076187133789, 88.90579223632812, 14.812538146972656, 101.55459594726562, 50.978179931640625, 5.1130828857421875, 6.08721923828125, 93.36920166015625, -30.651687622070312, 113.76182556152344, -46.1744270324707, -16.636369705200195, -17.72530746459961, 109.77108001708984, -7.1735382080078125, 119.45757293701172, 1.6649551391601562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000461.npy"}
{"epoch": 0.6969009826152683, "step": 462, "batch_size": 64, "mean": 45.61006164550781, "std": 49.484554290771484, "min": -91.90626525878906, "p10": -4.094149589538573, "median": 40.129629135131836, "p90": 106.74443817138672, "max": 129.92236328125, "pos_frac": 0.859375, "sample": [98.24917602539062, 118.1234130859375, 115.84407806396484, 103.35465240478516, 2.8047027587890625, 33.87139129638672, 65.16129302978516, 0.5373268127441406, 1.0317611694335938, 0.8442764282226562, -6.065299987792969, 109.73785400390625, 98.7607650756836, 9.882743835449219, 95.99232482910156, 110.82107543945312, 3.9375762939453125, 83.22999572753906, 25.952598571777344, 68.87183380126953, 53.13174819946289, 22.106430053710938, 27.006454467773438, 69.4552230834961, 33.383941650390625, 3.4577560424804688, 89.17378234863281, 103.66499328613281, 1.2292251586914062, 84.46812438964844, -0.56915283203125, -23.526840209960938, 22.31133270263672, 91.74417877197266, 108.83995819091797, -26.552825927734375, -12.428136825561523, 64.58002471923828, 58.8658447265625, 37.40935134887695, 89.15206146240234, 95.71189880371094, 59.609169006347656, 21.502944946289062, 12.841720581054688, 107.31254577636719, 0.7874069213867188, 12.884777069091797, 92.33280944824219, 86.84827423095703, 42.84990692138672, -79.54359436035156, 87.04412078857422, 105.41885375976562, 89.05680847167969, 5.4745025634765625, 19.955896377563477, 129.92236328125, 61.2122802734375, 22.82367515563965, -91.90626525878906, -4.4606781005859375, -3.2389163970947266, 6.756374359130859], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000462.npy"}
{"epoch": 0.6984126984126984, "step": 463, "batch_size": 64, "mean": 29.165672302246094, "std": 43.62831115722656, "min": -99.44532775878906, "p10": -14.622930908203124, "median": 22.73725128173828, "p90": 86.47143630981445, "max": 112.22235107421875, "pos_frac": 0.765625, "sample": [-3.9597816467285156, 57.447021484375, 75.62767791748047, -99.44532775878906, 55.66331481933594, 59.80897903442383, -8.08067512512207, 112.22235107421875, 74.05477142333984, 46.13690948486328, -24.692546844482422, -4.2536773681640625, 10.114028930664062, 31.55902862548828, 70.47715759277344, -8.09942626953125, 19.09435272216797, 42.199462890625, -0.213653564453125, -32.07353973388672, 30.608854293823242, 3.7716140747070312, 0.44293975830078125, 92.08939361572266, 49.2609977722168, 36.820220947265625, 9.983535766601562, 21.73992156982422, -13.067787170410156, 9.85165786743164, 19.983642578125, 23.734580993652344, 111.34002685546875, 86.66283416748047, 27.478214263916016, 74.63665008544922, 85.33834075927734, 13.444023132324219, 17.676429748535156, 106.15657043457031, 28.182430267333984, 11.390052795410156, 6.924163818359375, -24.869674682617188, -12.356124877929688, -13.867752075195312, 111.20272827148438, 60.14030456542969, 86.02484130859375, 65.00080871582031, 31.10274887084961, -80.09947204589844, 5.600044250488281, -26.703176498413086, 6.1487274169921875, 69.52752685546875, 45.725276947021484, 18.630706787109375, 35.122398376464844, 77.4395523071289, -14.946578979492188, 11.132705688476562, 1.43719482421875, 87.17449951171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000463.npy"}
{"epoch": 0.6999244142101285, "step": 464, "batch_size": 64, "mean": 38.8583984375, "std": 52.754947662353516, "min": -95.211669921875, "p10": -16.724061775207517, "median": 32.1596794128418, "p90": 108.05139999389648, "max": 119.95590209960938, "pos_frac": 0.765625, "sample": [-11.404205322265625, 8.792366027832031, 90.59391021728516, -31.27288818359375, 88.51713562011719, 0.8352203369140625, 2.4081497192382812, 104.84404754638672, 73.85981750488281, 66.1397705078125, 91.32907104492188, -9.157752990722656, 80.95307922363281, 55.399871826171875, 50.43860626220703, -9.405120849609375, 107.26258087158203, 73.88676452636719, 7.8594207763671875, 80.6888198852539, 16.96440887451172, 105.11824035644531, -17.465185165405273, 53.302825927734375, 8.452400207519531, 77.33349609375, 11.116096496582031, 117.42481994628906, 54.683170318603516, 73.54853820800781, 14.348373413085938, 4.1862030029296875, 111.94056701660156, 108.38946533203125, -95.211669921875, 22.019027709960938, 109.45160675048828, 30.903541564941406, -45.75054931640625, -45.268184661865234, 111.74857330322266, 17.50049591064453, 22.03860092163086, 104.65149688720703, -33.144981384277344, -4.795417785644531, 5.043663024902344, 23.253387451171875, -3.1666412353515625, 97.27406311035156, 57.81200408935547, 110.96247863769531, -88.08332061767578, -14.431888580322266, 33.41581726074219, -12.441543579101562, 19.904502868652344, 105.9720687866211, 49.315040588378906, 119.95590209960938, -14.994773864746094, 70.92882537841797, 67.14193725585938, 3.021484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000464.npy"}
{"epoch": 0.7014361300075586, "step": 465, "batch_size": 64, "mean": 33.785911560058594, "std": 52.12548828125, "min": -69.38064575195312, "p10": -34.980351257324216, "median": 25.336724281311035, "p90": 106.93455505371097, "max": 115.13923645019531, "pos_frac": 0.671875, "sample": [-31.61273193359375, 95.52999877929688, 84.13838958740234, -36.66546630859375, 74.49227905273438, 86.0219955444336, -6.1674346923828125, 115.13923645019531, 19.731735229492188, 40.32927703857422, 23.655044555664062, 109.16510009765625, 81.4565658569336, 112.25968933105469, -43.68256378173828, 93.6185302734375, -19.52320098876953, 111.35174560546875, 33.968475341796875, 6.121223449707031, -19.957435607910156, -46.15483093261719, 94.74696350097656, 112.50288391113281, 86.43479919433594, 111.71163940429688, -35.77415466308594, -39.25865936279297, -4.337928771972656, 18.33128547668457, 3.7296905517578125, -12.225898742675781, 2.25897216796875, 47.739723205566406, 58.06048583984375, 25.22979736328125, -33.128143310546875, -6.313591003417969, -2.2542247772216797, 78.99128723144531, -69.38064575195312, 16.520545959472656, 38.188846588134766, 17.88555145263672, -12.720436096191406, -65.65887451171875, -16.533832550048828, 25.44365119934082, 17.399307250976562, 101.72994995117188, -0.012226104736328125, 100.24295043945312, 13.973823547363281, 75.62483215332031, 32.23217010498047, -10.881368637084961, 26.170753479003906, -0.013128280639648438, 73.72139739990234, 53.991065979003906, 109.8338623046875, 49.66570281982422, 96.07408905029297, 99.13987731933594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000465.npy"}
{"epoch": 0.7029478458049887, "step": 466, "batch_size": 64, "mean": 22.65721321105957, "std": 42.07292556762695, "min": -51.36528015136719, "p10": -22.088626098632805, "median": 14.59817886352539, "p90": 91.07366638183593, "max": 120.75558471679688, "pos_frac": 0.6875, "sample": [-10.0855712890625, 90.04828643798828, -11.839324951171875, -48.626373291015625, 30.90898323059082, -1.3282527923583984, 0.6948013305664062, -5.33856201171875, 16.16490936279297, -50.74505615234375, 24.591049194335938, 38.771263122558594, 96.86283111572266, -34.851776123046875, 36.68458557128906, 20.011642456054688, 68.84232330322266, 118.09323120117188, 20.008956909179688, 111.73828125, 34.945247650146484, 19.62939453125, 91.51311492919922, -0.6833343505859375, -1.950408935546875, 3.0569000244140625, 47.177345275878906, -25.423015594482422, 120.75558471679688, 7.9017333984375, -0.9952392578125, 12.804000854492188, -3.636524200439453, 36.572540283203125, 86.75076293945312, 23.99851417541504, 59.43235778808594, 24.401382446289062, -14.30838394165039, 22.726280212402344, 0.32251739501953125, -13.727874755859375, -31.032739639282227, -40.638214111328125, 37.77947998046875, 9.935317993164062, -3.8244781494140625, 67.0025634765625, 116.36945343017578, 3.2836341857910156, -51.36528015136719, 14.410720825195312, -4.257835388183594, 29.720230102539062, 14.785636901855469, 1.2131977081298828, 13.551729202270508, -1.2213916778564453, 19.28358268737793, 75.35607147216797, 102.62832641601562, 16.73801040649414, 5.149330139160156, 13.325157165527344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000466.npy"}
{"epoch": 0.7044595616024187, "step": 467, "batch_size": 64, "mean": 29.86865997314453, "std": 42.57486343383789, "min": -80.21683502197266, "p10": -8.077302551269531, "median": 23.865142822265625, "p90": 90.47862014770509, "max": 142.12057495117188, "pos_frac": 0.78125, "sample": [8.537551879882812, 59.6042366027832, 15.75244140625, 28.466751098632812, 9.631103515625, 28.019065856933594, -2.631683349609375, -10.712505340576172, 6.345386505126953, 42.225379943847656, 31.672637939453125, -34.4097785949707, 107.93516540527344, -10.007741928100586, 26.20025634765625, 89.488525390625, 29.156661987304688, 9.633705139160156, -29.406997680664062, 7.9150543212890625, 75.4775390625, -71.91365051269531, 6.515892028808594, 90.90294647216797, 112.14524841308594, 3.3165283203125, 12.202964782714844, 50.98313522338867, -80.21683502197266, 67.64706420898438, -3.6496429443359375, 42.30439758300781, 16.49611473083496, 21.5443115234375, -4.0099334716796875, 9.436782836914062, 47.7568359375, 98.18699645996094, 66.31315612792969, 20.83258056640625, 2.0831298828125, -0.118133544921875, -7.9652252197265625, 37.18305206298828, 4.353366851806641, -8.125335693359375, 30.8465576171875, 0.7535476684570312, 76.96240234375, 65.02326965332031, 142.12057495117188, 9.395820617675781, 32.61638641357422, 7.939092636108398, 58.84009552001953, 38.85343933105469, -0.9257278442382812, 114.18061065673828, 26.18597412109375, 76.39588928222656, -0.66534423828125, 85.85685729980469, 97.54745483398438, 26.56885528564453], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000467.npy"}
{"epoch": 0.7059712773998488, "step": 468, "batch_size": 64, "mean": 44.47518539428711, "std": 50.341796875, "min": -68.17046356201172, "p10": -13.590483665466307, "median": 33.570953369140625, "p90": 113.95087966918946, "max": 130.03237915039062, "pos_frac": 0.796875, "sample": [6.706892013549805, 26.972597122192383, 41.23910903930664, 122.70561218261719, 30.21839141845703, -14.322038650512695, 2.6545276641845703, 97.54139709472656, 67.87396240234375, 28.759605407714844, 84.23649597167969, 114.47583770751953, 82.93409729003906, -68.17046356201172, 7.8235321044921875, 64.80657196044922, 36.037925720214844, -25.765228271484375, -52.27081298828125, 46.03837585449219, 111.53057861328125, 28.808746337890625, 75.01677703857422, 95.92018127441406, 19.476762771606445, 69.38365173339844, 117.41915130615234, 9.335687637329102, -11.883522033691406, -55.48070526123047, 113.08329772949219, -31.72198486328125, -3.502288818359375, -5.183666229248047, 31.103981018066406, -21.312108993530273, 70.47039031982422, 47.81055450439453, -2.0529632568359375, 97.97764587402344, 17.888961791992188, 19.697784423828125, 104.73875427246094, 122.10397338867188, 109.65628051757812, 6.524120330810547, 25.323034286499023, 12.994674682617188, 15.955299377441406, 25.525503158569336, 114.32270050048828, 44.007347106933594, 67.22987365722656, 18.15057373046875, 126.97171020507812, 38.0697021484375, -3.8430099487304688, 130.03237915039062, 98.61449432373047, 94.6239013671875, 101.70724487304688, 72.44095611572266, 29.838363647460938, -2.8593215942382812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000468.npy"}
{"epoch": 0.7074829931972789, "step": 469, "batch_size": 64, "mean": 39.333194732666016, "std": 45.15597915649414, "min": -62.56013488769531, "p10": -5.416951751708984, "median": 31.985523223876953, "p90": 106.85360641479492, "max": 124.77220153808594, "pos_frac": 0.8125, "sample": [11.087448120117188, 87.01820373535156, 69.10780334472656, 57.07152557373047, 3.5275192260742188, 10.715557098388672, 19.697402954101562, 68.83539581298828, 68.59642028808594, -62.56013488769531, 104.7061767578125, 43.91301345825195, 57.21540832519531, -3.5268630981445312, 7.892875671386719, 95.27365112304688, 58.9739990234375, 32.55768585205078, 54.968238830566406, 116.18872833251953, 107.77393341064453, 6.652923583984375, 62.41631317138672, 6.4843902587890625, 84.8292236328125, -54.90502166748047, 40.30675506591797, 5.4651031494140625, 0.9662456512451172, 20.78155517578125, -2.695526123046875, 0.7398090362548828, 51.871734619140625, 124.77220153808594, 102.16539764404297, -6.1449737548828125, 0.3314018249511719, -0.6710128784179688, 109.28294372558594, 101.44564819335938, 82.56185913085938, 45.675819396972656, 18.96963119506836, 61.92986297607422, 4.556495666503906, -5.6902618408203125, 29.225341796875, 1.1580162048339844, 118.54698181152344, 31.413360595703125, -12.018035888671875, -3.2955970764160156, 122.65505981445312, 2.8189620971679688, 108.7744140625, -4.779228210449219, 49.485618591308594, 22.119651794433594, 65.36038208007812, -22.948171615600586, 56.685752868652344, 3.5726146697998047, -9.23358154296875, 86.65032958984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000469.npy"}
{"epoch": 0.708994708994709, "step": 470, "batch_size": 64, "mean": 36.148399353027344, "std": 49.8276252746582, "min": -116.05300903320312, "p10": -18.644834709167476, "median": 37.40645790100098, "p90": 101.08703994750977, "max": 121.11837005615234, "pos_frac": 0.796875, "sample": [79.3126220703125, 44.877220153808594, 120.58775329589844, -116.05300903320312, 39.7791748046875, 79.97589111328125, 28.585617065429688, 37.778236389160156, 99.35298156738281, 69.95831298828125, 94.8548812866211, -66.37904357910156, 115.97411346435547, 14.98651123046875, 95.05010986328125, 14.371315002441406, 60.66621398925781, -15.834569931030273, 0.2357635498046875, 93.34024047851562, -32.07510757446289, 86.972412109375, 7.251222610473633, -45.24092102050781, 103.48526000976562, 42.30830383300781, 93.59845733642578, 75.33256530761719, 16.19275665283203, -15.948480606079102, 47.585662841796875, 0.7997608184814453, 73.80783081054688, 0.23467254638671875, -37.713531494140625, 1.8929214477539062, 27.220890045166016, 3.047475814819336, 16.136672973632812, -22.24553680419922, 52.27418518066406, 88.89498901367188, 101.83020782470703, 3.9434814453125, -13.949729919433594, -19.8004150390625, 13.828556060791016, 66.50601959228516, 60.49751281738281, 108.18750762939453, 39.33774948120117, 108.3100814819336, -8.137771606445312, 57.9048957824707, 7.7974853515625, 75.9530258178711, 37.0346794128418, 53.437225341796875, -4.8653717041015625, 9.782270431518555, 121.11837005615234, -1.1820621490478516, 18.42654037475586, 2.3045196533203125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000470.npy"}
{"epoch": 0.7105064247921391, "step": 471, "batch_size": 64, "mean": 45.47620391845703, "std": 52.13991165161133, "min": -94.24186706542969, "p10": -8.40641403198242, "median": 39.73263359069824, "p90": 115.72443466186525, "max": 127.46257019042969, "pos_frac": 0.8125, "sample": [2.9023208618164062, 122.87130737304688, 95.6544189453125, 69.50963592529297, 22.1748046875, -9.391937255859375, 97.15148162841797, 46.86601257324219, 103.39817810058594, 113.25077056884766, 58.43689727783203, 34.679290771484375, 120.7333755493164, 1.3308258056640625, -64.13542175292969, -3.3842830657958984, 39.626220703125, 121.1261978149414, 112.54203796386719, -31.03948974609375, 11.18130111694336, 58.94560623168945, 78.90396881103516, 45.66328430175781, 99.70893096923828, -94.24186706542969, 68.47747039794922, 53.1595458984375, 116.78457641601562, 2.341785430908203, 88.05233764648438, 39.7083625793457, 6.564235687255859, 8.926212310791016, 26.07630157470703, 39.75690460205078, 109.76201629638672, -18.875015258789062, 16.27143096923828, 103.9251937866211, 14.983558654785156, 1.9168701171875, 117.34706115722656, 98.13798522949219, -0.5497417449951172, 4.555511474609375, 60.10993957519531, 69.43354797363281, -42.493309020996094, 91.52342224121094, -8.994651794433594, 113.15129852294922, 17.898818969726562, 4.304939270019531, 72.87946319580078, -4.726509094238281, 13.746688842773438, 84.6422119140625, -7.0338592529296875, 123.1949462890625, 5.9457550048828125, 39.62638854980469, -1.9812545776367188, 127.46257019042969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000471.npy"}
{"epoch": 0.7120181405895691, "step": 472, "batch_size": 64, "mean": 35.099578857421875, "std": 56.92478561401367, "min": -107.66867065429688, "p10": -38.81995162963865, "median": 20.029057502746582, "p90": 114.18789291381836, "max": 127.0496826171875, "pos_frac": 0.75, "sample": [7.704093933105469, 19.342674255371094, 46.751487731933594, -19.841995239257812, 114.23484802246094, 92.8570556640625, -13.328834533691406, 109.45121002197266, 110.70082092285156, 65.29617309570312, 20.035480499267578, 127.0496826171875, 30.82770347595215, -11.93511962890625, -1.8509883880615234, 114.07833099365234, 20.022634506225586, 40.770904541015625, 122.71807861328125, 4.002960205078125, -46.95336151123047, 32.075355529785156, -15.059795379638672, 99.52041625976562, -2.66802978515625, 78.9697265625, 7.51751708984375, 109.86144256591797, 119.32676696777344, -14.650714874267578, 10.706619262695312, 54.37567138671875, 19.283418655395508, -53.84534454345703, -17.661285400390625, 16.846725463867188, 98.33737182617188, 120.51809692382812, -1.3298492431640625, 13.314537048339844, 89.33192443847656, 119.11305236816406, -107.66867065429688, 113.39136505126953, -58.61582946777344, -62.759681701660156, 90.12117004394531, 8.724126815795898, -72.1355209350586, 57.027137756347656, 4.671392440795898, 16.87419319152832, 14.494476318359375, 16.767166137695312, 16.254440307617188, 12.988525390625, 57.65538024902344, 26.452392578125, 117.18154907226562, 20.531997680664062, 50.14622116088867, 36.75907897949219, -52.6579704284668, 104.35261535644531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000472.npy"}
{"epoch": 0.7135298563869993, "step": 473, "batch_size": 64, "mean": 31.687740325927734, "std": 57.43254089355469, "min": -90.04497528076172, "p10": -34.80315475463867, "median": 30.424004554748535, "p90": 113.74971313476563, "max": 137.314453125, "pos_frac": 0.734375, "sample": [111.14312744140625, 120.19828796386719, 2.4053096771240234, -64.58393859863281, -69.97535705566406, 85.67677307128906, 20.254425048828125, 37.02885437011719, 39.261131286621094, 111.73335266113281, 113.98148345947266, 14.237890243530273, -2.204336166381836, 39.03984069824219, 14.32269287109375, -26.591598510742188, 126.61100769042969, 113.20891571044922, 15.685966491699219, 109.11589813232422, -86.22508239746094, 19.14159393310547, 29.701507568359375, -30.694976806640625, -35.866546630859375, 71.26893615722656, 65.01453399658203, 36.996307373046875, 12.505838394165039, -1.4353561401367188, -1.8195343017578125, 1.0613784790039062, -90.04497528076172, -17.35480499267578, -18.493314743041992, 50.632789611816406, 137.314453125, 133.19091796875, 11.952518463134766, -1.574026107788086, 120.05731201171875, 77.55967712402344, 38.1480712890625, 76.89031982421875, 36.14985275268555, 41.81997299194336, 34.242225646972656, 3.097930908203125, 25.050270080566406, 97.9475326538086, 43.30007553100586, 1.77593994140625, 52.4891357421875, 31.243629455566406, -7.076362609863281, 31.146501541137695, 116.0825424194336, -80.34439086914062, 2.27337646484375, -32.32190704345703, 7.700050354003906, 94.36314392089844, 84.00277709960938, -63.40419387817383], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000473.npy"}
{"epoch": 0.7150415721844293, "step": 474, "batch_size": 64, "mean": 30.96058464050293, "std": 54.740299224853516, "min": -95.77349853515625, "p10": -27.245189476013177, "median": 17.844528198242188, "p90": 105.50315246582032, "max": 125.56387329101562, "pos_frac": 0.703125, "sample": [114.3905258178711, 93.00752258300781, 74.877197265625, -30.11080551147461, -7.269306182861328, 0.010417938232421875, 65.77935791015625, 125.56387329101562, 67.50630187988281, 16.487030029296875, 28.453811645507812, 44.64996337890625, -95.77349853515625, 66.32637023925781, 20.841018676757812, 104.25540161132812, -69.6694564819336, 4.203636169433594, -19.115535736083984, 14.207931518554688, -29.60715675354004, -6.408012390136719, 96.58831787109375, 25.32417869567871, 92.23190307617188, 16.03556251525879, -1.107645034790039, -21.733932495117188, 0.47048187255859375, -9.837165832519531, 86.0613021850586, 23.230594635009766, -83.31707763671875, -1.6290283203125, -10.221820831298828, 99.7902603149414, 2.6497802734375, 112.12857055664062, 98.5677261352539, 0.9544143676757812, 13.628232955932617, 10.819686889648438, 119.34037017822266, 61.563262939453125, -18.350887298583984, 114.01670837402344, 34.013031005859375, 49.948516845703125, 102.15421295166016, -6.907941818237305, 110.60633850097656, 102.28565216064453, -6.8651580810546875, 38.90320587158203, 6.871368408203125, -18.588916778564453, 106.03790283203125, -70.56916809082031, 19.2020263671875, 31.39385986328125, 99.62158203125, -36.11991882324219, 5.1118927001953125, 4.568660736083984], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000474.npy"}
{"epoch": 0.7165532879818595, "step": 475, "batch_size": 64, "mean": 48.320594787597656, "std": 48.60615921020508, "min": -38.070831298828125, "p10": -4.121309661865234, "median": 49.70280456542969, "p90": 119.17060165405275, "max": 130.75686645507812, "pos_frac": 0.875, "sample": [3.373697280883789, 5.940448760986328, 20.114707946777344, 8.542236328125, 90.12838745117188, 53.93922424316406, 112.46884155273438, 120.06732177734375, 9.425786972045898, -8.066596984863281, 88.52159881591797, -22.86907958984375, 65.31678771972656, 121.13174438476562, 4.009439468383789, 21.45782470703125, 85.43769073486328, 96.40855407714844, 13.205814361572266, 67.03910064697266, 16.05675506591797, -13.592071533203125, 1.640645980834961, 14.579559326171875, 64.57515716552734, 5.298236846923828, 19.76842498779297, 50.705169677734375, 56.74706268310547, -4.279808044433594, 127.44719696044922, 117.07825469970703, 101.70559692382812, 127.20893096923828, 25.414216995239258, 2.228282928466797, 82.97549438476562, 1.8292007446289062, 71.85206604003906, 9.822883605957031, 121.03460693359375, 76.780029296875, 5.737247467041016, 19.595733642578125, 71.36382293701172, -27.076229095458984, 4.533672332763672, 58.812042236328125, 124.0948486328125, 5.547538757324219, 116.99069213867188, 130.75686645507812, 105.91879272460938, 86.29463958740234, 100.99891662597656, -26.37017059326172, -38.070831298828125, 92.715087890625, 48.700439453125, 104.44453430175781, 21.133914947509766, 6.494468688964844, -3.7514801025390625, 51.18391036987305], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000475.npy"}
{"epoch": 0.7180650037792895, "step": 476, "batch_size": 64, "mean": 32.46859359741211, "std": 52.2520751953125, "min": -81.90811157226562, "p10": -23.80629940032959, "median": 22.465906143188477, "p90": 106.3283142089844, "max": 135.634765625, "pos_frac": 0.703125, "sample": [119.09703826904297, 52.756832122802734, -20.769933700561523, 25.33758544921875, -33.280120849609375, 15.459579467773438, -17.0916748046875, 114.37874603271484, 135.634765625, 15.140838623046875, 108.7200927734375, -0.7812519073486328, 118.87447357177734, 77.48880004882812, 79.66697692871094, 70.04341888427734, 5.3734130859375, 8.031333923339844, 54.981300354003906, 30.945770263671875, -51.58526611328125, 2.8011627197265625, 66.61915588378906, 82.49354553222656, 14.416244506835938, -81.90811157226562, -3.2437667846679688, 13.535102844238281, -9.092330932617188, 19.594226837158203, 123.80210876464844, -16.630813598632812, 69.25057983398438, 2.7776451110839844, 36.19103240966797, -17.485763549804688, -64.59051513671875, -65.91925048828125, 34.07722473144531, 7.78021240234375, -0.8456573486328125, 53.17277526855469, 0.15130615234375, 11.20916748046875, 54.594200134277344, 40.6029052734375, 55.734130859375, 12.142936706542969, -5.539031982421875, 99.95368957519531, 53.26261901855469, -30.164398193359375, 100.74749755859375, -23.577545166015625, 64.34793090820312, 129.59259033203125, -2.054391860961914, 31.935379028320312, 100.55538177490234, 79.40679931640625, -22.0544490814209, 94.34961700439453, 81.4803466796875, -23.90433692932129], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000476.npy"}
{"epoch": 0.7195767195767195, "step": 477, "batch_size": 64, "mean": 29.727947235107422, "std": 60.62425994873047, "min": -105.25750732421875, "p10": -63.788605499267575, "median": 23.19241714477539, "p90": 109.52333068847656, "max": 126.35787963867188, "pos_frac": 0.75, "sample": [-65.63655090332031, -64.41059875488281, 102.9908218383789, 34.413108825683594, 9.747833251953125, 78.15249633789062, 6.926567077636719, 9.712053298950195, 3.958587646484375, -0.7907867431640625, 107.64250946044922, 24.075233459472656, -86.05514526367188, -1.5216140747070312, 27.978120803833008, 1.8933486938476562, -5.347259521484375, 126.35787963867188, 57.292572021484375, -29.571374893188477, 86.11128234863281, 97.65971374511719, 22.309600830078125, 66.3700180053711, 77.61078643798828, 0.27211761474609375, -62.33728790283203, 4.441551208496094, 109.0990219116211, 6.190135955810547, -81.06059265136719, 14.846427917480469, 20.600425720214844, 5.745021820068359, -105.25750732421875, -38.74441146850586, -22.44140625, 26.902976989746094, 47.14402770996094, 31.251113891601562, 110.13346862792969, -5.8574676513671875, 104.60179901123047, 119.73377990722656, 44.958030700683594, 117.80321502685547, -85.34827423095703, 41.525390625, 73.62804412841797, -9.57769775390625, 119.34814453125, 48.555519104003906, 7.653810501098633, 95.30337524414062, 114.90623474121094, 40.12773132324219, -102.33084106445312, 17.561614990234375, 85.32205200195312, 109.7051773071289, 15.863595962524414, 99.61654663085938, 5.6384124755859375, 89.19624328613281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000477.npy"}
{"epoch": 0.7210884353741497, "step": 478, "batch_size": 64, "mean": 29.590330123901367, "std": 56.91326904296875, "min": -96.50286102294922, "p10": -34.62805099487304, "median": 19.10400390625, "p90": 113.41901245117188, "max": 123.92558288574219, "pos_frac": 0.703125, "sample": [-1.3405895233154297, 122.14671325683594, 13.142501831054688, 91.62744140625, 58.31729507446289, -0.10840606689453125, 123.92558288574219, 58.162078857421875, -3.5892200469970703, 72.23147583007812, 3.2472686767578125, 2.8553314208984375, 79.65650939941406, 123.42913055419922, -23.55896759033203, 18.018844604492188, -63.69413757324219, 20.189163208007812, -4.8246002197265625, 28.867591857910156, -25.748197555541992, -23.502153396606445, 84.97114562988281, 63.83897399902344, 99.91150665283203, -37.035743713378906, 68.15839385986328, 62.223724365234375, 54.24951171875, -96.50286102294922, 43.58696746826172, 5.817523956298828, -84.95612335205078, -89.3062515258789, 2.903797149658203, 112.28657531738281, 110.45484161376953, 3.394195556640625, 77.93073272705078, 4.239921569824219, 117.84600067138672, -4.750505447387695, -51.157188415527344, 21.340425491333008, 120.31043243408203, 27.641868591308594, 0.3527374267578125, 113.90434265136719, 9.612007141113281, -6.932762145996094, 72.52796936035156, 67.43589782714844, -11.196075439453125, 48.81206512451172, 7.1735992431640625, 118.9677505493164, 102.77833557128906, -19.441749572753906, 2.9033432006835938, 42.66062927246094, -29.010101318359375, -66.7730484008789, 44.69312286376953, 8.464632034301758], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000478.npy"}
{"epoch": 0.7226001511715797, "step": 479, "batch_size": 64, "mean": 25.232295989990234, "std": 51.57713317871094, "min": -127.96118927001953, "p10": -31.68003196716308, "median": 22.353254318237305, "p90": 92.08504791259766, "max": 119.43295288085938, "pos_frac": 0.6875, "sample": [17.815624237060547, 21.034961700439453, 42.699432373046875, -32.8775749206543, -22.630422592163086, 34.556243896484375, 118.9967041015625, 48.81183624267578, 73.51496887207031, -19.966697692871094, 32.85417175292969, -23.813312530517578, 4.7159576416015625, -9.632696151733398, -1.2376785278320312, 18.11126708984375, 114.58961486816406, 94.67520904541016, -9.827615737915039, 1.5622062683105469, -37.16485595703125, 22.292736053466797, 116.85699462890625, 107.09959411621094, 28.811403274536133, -72.50563049316406, 22.413772583007812, 4.69464111328125, -40.218299865722656, -28.885765075683594, 0.0099029541015625, 74.26923370361328, 1.7507400512695312, 92.25689697265625, 9.951202392578125, 23.979419708251953, -75.62784576416016, -23.038925170898438, 10.248214721679688, 79.11109161376953, 72.94485473632812, -127.96118927001953, -3.3099746704101562, 48.788543701171875, 91.68406677246094, 119.43295288085938, 72.63821411132812, -62.339263916015625, 4.020538330078125, 30.67145538330078, 27.103973388671875, 79.8246078491211, -27.920320510864258, 71.12982177734375, -2.3513660430908203, 66.15458679199219, 76.54400634765625, 26.019113540649414, 45.55511474609375, 75.74646759033203, -1.7083244323730469, -3.6615753173828125, 44.33678436279297, 71.26702880859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000479.npy"}
{"epoch": 0.7241118669690099, "step": 480, "batch_size": 64, "mean": 38.47267150878906, "std": 56.72149658203125, "min": -124.31047058105469, "p10": -21.239020538330077, "median": 47.88535690307617, "p90": 102.1065742492676, "max": 142.66098022460938, "pos_frac": 0.75, "sample": [49.06822967529297, -9.938201904296875, 90.42340087890625, -80.50165557861328, 88.267578125, 3.5609798431396484, 87.8114242553711, -124.31047058105469, 13.7264404296875, 142.66098022460938, -7.3834228515625, 38.01651382446289, 94.77580261230469, -21.94641876220703, 50.260009765625, -5.0001068115234375, -8.281005859375, 84.66938781738281, -34.540771484375, 7.366611480712891, -64.53190612792969, 86.57884979248047, 46.87725067138672, -69.16851043701172, 92.60335540771484, 95.47550964355469, 132.2913818359375, 48.893463134765625, 105.55801391601562, 23.268646240234375, 65.72736358642578, 0.3030853271484375, 9.994895935058594, 59.365806579589844, 38.53684997558594, 39.94194793701172, 94.38919830322266, 79.79637145996094, -86.26020050048828, 62.201507568359375, 109.12873840332031, 72.9670639038086, 49.934364318847656, 90.9490966796875, -17.459609985351562, 76.6339111328125, 21.439254760742188, -14.039810180664062, 40.32808303833008, 61.90520095825195, 3.1238346099853516, 16.153518676757812, 103.4274673461914, 67.08576965332031, 69.47126007080078, 68.61628723144531, 121.43087005615234, 99.02449035644531, 26.00802230834961, -18.134658813476562, 14.949531555175781, 113.0205078125, -14.67181396484375, -19.588424682617188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000480.npy"}
{"epoch": 0.7256235827664399, "step": 481, "batch_size": 64, "mean": 37.62444305419922, "std": 55.112815856933594, "min": -91.25741577148438, "p10": -27.500361442565918, "median": 25.21712875366211, "p90": 110.92901916503907, "max": 134.88925170898438, "pos_frac": 0.71875, "sample": [48.993797302246094, 7.808860778808594, 115.1959457397461, 16.188697814941406, -23.075042724609375, 72.74305725097656, 91.54405975341797, 62.452938079833984, -37.12718200683594, -30.731380462646484, 105.49896240234375, 14.71923828125, 103.40412902832031, 68.83546447753906, -5.036705017089844, 108.01602172851562, 96.19689178466797, 23.755279541015625, 84.11399841308594, 101.78560638427734, -91.25741577148438, -1.9830360412597656, 93.2739486694336, 36.340980529785156, 1.6771202087402344, -19.519140243530273, 17.833236694335938, 98.51519012451172, 9.800968170166016, 55.821685791015625, -0.1510753631591797, 110.98320007324219, 19.947967529296875, -38.560638427734375, 33.647491455078125, 11.689010620117188, 48.09416961669922, 10.718101501464844, 4.145450592041016, 8.124740600585938, -7.02532958984375, 26.678977966308594, 110.80259704589844, 127.56915283203125, -8.925827026367188, 118.04908752441406, -23.087173461914062, -2.6133499145507812, 116.54718017578125, 103.59452819824219, 91.8717041015625, 17.055755615234375, 21.383468627929688, -40.732688903808594, -27.529329299926758, 131.62954711914062, -79.99398803710938, 37.141746520996094, 66.92698669433594, 134.88925170898438, 60.38288879394531, 30.228469848632812, -27.432769775390625, -3.8711280822753906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000481.npy"}
{"epoch": 0.72713529856387, "step": 482, "batch_size": 64, "mean": 32.40789794921875, "std": 60.664894104003906, "min": -112.40480041503906, "p10": -41.22733459472656, "median": 21.741947174072266, "p90": 118.2738510131836, "max": 146.76126098632812, "pos_frac": 0.71875, "sample": [-4.9208831787109375, -112.40480041503906, -14.691337585449219, 38.782569885253906, 41.4857177734375, 77.5517578125, 7.839080810546875, -44.64807891845703, 3.8273239135742188, 119.66395568847656, 39.02595138549805, 115.0302734375, 76.55565643310547, 22.38524627685547, 86.26298522949219, 71.77625274658203, 64.85980224609375, 19.691368103027344, -103.57810974121094, 69.6540298461914, 122.26343536376953, -13.012725830078125, 146.76126098632812, -14.799484252929688, -0.7293968200683594, 87.77467346191406, 100.01738739013672, 52.48825454711914, -2.010284423828125, 6.108425140380859, -27.634056091308594, 81.76065826416016, -59.615196228027344, 55.76359558105469, 126.3212661743164, 101.15304565429688, 75.90337371826172, 11.751750946044922, 7.369853973388672, -13.384185791015625, 130.67947387695312, 36.857940673828125, 20.415061950683594, 5.1772003173828125, 12.647392272949219, 103.53636169433594, 40.71070098876953, 13.024194717407227, 142.62767028808594, -80.95803833007812, 70.55925750732422, 0.01629638671875, 47.78061294555664, 21.098648071289062, 65.78369140625, 0.5528125762939453, -17.23400115966797, 124.14006042480469, -33.24559783935547, -52.60803985595703, 92.29896545410156, -1.8377609252929688, -89.00668334960938, 2.6887073516845703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000482.npy"}
{"epoch": 0.7286470143613001, "step": 483, "batch_size": 64, "mean": 41.23766326904297, "std": 50.286949157714844, "min": -117.80946350097656, "p10": -10.574667739868163, "median": 34.10135841369629, "p90": 106.75315170288087, "max": 129.8541259765625, "pos_frac": 0.796875, "sample": [5.040672302246094, 104.65058898925781, -20.560466766357422, 42.5790901184082, 87.89657592773438, 117.29991912841797, 93.70811462402344, 12.012046813964844, -11.164749145507812, -9.197811126708984, 31.75836753845215, 75.75392150878906, 56.288909912109375, 108.77436828613281, -14.273143768310547, 79.33566284179688, 34.43586349487305, -17.733848571777344, 4.3019866943359375, 96.24836730957031, 23.484130859375, 113.95979309082031, 104.86476135253906, 6.150970458984375, 112.94428253173828, 55.332801818847656, 93.99546813964844, 6.963554382324219, -6.3184967041015625, 48.775909423828125, 49.15962219238281, -77.00877380371094, 113.049560546875, 0.6298789978027344, 5.050018310546875, 99.79763793945312, 33.76685333251953, -117.80946350097656, 33.680503845214844, -17.89740753173828, 45.30323028564453, 28.523056030273438, 83.13572692871094, 47.194854736328125, -3.9190521240234375, 57.96110916137695, 129.8541259765625, 96.58627319335938, 37.937278747558594, 83.3660888671875, 20.43170928955078, 17.557266235351562, 26.47130584716797, 100.61101531982422, -1.861898422241211, 9.812681198120117, 17.418682098388672, 7.443536758422852, 107.56246185302734, 85.19683074951172, 1.115570068359375, -3.2023773193359375, -2.880290985107422, 87.86508178710938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000483.npy"}
{"epoch": 0.7301587301587301, "step": 484, "batch_size": 64, "mean": 33.390480041503906, "std": 43.17259979248047, "min": -68.09310150146484, "p10": -12.286620712280271, "median": 30.31922721862793, "p90": 99.15954589843754, "max": 123.2499771118164, "pos_frac": 0.734375, "sample": [23.63943099975586, 32.55003356933594, -9.237533569335938, 13.72979736328125, 122.43942260742188, 2.4166183471679688, 38.83192443847656, -35.95488739013672, 103.69076538085938, -14.017486572265625, 19.17969512939453, -68.09310150146484, 14.30496597290039, 49.306148529052734, 70.70187377929688, 41.02665328979492, -13.593372344970703, -3.035938262939453, 69.0948486328125, 48.61290740966797, -3.0068225860595703, 88.58670043945312, 83.67610168457031, 23.078758239746094, 56.001991271972656, 54.91815948486328, 116.41949462890625, 24.368576049804688, 49.04121017456055, -3.530599594116211, 9.583553314208984, 28.25611114501953, -4.7470550537109375, -23.58379364013672, 123.2499771118164, -0.6597557067871094, 23.47041893005371, 21.903518676757812, 33.896602630615234, 8.697463989257812, 66.66059875488281, -6.235282897949219, 116.8878173828125, -7.3131103515625, 87.88367462158203, 57.97954559326172, 109.72730255126953, 7.815700531005859, 66.669921875, 48.773284912109375, 109.61997985839844, 27.11083984375, 38.272430419921875, 2.6550750732421875, -25.09667205810547, -52.599876403808594, 32.38234329223633, -3.49749755859375, 56.33875274658203, 85.03609466552734, 36.58180236816406, 35.07099914550781, 37.69401931762695, -6.640380859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000484.npy"}
{"epoch": 0.7316704459561603, "step": 485, "batch_size": 64, "mean": 33.485816955566406, "std": 50.26402282714844, "min": -106.40438079833984, "p10": -12.16510772705078, "median": 21.852535247802734, "p90": 110.81755828857423, "max": 132.58309936523438, "pos_frac": 0.71875, "sample": [32.134376525878906, 45.73815155029297, 71.32589721679688, 45.83210754394531, 40.80400848388672, 8.070503234863281, 11.45893669128418, -12.834123611450195, 20.527725219726562, -23.262033462524414, -88.45988464355469, 22.406982421875, 120.33167266845703, 98.3238296508789, -0.7857818603515625, 75.24513244628906, 9.99496841430664, 112.30741882324219, 132.44480895996094, 12.804197311401367, 4.853750228881836, 15.83648681640625, -10.689773559570312, 100.7186279296875, -2.2033729553222656, -0.3067893981933594, -106.40438079833984, 70.88447570800781, 21.29808807373047, 66.19905090332031, -12.797393798828125, 4.372928619384766, 41.6314697265625, -0.8006305694580078, 86.45609283447266, -32.58177185058594, 132.58309936523438, 124.33641815185547, -23.774520874023438, 77.20803833007812, 25.806812286376953, -10.515741348266602, 42.37603759765625, -3.3173980712890625, 29.375213623046875, -1.3602886199951172, -2.717761993408203, 107.34121704101562, 77.62027740478516, 11.235855102539062, 7.685840606689453, 30.73573112487793, 78.68743896484375, 121.42537689208984, 33.15617370605469, 79.74668884277344, 58.822654724121094, 119.99490356445312, -4.152305603027344, -3.665445327758789, 0.0097198486328125, 7.681468963623047, 10.008163452148438, 35.88275909423828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000485.npy"}
{"epoch": 0.7331821617535903, "step": 486, "batch_size": 64, "mean": 28.059404373168945, "std": 53.31881332397461, "min": -107.22870635986328, "p10": -29.954524421691893, "median": 18.961151123046875, "p90": 114.95432662963869, "max": 156.03143310546875, "pos_frac": 0.6875, "sample": [34.132972717285156, 5.283294677734375, 84.2850341796875, 10.997858047485352, 0.000762939453125, -4.788928985595703, 28.911773681640625, 156.03143310546875, 12.073814392089844, -4.2416839599609375, -29.88749885559082, 66.40621948242188, 54.198158264160156, 20.91461753845215, 58.63713836669922, 8.384902954101562, -0.6379852294921875, 116.65589141845703, -1.0675392150878906, 76.0098876953125, 48.475257873535156, 23.768463134765625, -29.98324966430664, -7.70037841796875, 130.6185302734375, -57.81317138671875, 18.0975341796875, -37.59279251098633, 53.085914611816406, -64.61388397216797, 85.60226440429688, 38.56171417236328, 1.774139404296875, -86.93946075439453, 18.53798484802246, 47.34406280517578, 43.16862487792969, 69.44448852539062, -32.54170227050781, 4.454029083251953, -0.022167205810546875, 110.9840087890625, 19.11962890625, 133.383544921875, 119.20513916015625, -18.649627685546875, 9.195236206054688, 82.00572967529297, 41.90234375, 116.88894653320312, 32.0703125, -10.61273193359375, 18.80267333984375, 76.21142578125, -16.765342712402344, -2.6412506103515625, 9.370433807373047, 46.20268249511719, 129.39085388183594, -3.7255420684814453, 27.351577758789062, -107.22870635986328, 33.1656608581543, -7.851570129394531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000486.npy"}
{"epoch": 0.7346938775510204, "step": 487, "batch_size": 64, "mean": 32.34306716918945, "std": 50.281219482421875, "min": -85.63639068603516, "p10": -18.32132110595703, "median": 24.495439529418945, "p90": 106.39665908813481, "max": 141.52755737304688, "pos_frac": 0.765625, "sample": [96.82246398925781, -39.9483642578125, 19.084632873535156, 60.49968719482422, 49.92127990722656, 18.623647689819336, -15.087570190429688, 27.548614501953125, 86.37708282470703, 94.45452880859375, 36.10443115234375, -19.70721435546875, 15.788497924804688, -14.077766418457031, -22.73932647705078, 122.06370544433594, 93.94886779785156, 50.75082778930664, 120.7353515625, 56.526611328125, -2.9431686401367188, 13.04364013671875, 34.59166717529297, 111.44224548339844, 55.892616271972656, -7.495937347412109, 9.721595764160156, -78.78897094726562, 0.6856460571289062, 6.01605224609375, 117.09407043457031, 24.078269958496094, 32.47819519042969, -2.633737564086914, 110.49988555908203, 15.172080993652344, 31.391958236694336, -48.73652648925781, 32.82080078125, 53.90687561035156, 72.10589599609375, 24.875408172607422, -12.61456298828125, 79.39784240722656, -80.31326293945312, 23.559261322021484, 4.380516052246094, 0.5055694580078125, 17.005205154418945, -85.63639068603516, -6.720300674438477, 117.38526916503906, 24.11547088623047, 20.20250701904297, 13.125869750976562, 88.8558120727539, 17.26803970336914, 37.474884033203125, -9.47296142578125, 95.94770812988281, 68.89818572998047, 45.580833435058594, 141.52755737304688, 26.574840545654297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000487.npy"}
{"epoch": 0.7362055933484505, "step": 488, "batch_size": 64, "mean": 39.08435821533203, "std": 49.79668426513672, "min": -87.72065734863281, "p10": -10.004100799560545, "median": 26.05753517150879, "p90": 117.16454696655273, "max": 132.1253662109375, "pos_frac": 0.765625, "sample": [7.0184173583984375, -11.075897216796875, -1.4245929718017578, -5.077482223510742, 22.890357971191406, -22.46129035949707, -19.43295669555664, 99.02909088134766, 98.07389831542969, 47.51804733276367, 18.74403190612793, 13.544075012207031, 124.27722930908203, 6.407569885253906, 0.03847503662109375, 132.1253662109375, 24.41492462158203, 108.18582153320312, 3.3942337036132812, 29.85805892944336, 56.856224060058594, 18.60802459716797, -7.503242492675781, 2.0776519775390625, -3.3092498779296875, 16.641075134277344, -38.23240661621094, 121.85540008544922, 7.5355987548828125, 120.61251831054688, 24.9625244140625, -33.341331481933594, -1.0885677337646484, 116.66539001464844, 93.66265869140625, 41.943138122558594, 111.49177551269531, -6.886039733886719, 117.37847137451172, 89.28973388671875, 91.54869079589844, 60.59422302246094, 71.89199829101562, 0.036956787109375, 64.76042938232422, 53.55561065673828, -4.8738250732421875, 40.57098388671875, -28.186843872070312, 48.28999328613281, 32.43721008300781, 54.28437805175781, -87.72065734863281, 60.57725524902344, 18.556991577148438, 89.38785552978516, 66.16744995117188, 56.341365814208984, 123.55059814453125, 27.152545928955078, 6.731170654296875, 12.27972412109375, -4.282310485839844, 122.4802474975586], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000488.npy"}
{"epoch": 0.7377173091458806, "step": 489, "batch_size": 64, "mean": 22.175716400146484, "std": 52.84355545043945, "min": -94.8982162475586, "p10": -43.097480201721176, "median": 13.173758506774902, "p90": 103.76520080566408, "max": 121.76625061035156, "pos_frac": 0.65625, "sample": [-89.05892944335938, 79.73628997802734, 27.26678466796875, 65.74126434326172, 74.98283386230469, 107.00122833251953, -2.8730010986328125, 53.281410217285156, 39.98294448852539, 121.76625061035156, 73.6015853881836, 6.459342956542969, 30.635101318359375, -64.02995300292969, -2.2356300354003906, -21.647125244140625, -10.466598510742188, -17.820941925048828, 104.90625, 38.69248962402344, 116.3026123046875, 68.33514404296875, 40.189064025878906, 0.8328819274902344, 62.872100830078125, 95.69242858886719, -17.37166976928711, -18.688865661621094, 2.392120361328125, 16.24770736694336, -94.8982162475586, -50.64608383178711, 101.10275268554688, 20.48847198486328, -0.6112213134765625, -8.123664855957031, -9.924110412597656, 1.4448413848876953, 95.6616439819336, 25.365074157714844, 13.095243453979492, 16.365386962890625, -7.107597351074219, 117.75100708007812, 32.63622283935547, -54.04829025268555, 111.34905242919922, 4.024566650390625, 49.27366638183594, 13.252273559570312, -69.04488372802734, 2.2505416870117188, 2.4129981994628906, 1.9498043060302734, -3.3815689086914062, -63.296661376953125, 82.68821716308594, 12.226615905761719, -23.34069061279297, 14.034164428710938, -25.484071731567383, 117.00508117675781, -5.651773452758789, 17.701919555664062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000489.npy"}
{"epoch": 0.7392290249433107, "step": 490, "batch_size": 64, "mean": 29.84263038635254, "std": 61.73842239379883, "min": -115.27910614013672, "p10": -47.114870071411126, "median": 26.37110710144043, "p90": 117.34560089111329, "max": 129.3577423095703, "pos_frac": 0.671875, "sample": [-7.916007995605469, -19.123825073242188, -113.17607116699219, 30.09253692626953, -10.712244033813477, 56.118919372558594, 2.1739349365234375, 99.029541015625, 126.44474792480469, 119.0892333984375, -102.89849090576172, 120.84152221679688, -115.27910614013672, 5.494688034057617, 73.93325805664062, 41.852073669433594, 91.62664031982422, 35.38945770263672, 103.21548461914062, 17.81291961669922, 2.80615234375, 106.98257446289062, 96.89734649658203, 98.93297576904297, -37.217918395996094, -5.498748779296875, 120.26116943359375, 1.0852851867675781, 22.649677276611328, 12.969612121582031, -62.7896728515625, 57.030120849609375, 44.18827819824219, 63.43180847167969, -3.574148178100586, 7.031219482421875, -3.9153213500976562, 123.27767944335938, -53.681732177734375, -3.1626510620117188, 43.78301239013672, -33.16442108154297, 17.191299438476562, 92.13262176513672, -6.449174880981445, 70.17822265625, 4.359466552734375, -2.3397388458251953, -41.798927307128906, -0.07989501953125, 81.90350341796875, 65.06592559814453, -1.3900489807128906, 53.95375442504883, 129.3577423095703, 38.75026321411133, 118.253662109375, 30.166404724121094, 38.36803436279297, 115.22679138183594, -96.12910461425781, -49.393131256103516, 20.9925537109375, 79.27659606933594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000490.npy"}
{"epoch": 0.7407407407407407, "step": 491, "batch_size": 64, "mean": 33.843692779541016, "std": 49.47552490234375, "min": -88.97715759277344, "p10": -20.117404174804683, "median": 15.480518341064453, "p90": 105.99823760986328, "max": 116.34678649902344, "pos_frac": 0.78125, "sample": [73.03836059570312, -9.876449584960938, 83.08035278320312, 27.0445556640625, 93.5617904663086, 73.93492126464844, 0.7165050506591797, 43.89982604980469, 88.63994598388672, 9.171138763427734, 20.218887329101562, -75.47615814208984, 106.92366027832031, 8.369743347167969, 100.93293762207031, -31.53418731689453, 4.380207061767578, 86.35159301757812, 3.9353904724121094, 59.30043029785156, -35.178070068359375, 14.635589599609375, 111.24620819091797, 3.4767189025878906, 65.53907012939453, 81.86168670654297, 5.426307678222656, -8.804092407226562, 16.32544708251953, 8.82297134399414, 116.34678649902344, 76.21560668945312, -22.324539184570312, 24.157875061035156, 94.26277923583984, 42.081363677978516, 0.42569732666015625, 105.27293395996094, 12.3492431640625, -14.967422485351562, 107.00700378417969, -1.224822998046875, 10.334037780761719, 12.246763229370117, 6.104442596435547, 73.07261657714844, -8.2838134765625, -31.799903869628906, 74.09880828857422, 111.07614135742188, 10.055130004882812, 80.02635192871094, 106.30908203125, 107.83119201660156, 10.794595718383789, -2.8273162841796875, 14.503448486328125, 44.19707489013672, 47.501014709472656, -88.97715759277344, -45.7730712890625, 2.146944046020508, -7.844074249267578, 71.63626098632812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000491.npy"}
{"epoch": 0.7422524565381708, "step": 492, "batch_size": 64, "mean": 26.7197265625, "std": 54.27701187133789, "min": -105.67271423339844, "p10": -24.36770553588867, "median": 13.576786041259766, "p90": 100.9981170654297, "max": 130.8223419189453, "pos_frac": 0.640625, "sample": [130.8223419189453, 50.44384765625, -37.61936569213867, 5.328500747680664, 117.64238739013672, -4.367914199829102, 87.32330322265625, 31.33584976196289, 48.50566101074219, -5.706750869750977, 54.40022277832031, 100.58616638183594, 11.535186767578125, -1.9549636840820312, 14.288681030273438, 24.196075439453125, -105.67271423339844, 45.428436279296875, 2.557126998901367, 28.070144653320312, 81.21492004394531, -23.224464416503906, 3.1291160583496094, -21.872093200683594, 12.864891052246094, 66.51364135742188, 50.044921875, 125.48809814453125, -12.51043701171875, -22.876632690429688, 119.46577453613281, 31.169939041137695, -12.045707702636719, 43.555076599121094, -6.2942352294921875, 16.50537109375, -78.27949523925781, 122.5436782836914, -24.857666015625, -16.110740661621094, 96.06512451171875, 8.008384704589844, 58.532310485839844, -4.464580535888672, -19.87872314453125, 77.87297821044922, 48.987117767333984, 16.64662742614746, -27.944725036621094, 97.00782012939453, -101.96783447265625, -15.099273681640625, -2.5960235595703125, 91.31306457519531, 101.17466735839844, 2.912059783935547, 85.13777160644531, 1.85589599609375, 76.25163269042969, -7.2745208740234375, 2.973651885986328, -31.081085205078125, -13.918800354003906, 117.98284149169922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000492.npy"}
{"epoch": 0.7437641723356009, "step": 493, "batch_size": 64, "mean": 38.8165283203125, "std": 54.13271713256836, "min": -111.59317016601562, "p10": -17.902690124511714, "median": 22.868640899658203, "p90": 109.41434555053712, "max": 129.5365753173828, "pos_frac": 0.828125, "sample": [20.704078674316406, 16.491161346435547, -4.210418701171875, 95.71073913574219, 8.039756774902344, 22.52655029296875, -111.59317016601562, 19.705379486083984, 10.470705032348633, 11.105018615722656, 45.93257141113281, 98.92617797851562, 56.99127197265625, 127.09998321533203, 16.910228729248047, -14.964614868164062, 76.97494506835938, 0.46053123474121094, 86.80572509765625, 106.39561462402344, 15.961786270141602, 4.536582946777344, -1.4156208038330078, 44.43803405761719, 37.108726501464844, -55.125396728515625, 107.92137145996094, 14.396751403808594, 129.5365753173828, 62.64684295654297, 49.03047180175781, 88.115966796875, -21.647232055664062, 2.5864830017089844, -19.161865234375, 69.30135345458984, 119.89010620117188, -35.894134521484375, 84.82685852050781, 2.8497180938720703, 110.05419158935547, 57.45027160644531, 89.92613220214844, 12.422760009765625, 12.711601257324219, 126.14329528808594, -45.410430908203125, 105.42276000976562, 98.22015380859375, 4.400459289550781, -3.214385986328125, 82.7359619140625, 23.210731506347656, 9.515274047851562, 124.83567810058594, 76.15017700195312, 29.192123413085938, 11.633331298828125, 63.849700927734375, 123.18413543701172, 12.181968688964844, 20.94469451904297, 52.747901916503906, -104.43616485595703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000493.npy"}
{"epoch": 0.745275888133031, "step": 494, "batch_size": 64, "mean": 28.798973083496094, "std": 59.797447204589844, "min": -127.6979751586914, "p10": -25.14305534362793, "median": 16.19752311706543, "p90": 105.51667709350586, "max": 131.400390625, "pos_frac": 0.71875, "sample": [91.5146484375, 3.1454925537109375, 16.273597717285156, 7.5778961181640625, 95.03243255615234, 3.02435302734375, -19.415264129638672, 91.73472595214844, 46.94769287109375, -55.933258056640625, 16.088783264160156, -22.47631072998047, 11.933258056640625, 77.97283172607422, 16.121448516845703, 29.95989990234375, 105.12036895751953, -3.8431549072265625, 36.7703971862793, -15.819915771484375, 4.423213958740234, 130.85948181152344, 65.92094421386719, 18.314071655273438, 30.23863983154297, 118.09957885742188, 88.27117919921875, -0.5688095092773438, 66.97625732421875, 107.39908599853516, 52.95716094970703, -117.85780334472656, -116.4261703491211, 100.04643249511719, -4.659875869750977, -76.38819885253906, -59.55952453613281, 92.6136703491211, 8.147964477539062, 78.37258911132812, 6.6269989013671875, 0.5068225860595703, -127.6979751586914, -1.4341697692871094, 56.86648178100586, 37.46533966064453, 104.63255310058594, 6.528470993041992, -4.770362854003906, 5.914093017578125, 126.73233032226562, 38.44769287109375, 48.46174621582031, 131.400390625, 10.345115661621094, 25.857398986816406, -14.209991455078125, -20.106857299804688, 82.188232421875, 11.556793212890625, 105.6865234375, -2.8146514892578125, -26.285945892333984, 122.32756805419922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000494.npy"}
{"epoch": 0.7467876039304611, "step": 495, "batch_size": 64, "mean": 26.206092834472656, "std": 51.715362548828125, "min": -88.35418701171875, "p10": -26.246645736694326, "median": 17.422714233398438, "p90": 111.28124008178713, "max": 128.25387573242188, "pos_frac": 0.6875, "sample": [-42.59404754638672, -14.219535827636719, -81.54527282714844, -88.35418701171875, 104.54895782470703, -6.959442138671875, -45.59688949584961, 22.920425415039062, 53.19205093383789, -29.7081298828125, -11.133209228515625, -79.69519805908203, 0.1641693115234375, 43.86257553100586, 44.55872344970703, 119.46900177001953, -6.940254211425781, 19.610076904296875, 73.29696655273438, 1.1248340606689453, 32.56679153442383, 42.06523132324219, 81.48638916015625, -0.9253444671630859, 91.7497329711914, 115.28094482421875, -13.477848052978516, 43.64842224121094, 0.7953262329101562, 12.877002716064453, 123.721923828125, 20.293466567993164, 5.278770446777344, -18.169849395751953, 21.45876121520996, 11.869041442871094, -9.828521728515625, 73.66731262207031, 85.05032348632812, 99.51094055175781, -32.40885925292969, 4.4196929931640625, 40.451236724853516, 42.11628723144531, 128.25387573242188, 0.38339996337890625, 31.080575942993164, -2.543781280517578, -5.666893005371094, 123.96749114990234, -7.730279922485352, 86.08131408691406, 19.68419647216797, -8.09393310546875, -9.829109191894531, 114.16650390625, 15.2353515625, 0.5299224853515625, 35.294921875, 21.741607666015625, 54.226951599121094, 3.832263946533203, 121.77217102050781, 5.304660797119141], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000495.npy"}
{"epoch": 0.7482993197278912, "step": 496, "batch_size": 64, "mean": 32.22122573852539, "std": 55.537330627441406, "min": -81.45230102539062, "p10": -39.944210815429685, "median": 24.797574043273926, "p90": 103.9044860839844, "max": 171.45733642578125, "pos_frac": 0.71875, "sample": [67.23226928710938, 66.5765380859375, 131.11349487304688, 67.37823486328125, -49.89678192138672, 44.46893310546875, 62.239105224609375, 96.12777709960938, 32.598777770996094, 76.55284118652344, -55.61647033691406, 2.6520767211914062, 46.48468780517578, 24.784225463867188, 0.12473869323730469, 6.415130615234375, -7.321537017822266, 24.810922622680664, 91.20549011230469, 22.389488220214844, -40.977928161621094, 171.45733642578125, -1.3507652282714844, 108.13811492919922, 56.076332092285156, -81.45230102539062, 141.58712768554688, 80.46127319335938, -2.051084518432617, 76.0869140625, -58.231285095214844, 6.108676910400391, 118.67366027832031, 105.72671508789062, 4.117584228515625, 8.513017654418945, -40.25555419921875, -15.181055068969727, 9.314834594726562, 59.41675567626953, 29.238006591796875, 25.238052368164062, 91.27169799804688, 59.8518180847168, 69.73262786865234, 20.161203384399414, -32.10490417480469, 54.84925842285156, -7.1885833740234375, 6.413951873779297, 124.13339233398438, -70.04891204833984, -17.082595825195312, 2.7464351654052734, 99.65261840820312, 83.79110717773438, 2.387298583984375, 96.4100570678711, -37.99630355834961, 30.954559326171875, -39.217742919921875, 21.36806297302246, -8.389591217041016, -0.5113296508789062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000496.npy"}
{"epoch": 0.7498110355253212, "step": 497, "batch_size": 64, "mean": 41.654964447021484, "std": 52.46949005126953, "min": -100.00159454345703, "p10": -8.271884918212889, "median": 33.55369567871094, "p90": 110.75944900512695, "max": 139.06951904296875, "pos_frac": 0.765625, "sample": [123.85299682617188, 20.080001831054688, 139.06951904296875, 109.71224212646484, 74.589599609375, -10.949325561523438, 5.5915069580078125, -0.2779083251953125, 7.530782699584961, 31.920578002929688, 69.13909912109375, 50.382843017578125, 85.91082763671875, 9.778297424316406, 98.81979370117188, 12.045669555664062, 68.43148803710938, -100.00159454345703, 111.208251953125, 106.25452423095703, -0.6975688934326172, 0.7258033752441406, -4.199455261230469, -2.1357040405273438, 32.42688751220703, -9.021438598632812, -4.1121826171875, 70.26852416992188, 73.9897232055664, 115.37698364257812, -3.4475269317626953, 39.04960632324219, 75.38327026367188, 87.31962585449219, -91.67911529541016, 94.08108520507812, 103.19852447509766, 76.75132751464844, 68.503173828125, -6.522926330566406, 116.64944458007812, 60.57353210449219, 4.336376190185547, 33.719482421875, 23.516006469726562, -12.875640869140625, 130.6721954345703, 106.30498504638672, 2.6875991821289062, 11.817459106445312, 39.83639144897461, 18.34320068359375, 34.338043212890625, 75.87611389160156, 31.54998016357422, 132.0147705078125, -35.529449462890625, 2.763874053955078, -2.066028594970703, 97.07675170898438, 49.410797119140625, 28.06220245361328, -44.8961181640625, 33.387908935546875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000497.npy"}
{"epoch": 0.7513227513227513, "step": 498, "batch_size": 64, "mean": 37.555335998535156, "std": 48.4223518371582, "min": -83.03231811523438, "p10": -10.131139755249023, "median": 26.750110626220703, "p90": 109.31200408935547, "max": 127.694580078125, "pos_frac": 0.765625, "sample": [-4.2025909423828125, 36.31169891357422, 109.25138854980469, 43.99738693237305, -11.05145263671875, -8.481487274169922, 88.41596221923828, 97.619873046875, -10.191638946533203, -9.989974975585938, 91.42640686035156, 22.33187484741211, 30.776580810546875, 2.7949752807617188, 83.29147338867188, 25.124267578125, 3.6422348022460938, -13.304609298706055, 6.559608459472656, 78.3267822265625, 127.694580078125, 28.375953674316406, 109.33798217773438, 4.5400238037109375, 20.7447509765625, 49.91661071777344, -16.314231872558594, 99.8305435180664, 97.36442565917969, 8.565155029296875, 98.63534545898438, 103.14132690429688, 0.2315216064453125, 30.282180786132812, 45.609336853027344, -35.12420654296875, -2.7898788452148438, 32.2684326171875, 0.6751537322998047, -18.30158805847168, 38.48414611816406, 118.27953338623047, -83.03231811523438, 72.98353576660156, 44.285377502441406, 11.84466552734375, 0.3175163269042969, 52.238487243652344, 23.33725357055664, 33.21523666381836, 115.24376678466797, 126.68484497070312, 37.69086456298828, 4.313346862792969, 90.57859802246094, 1.7561397552490234, 127.01414489746094, 12.555023193359375, -1.8280048370361328, -3.55242919921875, -3.9427490234375, -3.5380325317382812, 14.125663757324219, 127.15489196777344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000498.npy"}
{"epoch": 0.7528344671201814, "step": 499, "batch_size": 64, "mean": 35.225341796875, "std": 59.13756561279297, "min": -105.60672760009766, "p10": -50.2309829711914, "median": 34.930416107177734, "p90": 117.49747772216799, "max": 134.75958251953125, "pos_frac": 0.734375, "sample": [78.14118194580078, 104.64273071289062, -105.60672760009766, 94.06167602539062, 10.063018798828125, 60.51432418823242, 25.83977699279785, -93.1931381225586, 5.657657623291016, 66.08662414550781, 57.19488525390625, 57.49681091308594, 125.14553833007812, -62.69380187988281, 103.282470703125, 9.532371520996094, 134.75958251953125, -69.13671112060547, 123.65214538574219, 111.7458267211914, 28.566085815429688, -8.362564086914062, -10.575067520141602, -44.881317138671875, -33.148704528808594, 39.444854736328125, 9.92901611328125, 102.19842529296875, -17.152509689331055, 58.00695037841797, -7.094329833984375, 2.979633331298828, 64.99159240722656, 58.45912170410156, 35.95359802246094, -5.050384521484375, 12.887069702148438, 69.00403594970703, -7.496965408325195, 48.07194900512695, 126.36751556396484, 5.937019348144531, 110.45677947998047, 22.336795806884766, -52.52369689941406, 33.90723419189453, 33.17052459716797, 71.7920150756836, 50.35992431640625, 130.6316680908203, 107.22525024414062, 8.387809753417969, 50.834442138671875, -53.507659912109375, -71.24356079101562, 54.98468017578125, 79.41832733154297, 69.58861541748047, -23.83050537109375, 120.21343994140625, -0.5719985961914062, 119.96247100830078, 26.400047302246094, 0.2080078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000499.npy"}
{"epoch": 0.7543461829176115, "step": 500, "batch_size": 64, "mean": 34.61691665649414, "std": 56.052650451660156, "min": -105.37640380859375, "p10": -39.059464263916006, "median": 30.035367965698242, "p90": 112.06812896728516, "max": 123.30451965332031, "pos_frac": 0.78125, "sample": [-43.56434631347656, 59.46601104736328, 9.648200988769531, 5.314390182495117, 53.414703369140625, 38.141632080078125, 57.73286437988281, 100.13508605957031, 31.490585327148438, 121.741943359375, 23.077713012695312, 117.95707702636719, 105.08514404296875, -28.548072814941406, 112.51695251464844, 33.37965393066406, 123.30451965332031, 26.78155517578125, 34.416404724121094, -13.569122314453125, 117.41888427734375, 87.67053985595703, 15.884193420410156, 65.34420776367188, -9.035690307617188, 116.04031372070312, 71.65574645996094, 1.4403648376464844, -0.04935455322265625, -61.469383239746094, -53.22602844238281, 28.580150604248047, 3.3956260681152344, 7.6036376953125, -52.4217414855957, -3.9551944732666016, 15.706466674804688, 91.38581848144531, 93.40220642089844, 0.5219802856445312, 77.41500854492188, 1.8578243255615234, 44.46623229980469, 20.773990631103516, -27.46963119506836, 79.05616760253906, 20.40998077392578, -105.37640380859375, 108.83795166015625, -104.78500366210938, 121.23912048339844, 13.100250244140625, 1.2274131774902344, 41.08598327636719, 68.89168548583984, 14.27553939819336, 10.341140747070312, 41.82801818847656, 85.52111053466797, -2.7362918853759766, -64.1835708618164, 66.80765533447266, 111.0208740234375, 88.0619888305664], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000500.npy"}
{"epoch": 0.7558578987150416, "step": 501, "batch_size": 64, "mean": 30.07050323486328, "std": 41.658424377441406, "min": -101.25294494628906, "p10": -5.421255493164061, "median": 15.315861701965332, "p90": 92.87111358642579, "max": 120.53164672851562, "pos_frac": 0.765625, "sample": [15.144828796386719, 93.74053955078125, -3.8057098388671875, 57.8158073425293, 49.60938262939453, 53.271644592285156, 9.717941284179688, 7.88702392578125, 8.289043426513672, 90.42089080810547, 88.31149291992188, 23.349607467651367, 70.25712585449219, 6.602226257324219, -10.007417678833008, -0.0001544952392578125, 42.889808654785156, 110.21162414550781, -12.606071472167969, -2.8309268951416016, 7.269615173339844, -1.1031837463378906, -1.3213996887207031, 90.84245300292969, 110.96173095703125, 89.22833251953125, 66.55775451660156, 120.53164672851562, 13.123912811279297, -14.35821533203125, 16.23546600341797, -2.4778594970703125, -2.147846221923828, 1.0017738342285156, 26.51995849609375, 8.45404052734375, 66.42710876464844, 1.3831253051757812, 11.514263153076172, 10.238615036010742, 29.547842025756836, 95.81392669677734, 3.471843719482422, 28.13410186767578, 1.8837203979492188, -101.25294494628906, 5.148750305175781, 28.116561889648438, -23.23514175415039, 30.653968811035156, 23.331024169921875, 65.0010757446289, 19.265174865722656, -6.1136322021484375, 46.18794250488281, -12.556131362915039, 10.801666259765625, 15.486894607543945, 99.6639175415039, 105.49391174316406, 71.72187042236328, 7.052581787109375, 64.61753845214844, -0.874267578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000501.npy"}
{"epoch": 0.7573696145124716, "step": 502, "batch_size": 64, "mean": 30.78525161743164, "std": 44.2186164855957, "min": -46.50642395019531, "p10": -21.981692886352533, "median": 19.494686126708984, "p90": 91.6656318664551, "max": 138.6575469970703, "pos_frac": 0.78125, "sample": [-6.485496520996094, 3.24676513671875, 19.242164611816406, -33.48432922363281, 68.02957153320312, 69.00614929199219, 8.441062927246094, 0.4712066650390625, -44.65522003173828, 74.95175170898438, 9.05251693725586, 111.23158264160156, 1.5284576416015625, -34.02942657470703, 18.535675048828125, 128.45449829101562, 6.99853515625, -46.50642395019531, 1.7275543212890625, 21.0792236328125, 11.381744384765625, -6.643741607666016, 50.19764709472656, 57.26838684082031, 72.28558349609375, 18.291030883789062, 99.38460540771484, -17.83981704711914, 0.42273902893066406, 19.747207641601562, 3.98541259765625, -2.1283702850341797, 59.44976043701172, 2.474020004272461, 31.820438385009766, 52.07640075683594, 124.15911865234375, 138.6575469970703, -33.949249267578125, -0.4591941833496094, -23.75678253173828, 24.783409118652344, 28.366491317749023, 4.9215850830078125, 42.3978271484375, 7.441375732421875, 9.552963256835938, 66.25292205810547, 81.44507598876953, -38.565284729003906, 43.77857971191406, 26.624435424804688, 85.21800231933594, 34.18013000488281, 83.78974914550781, -2.849395751953125, 31.700607299804688, 94.42890167236328, -0.7023391723632812, 111.66206359863281, 16.66879653930664, 61.229393005371094, 52.80072021484375, 71.46979522705078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000502.npy"}
{"epoch": 0.7588813303099018, "step": 503, "batch_size": 64, "mean": 35.4834098815918, "std": 54.07965850830078, "min": -100.69805908203125, "p10": -16.729939460754395, "median": 21.442964553833008, "p90": 112.3438804626465, "max": 125.45118713378906, "pos_frac": 0.8125, "sample": [66.8515853881836, -4.429866790771484, -10.1046142578125, 16.11223793029785, 67.76305389404297, 9.392921447753906, 108.1651611328125, -81.14739990234375, 3.1372146606445312, 65.70735168457031, 79.90487670898438, 22.664886474609375, 16.434371948242188, 125.45118713378906, -9.648384094238281, 78.55116271972656, -15.820966720581055, 18.249252319335938, 108.01606750488281, -37.179588317871094, 1.4673385620117188, 63.7083854675293, 110.7431869506836, 44.10247802734375, 13.449447631835938, 8.5159912109375, 78.76372528076172, 120.7081069946289, 20.22104263305664, 54.472198486328125, -12.650676727294922, 16.708770751953125, 115.86766052246094, 114.3974609375, 115.25912475585938, 6.9068603515625, 51.308082580566406, -17.11949920654297, -50.52301025390625, 15.731819152832031, 1.4134788513183594, 30.330154418945312, 1.6800670623779297, 50.273399353027344, 0.8186416625976562, 5.3295135498046875, 23.980979919433594, 98.69453430175781, 87.20582580566406, 6.424571990966797, 90.89637756347656, 113.02989196777344, -100.69805908203125, 2.4029541015625, 122.66560363769531, 79.52238464355469, 5.220552444458008, -46.66449737548828, 58.36042785644531, 81.57325744628906, 12.90955924987793, 50.009315490722656, 84.86510467529297, -89.41473388671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000503.npy"}
{"epoch": 0.7603930461073318, "step": 504, "batch_size": 64, "mean": 45.58683395385742, "std": 47.819759368896484, "min": -56.54270935058594, "p10": -2.9340175628662104, "median": 39.442291259765625, "p90": 113.21696319580079, "max": 141.991455078125, "pos_frac": 0.828125, "sample": [2.663299560546875, 115.57716369628906, -12.217971801757812, 74.42179107666016, 107.02851867675781, -0.3288726806640625, 74.45901489257812, 37.42913055419922, 126.54998779296875, -1.4995098114013672, 5.1902313232421875, 36.235015869140625, 17.521148681640625, 119.75776672363281, 25.08249282836914, 88.8727035522461, 98.06282043457031, -38.436500549316406, 66.6072998046875, 105.6122055053711, 10.52315902709961, 46.10879898071289, 1.72979736328125, 95.86076354980469, 120.83305358886719, -2.6247100830078125, 50.38996887207031, 114.38735961914062, 41.45545196533203, 141.991455078125, 97.47897338867188, 110.48603820800781, -1.7389163970947266, 33.4231071472168, 56.055145263671875, 14.34490966796875, 2.5860824584960938, -50.176307678222656, 78.68804931640625, 42.751800537109375, 90.15457916259766, -3.367889404296875, -3.066577911376953, 3.712646484375, -19.153564453125, 107.30584716796875, 42.741878509521484, 32.473724365234375, 65.50192260742188, 43.2660026550293, 94.83885955810547, 10.68270492553711, -56.54270935058594, 24.40484619140625, 77.108154296875, 14.900169372558594, 13.492223739624023, 34.57615661621094, 70.24049377441406, 4.801713943481445, 14.307291030883789, 129.8133544921875, 47.625526428222656, 24.59821891784668], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000504.npy"}
{"epoch": 0.7619047619047619, "step": 505, "batch_size": 64, "mean": 28.56766700744629, "std": 59.15830993652344, "min": -101.44647979736328, "p10": -38.59841384887695, "median": 12.589213371276855, "p90": 113.88477172851563, "max": 134.39718627929688, "pos_frac": 0.625, "sample": [43.14626693725586, 1.6892509460449219, 30.814971923828125, 20.883785247802734, -0.27440643310546875, 113.63876342773438, -68.85279846191406, 133.72776794433594, 82.23194885253906, 105.99928283691406, 119.00505065917969, -14.271072387695312, 11.090797424316406, -0.35796356201171875, 20.09918975830078, -1.8563232421875, -1.740234375, -1.0391731262207031, 63.93489074707031, 10.492218017578125, -54.98881530761719, 84.43716430664062, 89.5439224243164, 21.56856918334961, 51.41547393798828, 134.39718627929688, 33.43608856201172, -20.941635131835938, 26.151535034179688, -0.2765007019042969, -3.8237762451171875, -31.06061553955078, 111.05208587646484, -65.87763977050781, 129.08233642578125, 98.13379669189453, 71.3410873413086, 87.62943267822266, -2.7011547088623047, -37.07475280761719, 4.8997650146484375, 0.7818145751953125, -1.0844802856445312, -101.44647979736328, 0.965850830078125, 24.4271183013916, 35.4329833984375, -10.037668228149414, 123.0404052734375, -57.82632064819336, -16.767967224121094, 95.43167877197266, 113.99020385742188, 33.26312255859375, 112.3613510131836, 6.495025634765625, 83.08845520019531, -14.283554077148438, 4.36041259765625, 125.68016052246094, 14.087629318237305, -3.877544403076172, -39.25141143798828, -95.20590209960938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000505.npy"}
{"epoch": 0.763416477702192, "step": 506, "batch_size": 64, "mean": 29.690635681152344, "std": 56.03010940551758, "min": -120.58212280273438, "p10": -30.086015701293945, "median": 25.80628204345703, "p90": 104.44664535522462, "max": 138.0067138671875, "pos_frac": 0.734375, "sample": [55.285240173339844, 105.56622314453125, -1.2040596008300781, -6.9224853515625, 65.94667053222656, 82.38453674316406, 40.614200592041016, -30.147624969482422, 54.64802551269531, -39.668060302734375, 1.5217266082763672, 42.112709045410156, 21.886985778808594, 25.45648956298828, 12.10451889038086, 24.587631225585938, -105.03307342529297, 122.73667907714844, 138.0067138671875, 9.542167663574219, 93.13499450683594, -60.428810119628906, -0.565582275390625, -120.58212280273438, -58.27550506591797, -106.89865112304688, 6.3002166748046875, 122.66592407226562, 93.58370971679688, 101.83429718017578, 128.1294403076172, -20.245040893554688, 118.10953521728516, -9.552902221679688, 110.98359680175781, 31.043373107910156, 95.11311340332031, 12.601844787597656, 71.53588104248047, -29.9422607421875, -25.92547607421875, 0.5065841674804688, 6.2162322998046875, 19.1436767578125, 76.04383850097656, 28.307598114013672, 8.848308563232422, 10.470863342285156, 49.52996063232422, 42.078407287597656, 38.40263366699219, 57.25457763671875, -0.1377410888671875, 75.4981689453125, 65.02611541748047, -3.0639724731445312, 7.327949523925781, 51.839927673339844, 2.3451995849609375, -8.904228210449219, 49.09739685058594, 59.207977294921875, 66.96037292480469, 26.15607452392578], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000506.npy"}
{"epoch": 0.764928193499622, "step": 507, "batch_size": 64, "mean": 33.99785614013672, "std": 55.62834930419922, "min": -72.43460083007812, "p10": -30.61903209686279, "median": 26.611412048339844, "p90": 120.42019577026369, "max": 154.09152221679688, "pos_frac": 0.703125, "sample": [96.43614959716797, -9.676177978515625, -16.823467254638672, 29.069671630859375, 154.09152221679688, 9.324089050292969, 116.27389526367188, -69.76754760742188, -43.052337646484375, 45.326087951660156, 14.891548156738281, 24.80950927734375, -33.32366943359375, 106.14785766601562, -19.316184997558594, 47.35736083984375, 35.69230651855469, -7.701210021972656, 95.77803039550781, 9.375633239746094, 28.413314819335938, 22.823776245117188, 109.89888000488281, -31.925142288208008, 122.19718170166016, -1.2939891815185547, 31.850112915039062, 102.6533203125, 31.406879425048828, 8.231475830078125, 18.634811401367188, 15.568893432617188, 134.14151000976562, -22.304359436035156, 89.17029571533203, 51.116294860839844, -14.089752197265625, 89.31639099121094, -35.939544677734375, -5.468471527099609, -12.190826416015625, 32.07875061035156, 1.4356346130371094, 58.9642333984375, 60.768978118896484, 21.741073608398438, 31.205734252929688, 3.002838134765625, 134.51348876953125, 134.1985626220703, -58.51447296142578, 42.965850830078125, -23.501014709472656, 8.534675598144531, 69.677734375, -8.356300354003906, 93.88967895507812, -27.571441650390625, 33.448158264160156, -72.43460083007812, 11.920608520507812, 125.72882843017578, 60.09391403198242, 124.94775390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000507.npy"}
{"epoch": 0.7664399092970522, "step": 508, "batch_size": 64, "mean": 36.50687789916992, "std": 53.05398178100586, "min": -80.72069549560547, "p10": -20.39627571105957, "median": 24.251800537109375, "p90": 115.79680099487307, "max": 135.07772827148438, "pos_frac": 0.75, "sample": [86.54570007324219, 3.4357681274414062, 77.42384338378906, 25.411453247070312, 8.05535888671875, -33.44024658203125, 135.07772827148438, -9.53564453125, 67.96162414550781, 111.30917358398438, 133.51463317871094, 103.41551208496094, 10.094024658203125, 15.253860473632812, -16.871002197265625, 84.65479278564453, -80.72069549560547, 21.922760009765625, 57.06550598144531, -20.48563003540039, 35.963226318359375, 0.8448867797851562, 75.67579650878906, -15.900138854980469, -20.187782287597656, -66.27056884765625, 33.237247467041016, 2.340017318725586, 98.85159301757812, 78.85676574707031, 6.416290283203125, 108.94560241699219, 24.744537353515625, 39.43244171142578, 89.40773010253906, 19.99102783203125, 53.42417526245117, 19.956756591796875, 122.02025604248047, -12.471412658691406, -1.725860595703125, -11.672676086425781, 119.02001190185547, 101.80435180664062, -4.553749084472656, -33.50634765625, 64.4259262084961, 52.37371826171875, 117.7200698852539, 6.167196273803711, 91.34869384765625, 3.8927459716796875, 16.860225677490234, -5.51072883605957, 8.390106201171875, -55.692665100097656, -26.704559326171875, 121.1052017211914, 86.73421478271484, 23.759063720703125, 25.59526824951172, 28.197723388671875, 13.460357666015625, 119.58502960205078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000508.npy"}
{"epoch": 0.7679516250944822, "step": 509, "batch_size": 64, "mean": 34.30541229248047, "std": 48.197425842285156, "min": -93.4246826171875, "p10": -26.294687652587882, "median": 32.08490753173828, "p90": 100.32371139526369, "max": 128.571533203125, "pos_frac": 0.765625, "sample": [77.04805755615234, -3.213703155517578, 35.50131607055664, 96.0782241821289, 8.490036010742188, 0.13491439819335938, 12.262674331665039, -13.36937141418457, -41.777183532714844, 15.488937377929688, 8.057289123535156, 58.763092041015625, 13.637481689453125, 43.21357345581055, 82.50658416748047, -93.4246826171875, 42.581390380859375, 47.057945251464844, 4.0340576171875, 6.521003723144531, 80.232666015625, -43.69060516357422, 46.657440185546875, -19.733474731445312, 118.90413665771484, 44.745147705078125, -7.0476226806640625, 16.446483612060547, 40.61540222167969, -1.99163818359375, 81.72955322265625, 50.64567947387695, 79.04325866699219, 59.08349609375, 33.79301452636719, -0.6124629974365234, 105.97906494140625, 65.31169128417969, 41.407867431640625, 117.20030212402344, 6.618827819824219, 3.2732391357421875, 107.48047637939453, -32.23029708862305, 54.570098876953125, 89.11027526855469, 28.347854614257812, 30.376800537109375, 128.571533203125, 116.0842056274414, 101.9884262084961, -2.9003677368164062, -19.08415985107422, -46.034088134765625, 10.754638671875, 88.01437377929688, 59.22173309326172, -50.435428619384766, -29.10663604736328, 24.31049346923828, 21.330631256103516, 22.67565155029297, 96.43937683105469, 77.85763549804688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000509.npy"}
{"epoch": 0.7694633408919124, "step": 510, "batch_size": 64, "mean": 38.184967041015625, "std": 55.26813888549805, "min": -81.10487365722656, "p10": -27.32657165527343, "median": 32.81068134307861, "p90": 115.28066635131839, "max": 133.8087158203125, "pos_frac": 0.765625, "sample": [13.678237915039062, 105.77174377441406, 97.46930694580078, 37.58158874511719, -20.675994873046875, -5.137420654296875, 10.889400482177734, 133.8087158203125, -0.5855789184570312, 1.937652587890625, 0.01947784423828125, 2.9097137451171875, 9.492694854736328, 19.28491973876953, -68.70295715332031, -4.5326080322265625, -0.701385498046875, 28.948272705078125, 19.529998779296875, 29.895112991333008, 36.222625732421875, 104.40159606933594, 121.15034484863281, -30.17681884765625, 5.72918701171875, 106.52943420410156, 52.677101135253906, -0.593994140625, 1.8718109130859375, 96.28865051269531, 61.44328308105469, 51.50221252441406, 133.07730102539062, 47.59953308105469, 13.55471420288086, 83.74476623535156, 57.20018768310547, 132.82913208007812, 7.725181579589844, -43.222900390625, 12.549589157104492, -81.10487365722656, 100.43756103515625, 62.44548416137695, 109.49615478515625, 15.239479064941406, 59.996437072753906, 133.59786987304688, 84.1846923828125, -18.17839813232422, 39.742759704589844, 46.89030075073242, 117.7597427368164, 67.96456146240234, -66.49240112304688, 10.579513549804688, 100.56820678710938, 122.11730194091797, -2.5352325439453125, 35.72624969482422, 103.74649047851562, 40.57868194580078, -34.789344787597656, -67.11699676513672], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000510.npy"}
{"epoch": 0.7709750566893424, "step": 511, "batch_size": 64, "mean": 43.72998809814453, "std": 59.229251861572266, "min": -104.971435546875, "p10": -27.862645721435538, "median": 33.81001663208008, "p90": 122.78420639038086, "max": 134.57647705078125, "pos_frac": 0.828125, "sample": [127.11605834960938, 112.2784423828125, 35.114356994628906, 124.57981872558594, 12.10507583618164, 4.234626770019531, 28.423343658447266, 9.471122741699219, 104.54267883300781, 115.21620178222656, 32.50567626953125, -31.31542205810547, 4.5490264892578125, 13.837409973144531, 107.07090759277344, 107.50361633300781, 85.67942810058594, -8.338340759277344, 91.37065124511719, 54.06676483154297, 112.18844604492188, -87.31451416015625, 23.86615753173828, -80.4994125366211, -59.302093505859375, 14.93536376953125, 123.58946228027344, 54.174407958984375, 62.118629455566406, 48.004669189453125, 4.864164352416992, 61.100982666015625, 24.059463500976562, 14.506256103515625, 77.79452514648438, 72.67793273925781, 103.23419952392578, 14.673282623291016, -19.806167602539062, 11.43562126159668, 5.4806060791015625, 65.24398803710938, 100.24346923828125, 132.4144287109375, 123.0440673828125, 96.385009765625, -59.40056610107422, 15.707855224609375, 7.63984489440918, 122.17786407470703, 27.193443298339844, 2.7822418212890625, 73.36317443847656, 62.72736358642578, 5.9035186767578125, 24.89606285095215, -42.67167282104492, 134.57647705078125, 94.47303771972656, 89.61135864257812, 125.64287567138672, -4.878574371337891, -104.971435546875, -5.177989959716797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000511.npy"}
{"epoch": 0.7724867724867724, "step": 512, "batch_size": 64, "mean": 32.43523406982422, "std": 56.61812210083008, "min": -89.38232421875, "p10": -40.645813369750975, "median": 17.234882354736328, "p90": 108.844287109375, "max": 151.1199951171875, "pos_frac": 0.75, "sample": [83.0550765991211, 31.087066650390625, 38.576904296875, -71.84086608886719, -37.83967971801758, 121.21675109863281, 6.543407440185547, 7.847806930541992, 7.404655456542969, 72.70794677734375, 74.7950439453125, 102.14329528808594, -89.38232421875, -30.441570281982422, 33.94871520996094, 7.113670349121094, -15.175254821777344, -4.939556121826172, 100.28929901123047, -0.24552345275878906, 120.62178802490234, 96.7145767211914, 9.878746032714844, 67.01739501953125, 8.205032348632812, 98.44863891601562, -60.536956787109375, 97.05803680419922, 13.965141296386719, 114.77947998046875, 1.2928657531738281, 32.591552734375, 96.99794006347656, 2.884237289428711, 68.99526977539062, 6.28692626953125, 2.195159912109375, 39.60456085205078, 2.248910903930664, 110.89176177978516, -2.3895721435546875, -84.63681030273438, -0.4708843231201172, -10.290374755859375, 68.43634033203125, 106.68278503417969, 20.504623413085938, 11.351631164550781, 8.398529052734375, 2.7864418029785156, 109.77064514160156, -1.4702682495117188, 10.910232543945312, -66.13308715820312, 35.572288513183594, 42.08058166503906, 151.1199951171875, 96.37991333007812, 125.14246368408203, 89.1396255493164, 41.175865173339844, -42.61119079589844, 39.2476921081543, -41.84844207763672], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000512.npy"}
{"epoch": 0.7739984882842026, "step": 513, "batch_size": 64, "mean": 36.527976989746094, "std": 46.313934326171875, "min": -70.22206115722656, "p10": -8.352564239501952, "median": 24.643651962280273, "p90": 120.94943008422852, "max": 132.7757568359375, "pos_frac": 0.828125, "sample": [132.7757568359375, 59.77510070800781, 121.22061157226562, 78.28834533691406, 41.545860290527344, 46.536991119384766, 41.064544677734375, 16.008731842041016, 1.6975269317626953, -7.5880126953125, 32.86046600341797, -37.76784896850586, 66.87059783935547, 39.42291259765625, 23.62679672241211, 40.988433837890625, -4.335166931152344, 17.866844177246094, 126.03672790527344, 22.31249237060547, 34.92933654785156, 127.56387329101562, 63.445701599121094, 19.457263946533203, 120.3166732788086, 5.3751983642578125, 7.4744110107421875, 57.80369186401367, 17.024080276489258, 51.00996398925781, 25.660507202148438, 12.821231842041016, 79.42916107177734, -70.22206115722656, 19.28838348388672, 13.794588088989258, -26.933921813964844, 75.43832397460938, 129.239501953125, -0.5914115905761719, -3.4335174560546875, -17.46329116821289, 131.01678466796875, 1.1600799560546875, 75.76458740234375, 120.22132873535156, 127.24919128417969, -8.680229187011719, 72.89988708496094, 45.701499938964844, 19.669296264648438, 26.05889892578125, 37.2252197265625, 26.318954467773438, 12.4913330078125, -27.13935661315918, 10.018058776855469, 4.209407806396484, 14.052051544189453, 12.309900283813477, 42.503196716308594, -28.25592041015625, 3.45672607421875, 18.90408706665039], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000513.npy"}
{"epoch": 0.7755102040816326, "step": 514, "batch_size": 64, "mean": 42.724876403808594, "std": 50.41887664794922, "min": -110.47711181640625, "p10": -17.80139694213867, "median": 47.79246711730957, "p90": 107.70900192260743, "max": 131.6182861328125, "pos_frac": 0.765625, "sample": [4.04139518737793, 43.4648551940918, 49.74882125854492, 67.41018676757812, 80.71987915039062, 45.83611297607422, 94.83793640136719, -5.426322937011719, 20.281707763671875, -4.818347930908203, 56.61557388305664, 107.32208251953125, 56.339847564697266, 63.349609375, -5.828790664672852, 32.38892364501953, -19.934814453125, 53.09503936767578, 1.8490676879882812, 57.102439880371094, -110.47711181640625, 6.470191955566406, 107.87482452392578, 90.34503173828125, 125.28580474853516, 0.9190826416015625, 11.447479248046875, 89.52029418945312, 76.8121337890625, 106.30915069580078, -19.399017333984375, -27.076772689819336, 100.46577453613281, 80.4467544555664, 37.87169647216797, 56.758628845214844, 20.00885009765625, 98.75093078613281, -0.6106796264648438, 52.97201919555664, 13.727867126464844, -37.838104248046875, -20.193706512451172, 19.491500854492188, 72.99211120605469, 64.76058197021484, 72.91819763183594, -6.62957763671875, 15.844269752502441, 43.949195861816406, -14.073616027832031, 30.801753997802734, 13.089576721191406, 111.5753402709961, -47.61326599121094, 73.41783905029297, 117.69181823730469, 58.33485412597656, 121.50975799560547, -10.245607376098633, 131.6182861328125, 117.73821258544922, -13.001968383789062, 101.43647766113281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000514.npy"}
{"epoch": 0.7770219198790628, "step": 515, "batch_size": 64, "mean": 34.295352935791016, "std": 53.014339447021484, "min": -77.72295379638672, "p10": -41.60429077148437, "median": 29.177780151367188, "p90": 112.75807723999026, "max": 131.2621612548828, "pos_frac": 0.734375, "sample": [2.2367897033691406, 55.53124237060547, 18.26197052001953, 47.546714782714844, 123.4326171875, 100.6886978149414, 22.00014877319336, 104.45848083496094, 4.693572998046875, 103.05950164794922, 27.755943298339844, 30.59961700439453, 69.50641632080078, -11.709707260131836, 58.93243408203125, 22.305023193359375, -60.550201416015625, 114.84440612792969, -46.98455047607422, -53.815338134765625, 34.929100036621094, -0.4372406005859375, -4.9217376708984375, 27.041824340820312, -77.72295379638672, 57.70877456665039, 123.82926940917969, 34.980560302734375, 19.92353057861328, -2.6256484985351562, 78.74176788330078, 73.74537658691406, -0.8841590881347656, 4.085626602172852, 23.538719177246094, 107.88997650146484, -44.86799621582031, 68.59272766113281, 0.9156742095947266, 33.58125305175781, -43.69437789916992, 20.515010833740234, 127.12147521972656, -62.19684600830078, 34.84910583496094, 96.8287353515625, 11.940805435180664, -26.27369499206543, 54.20549774169922, 117.44634246826172, 33.327674865722656, 60.14134979248047, 52.130615234375, 131.2621612548828, 117.72096252441406, -36.727420806884766, 75.83966827392578, 3.9052772521972656, 104.62367248535156, -21.40245819091797, 19.20421600341797, 38.61573791503906, -1.6751480102539062, -3.6440811157226562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000515.npy"}
{"epoch": 0.7785336356764928, "step": 516, "batch_size": 64, "mean": 30.182415008544922, "std": 52.6483154296875, "min": -115.69737243652344, "p10": -21.498902893066404, "median": 29.347872734069824, "p90": 96.88527908325196, "max": 129.70816040039062, "pos_frac": 0.734375, "sample": [111.28285217285156, 129.70816040039062, 90.53419494628906, -55.39654541015625, -29.868515014648438, 27.902423858642578, -22.22138214111328, 61.59027862548828, 49.799774169921875, 72.33328247070312, 8.267143249511719, 21.80150032043457, 25.694580078125, 129.3557586669922, 13.310176849365234, 2.951295852661133, 68.93630981445312, 78.58595275878906, 107.84416198730469, 46.40361022949219, -19.81311798095703, -115.69737243652344, 36.98762512207031, 53.181419372558594, 118.06206512451172, 30.79332160949707, 43.50519561767578, 3.5925369262695312, 95.27873229980469, 53.662872314453125, -103.7403335571289, -6.059684753417969, 0.3856182098388672, -16.77564239501953, 4.484916687011719, 97.57379913330078, 92.9782485961914, -2.1166019439697266, 2.494792938232422, -3.685810089111328, 63.33409881591797, -84.0573959350586, 45.787567138671875, 94.17139434814453, 41.10585021972656, 11.396453857421875, 82.43535614013672, 44.23690414428711, 18.82567596435547, 44.15299987792969, 9.906806945800781, 50.67346954345703, 5.414894104003906, -10.874069213867188, -47.87284851074219, 62.06317138671875, -11.20479965209961, 61.857215881347656, -4.059619903564453, -1.3623390197753906, 55.81895446777344, 1.6646537780761719, -13.210655212402344, 107.56324768066406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000516.npy"}
{"epoch": 0.780045351473923, "step": 517, "batch_size": 64, "mean": 25.303403854370117, "std": 49.89949417114258, "min": -85.34424591064453, "p10": -31.257707023620604, "median": 17.712565422058105, "p90": 94.12304077148438, "max": 139.46334838867188, "pos_frac": 0.71875, "sample": [20.564739227294922, 1.1278762817382812, -12.892024993896484, 56.85467529296875, -58.447853088378906, 30.729991912841797, -0.8773956298828125, 36.17173767089844, 122.69847106933594, 14.298633575439453, 22.782913208007812, -29.91823959350586, -33.913848876953125, 15.847541809082031, 9.821128845214844, -27.88776397705078, 2.431173324584961, 10.056327819824219, 48.23412322998047, -1.9886245727539062, 75.63298034667969, -54.673343658447266, 42.708892822265625, 92.45477294921875, -13.484710693359375, 24.6122989654541, 2.570465087890625, 52.41602325439453, 8.95294189453125, 65.78700256347656, 59.507225036621094, 107.70970153808594, 18.80522346496582, 19.539703369140625, -27.809295654296875, -74.08431243896484, 5.862396240234375, 79.66653442382812, -1.3331241607666016, -48.275550842285156, 54.42217254638672, 94.8380126953125, 138.18914794921875, 14.919178009033203, 55.54631042480469, 91.63538360595703, 19.896587371826172, 41.435943603515625, -30.34569549560547, 1.9670562744140625, -31.648569107055664, 57.26144027709961, 2.9912338256835938, 127.6706771850586, 139.46334838867188, 97.67619323730469, 22.305816650390625, -8.801704406738281, 16.136104583740234, 16.61990737915039, 70.38140869140625, -85.34424591064453, 53.25672149658203, -3.3140106201171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000517.npy"}
{"epoch": 0.781557067271353, "step": 518, "batch_size": 64, "mean": 31.204975128173828, "std": 49.74598693847656, "min": -83.24522399902344, "p10": -32.913719940185544, "median": 22.542579650878906, "p90": 101.05728454589844, "max": 140.6531982421875, "pos_frac": 0.671875, "sample": [37.73735809326172, 0.09796142578125, -0.42894744873046875, 111.30772399902344, -0.02906036376953125, -1.2974624633789062, 13.041786193847656, 89.67720031738281, -37.08097457885742, 23.0257568359375, 21.79343032836914, -45.97779083251953, 25.335548400878906, 54.27430725097656, 81.09259796142578, -23.069931030273438, 22.059402465820312, 88.18567657470703, -39.68980407714844, 88.14323425292969, 105.93304443359375, -83.24522399902344, 1.2073287963867188, 51.390132904052734, 99.91305541992188, 64.53070068359375, -37.867919921875, 49.49147033691406, -28.175918579101562, 85.94512176513672, -1.009185791015625, -36.26312255859375, -2.5456066131591797, 71.23345947265625, 23.049972534179688, -2.8986892700195312, 12.09562873840332, -6.8551177978515625, 29.720169067382812, 140.6531982421875, 23.23748779296875, 24.68695831298828, 3.059816360473633, 126.560791015625, 70.96131896972656, 119.33880615234375, -13.732894897460938, 101.54766845703125, 76.2869873046875, -34.94420623779297, -4.780305862426758, -10.637292861938477, 70.33572387695312, 0.3467559814453125, 21.585411071777344, 83.20806121826172, 59.13334655761719, 43.26691436767578, -0.9449634552001953, -3.7599639892578125, 14.825614929199219, 1.4757556915283203, 126.25320434570312, 55.30694580078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000518.npy"}
{"epoch": 0.783068783068783, "step": 519, "batch_size": 64, "mean": 23.527294158935547, "std": 61.862144470214844, "min": -133.7369384765625, "p10": -53.26936225891113, "median": 15.009159088134766, "p90": 108.3472640991211, "max": 127.73076629638672, "pos_frac": 0.625, "sample": [-67.9198989868164, 108.92587280273438, 5.588409423828125, -116.69139099121094, -9.97488784790039, 88.31269836425781, -1.3292980194091797, 127.73076629638672, 41.69120788574219, 82.25228881835938, -0.7494659423828125, 19.578813552856445, 71.73148345947266, -0.8387832641601562, 14.125144958496094, 36.909156799316406, 105.52713012695312, 11.905054092407227, -64.51634216308594, -19.292327880859375, 98.55162048339844, 80.1644287109375, -87.60865783691406, 89.00205993652344, 67.86384582519531, 124.05191802978516, -30.654441833496094, 22.235305786132812, 59.824546813964844, 67.47360229492188, -22.491819381713867, 91.10533142089844, 15.893173217773438, -40.10723876953125, 42.76629638671875, -40.17291259765625, -2.4235000610351562, 11.069194793701172, 120.20344543457031, -36.607147216796875, 41.8597412109375, 5.898853302001953, 76.11489868164062, 45.676780700683594, 32.54981231689453, -133.7369384765625, 23.556716918945312, -7.4976348876953125, 28.226104736328125, -61.181800842285156, 122.1324234008789, -12.140876770019531, -53.94865036010742, 123.50758361816406, 13.367767333984375, 12.880184173583984, 106.99717712402344, 74.74894714355469, 12.674327850341797, -51.684356689453125, -17.055770874023438, -8.044702529907227, 115.79571533203125, -48.05416488647461], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000519.npy"}
{"epoch": 0.7845804988662132, "step": 520, "batch_size": 64, "mean": 27.83536148071289, "std": 48.21017074584961, "min": -79.69062805175781, "p10": -25.54710006713867, "median": 19.821502685546875, "p90": 101.20083160400391, "max": 126.44413757324219, "pos_frac": 0.765625, "sample": [68.3190689086914, -15.007562637329102, -1.4358673095703125, 7.714351654052734, 103.62060546875, -16.878387451171875, 57.19850540161133, 2.5314865112304688, 2.9109230041503906, 100.49105834960938, 83.78511047363281, 33.07518768310547, 12.069938659667969, -20.72277069091797, 28.272903442382812, 19.98406219482422, 10.180755615234375, -27.614669799804688, 2.682628631591797, 107.40090942382812, 93.96636962890625, 33.120304107666016, 41.751304626464844, 109.00239562988281, 16.9498291015625, 46.955543518066406, 48.61207580566406, -66.45726013183594, -51.69519805908203, 83.6434097290039, 76.26095581054688, 5.913078308105469, 10.419715881347656, 15.133079528808594, 3.9918060302734375, 1.5417156219482422, 117.90161895751953, 54.03423309326172, -13.823333740234375, -71.5683364868164, -57.46031188964844, 21.02197265625, 15.760505676269531, -79.69062805175781, 19.65894317626953, 7.951393127441406, 46.38591003417969, 64.66801452636719, 47.20191955566406, 12.106857299804688, 28.714324951171875, 118.10336303710938, 101.50502014160156, 20.59946060180664, 126.44413757324219, 20.708593368530273, 30.071563720703125, 10.456954956054688, -1.0095634460449219, -0.5404052734375, -3.0256576538085938, 73.19265747070312, 81.63003540039062, -37.22349548339844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000520.npy"}
{"epoch": 0.7860922146636432, "step": 521, "batch_size": 64, "mean": 36.347511291503906, "std": 49.41761016845703, "min": -104.85062408447266, "p10": -9.656789779663086, "median": 28.683053016662598, "p90": 102.13699035644534, "max": 133.9506378173828, "pos_frac": 0.75, "sample": [89.43047332763672, 5.06744384765625, 78.77239990234375, 19.4451904296875, 104.323974609375, 35.37586975097656, 104.23267364501953, -0.9245491027832031, -10.948699951171875, 59.30513000488281, 62.15473937988281, -8.97214126586914, 66.52407836914062, 31.959068298339844, 79.11866760253906, 69.30236053466797, 29.37028694152832, 16.30921173095703, 5.4367218017578125, 117.54959106445312, -0.22528839111328125, 78.66918182373047, -1.174407958984375, 3.120635986328125, -6.268632888793945, 73.97718811035156, -41.008392333984375, 97.24706268310547, -4.522064208984375, 89.17682647705078, 133.9506378173828, 128.72604370117188, -66.33927917480469, 9.133712768554688, 8.026222229003906, -31.97943878173828, 3.117340087890625, 46.970481872558594, 1.3311309814453125, 112.65583038330078, 94.62205505371094, 10.824146270751953, -49.130027770996094, 42.259315490722656, -5.340259552001953, 8.461698532104492, 72.8137435913086, 59.27696228027344, 18.886287689208984, 91.18983459472656, 65.51556396484375, 35.84681701660156, -104.85062408447266, 122.88064575195312, -9.950210571289062, 26.881860733032227, 26.100236892700195, -2.9058990478515625, 58.28434753417969, -1.0098648071289062, 38.895660400390625, 27.995819091796875, 90.50396728515625, 20.77142333984375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000521.npy"}
{"epoch": 0.7876039304610734, "step": 522, "batch_size": 64, "mean": 30.094348907470703, "std": 61.328224182128906, "min": -123.00616455078125, "p10": -54.392635345458984, "median": 17.440712928771973, "p90": 117.6834487915039, "max": 133.59938049316406, "pos_frac": 0.734375, "sample": [-11.971176147460938, 58.034751892089844, 3.7473011016845703, 37.744850158691406, 50.53508758544922, 108.58073425292969, 15.677619934082031, 124.41405487060547, 89.59530639648438, 45.754661560058594, -86.28865051269531, -56.082054138183594, 11.421768188476562, 124.75940704345703, 12.988449096679688, 106.66234588623047, -28.69977569580078, 30.05929183959961, 82.66607666015625, 20.17780876159668, 117.73579406738281, 11.538227081298828, 30.173561096191406, -123.00616455078125, 95.8668212890625, 2.6007041931152344, 7.884555816650391, -51.689048767089844, -88.12152099609375, 16.26121711730957, -11.55556869506836, 117.84734344482422, -10.993698120117188, 117.56130981445312, 91.93364715576172, 52.13336181640625, -4.262443542480469, 20.855209350585938, -13.84946060180664, 9.651573181152344, 29.557687759399414, 8.322319030761719, 3.0978469848632812, 2.6682510375976562, 64.07441711425781, -1.092803955078125, 93.94670104980469, -92.94119262695312, 133.39161682128906, -20.641260147094727, 17.71477508544922, -55.55131530761719, 85.48405456542969, 133.59938049316406, 3.424102783203125, 10.528953552246094, -63.55207061767578, 124.17613220214844, 82.71699523925781, 91.12686920166016, 17.166650772094727, -23.630470275878906, 69.66764068603516, 84.43964385986328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000522.npy"}
{"epoch": 0.7891156462585034, "step": 523, "batch_size": 64, "mean": 32.03522491455078, "std": 55.881202697753906, "min": -102.65254211425781, "p10": -42.87394561767577, "median": 39.39883232116699, "p90": 95.70872039794922, "max": 124.88662719726562, "pos_frac": 0.765625, "sample": [-10.54214096069336, 118.02494049072266, 9.02597427368164, 15.89373779296875, 94.00460052490234, 63.591758728027344, 8.080245971679688, 20.43627166748047, 69.31460571289062, 19.18511199951172, 2.8566226959228516, 48.08665466308594, 106.59658813476562, -80.33805847167969, 71.00051879882812, 41.544464111328125, 50.979522705078125, 0.33843040466308594, 31.112884521484375, -94.8911361694336, 11.725540161132812, 91.97186279296875, -30.195175170898438, 46.270286560058594, -84.62489318847656, 78.08682250976562, -51.00415802001953, -11.10177993774414, 9.966758728027344, 70.79707336425781, 25.91180992126465, 50.40589904785156, 78.68170166015625, 48.81309509277344, 102.41827392578125, 87.78057861328125, 94.63798522949219, -2.701690673828125, 31.313499450683594, -92.5743408203125, 47.0538330078125, -36.942787170410156, 124.88662719726562, 57.69010925292969, 95.95674133300781, 22.63080596923828, 26.094234466552734, 35.647186279296875, 106.09083557128906, -45.415870666503906, 122.51022338867188, 39.51772689819336, 77.84263610839844, -28.991561889648438, 95.1300048828125, 59.736412048339844, 42.88372802734375, 39.279937744140625, 34.76689147949219, -102.65254211425781, -27.772682189941406, 61.89141845703125, -27.11907958984375, 88.65878295898438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000523.npy"}
{"epoch": 0.7906273620559335, "step": 524, "batch_size": 64, "mean": 31.4329891204834, "std": 46.366050720214844, "min": -107.01786041259766, "p10": -28.053798675537102, "median": 29.400108337402344, "p90": 98.17564163208009, "max": 124.97359466552734, "pos_frac": 0.78125, "sample": [84.76577758789062, 27.861190795898438, 105.69095611572266, 35.2978630065918, -2.3193359375, -9.206626892089844, -30.380165100097656, -31.395660400390625, 72.00071716308594, 18.87136459350586, 84.91088104248047, -1.4448089599609375, -35.20399475097656, 38.80256652832031, 12.935455322265625, 99.6706771850586, -78.35578918457031, -33.01509094238281, 26.728191375732422, 94.68722534179688, 92.64436340332031, -107.01786041259766, 83.19583129882812, 0.856048583984375, 0.4700775146484375, 18.207374572753906, 124.97359466552734, 50.637428283691406, 53.68616485595703, 2.7693634033203125, -5.542472839355469, 56.204063415527344, 9.004806518554688, 44.65656280517578, 51.706024169921875, 42.94770431518555, 37.965667724609375, 52.683326721191406, 11.959457397460938, -31.804046630859375, 34.99176025390625, -5.3522491455078125, 6.1256561279296875, 105.09640502929688, 23.916040420532227, 81.56026458740234, 103.74810028076172, 0.6187477111816406, 30.93902587890625, 24.103851318359375, 43.158775329589844, 7.857688903808594, 36.010353088378906, 116.0576171875, 22.039432525634766, 62.982330322265625, 99.9657974243164, 34.13762664794922, -22.6256103515625, 27.52265167236328, 63.44683074951172, -10.161201477050781, 53.45310974121094, 1.0134124755859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000524.npy"}
{"epoch": 0.7921390778533636, "step": 525, "batch_size": 64, "mean": 25.432737350463867, "std": 47.73548126220703, "min": -75.24605560302734, "p10": -25.472559928894032, "median": 18.944074630737305, "p90": 97.98550720214844, "max": 131.3442840576172, "pos_frac": 0.6875, "sample": [13.2542724609375, -74.01507568359375, -31.901641845703125, -7.773227691650391, -53.13568878173828, 22.304725646972656, 20.058486938476562, 28.69549560546875, 122.525146484375, -10.851112365722656, 59.83144760131836, 6.935111999511719, 51.898582458496094, 13.257614135742188, 35.97392272949219, 94.34573364257812, 22.130964279174805, -39.699180603027344, 51.46177673339844, 0.179779052734375, 29.786224365234375, -12.644390106201172, 109.8356704711914, 103.62960815429688, -0.7189407348632812, 63.56869888305664, -7.434700012207031, 54.73818588256836, 23.595542907714844, 66.02491760253906, -11.51092529296875, 70.3631591796875, 16.19396209716797, 53.10353469848633, 53.520503997802734, -10.181594848632812, 17.511646270751953, 18.30340576171875, 131.3442840576172, -15.396074295043945, -13.056900024414062, 19.58474349975586, 5.957939147949219, -75.24605560302734, 3.3498687744140625, 96.86915588378906, 3.012594223022461, 9.727561950683594, 9.701995849609375, 98.46394348144531, -29.791053771972656, 116.9813461303711, 21.74915313720703, -5.664634704589844, 55.560218811035156, 93.05014038085938, -7.27996826171875, 54.46351623535156, 28.317806243896484, 28.875442504882812, -7.799224853515625, 111.83216857910156, -1.4639339447021484, -68.61056518554688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000525.npy"}
{"epoch": 0.7936507936507936, "step": 526, "batch_size": 64, "mean": 26.58679962158203, "std": 48.25823211669922, "min": -95.74739074707031, "p10": -29.377161026000966, "median": 21.296669006347656, "p90": 89.38223114013672, "max": 125.02830505371094, "pos_frac": 0.75, "sample": [125.02830505371094, 18.18758773803711, -19.0999698638916, 9.205272674560547, -37.65785217285156, 7.897186279296875, 4.397991180419922, 110.97590637207031, -53.68721008300781, -47.8023681640625, 30.82574462890625, 39.65842056274414, -67.67340087890625, 9.096466064453125, 63.169677734375, -19.90160369873047, 35.74775314331055, 107.34329986572266, 17.576759338378906, 78.1425552368164, 24.20147705078125, 89.76275634765625, 13.308731079101562, 102.5631103515625, 86.6796875, 9.677989959716797, 19.71057891845703, 53.34320831298828, 5.709129333496094, 20.50311279296875, 0.4900493621826172, 25.717681884765625, 43.07769012451172, -8.808771133422852, 88.49433898925781, -8.798402786254883, 76.43971252441406, -2.3562240600585938, 63.545928955078125, -7.282112121582031, 69.92200469970703, 30.783000946044922, 22.090225219726562, 60.751564025878906, 31.445697784423828, -95.74739074707031, -5.852199554443359, 42.08759307861328, 71.13822937011719, 0.855621337890625, 41.78636169433594, -15.908672332763672, 122.30386352539062, 46.593631744384766, 113.94352722167969, 9.0169677734375, -33.438114166259766, 69.47010803222656, 19.007789611816406, 14.733741760253906, 42.88996124267578, -13.541763305664062, 38.11151123046875, -88.29828643798828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000526.npy"}
{"epoch": 0.7951625094482238, "step": 527, "batch_size": 64, "mean": 19.762725830078125, "std": 51.73079299926758, "min": -116.16806030273438, "p10": -38.30462112426758, "median": 11.407331466674805, "p90": 90.40589828491214, "max": 127.12077331542969, "pos_frac": 0.640625, "sample": [-9.72563362121582, 29.569564819335938, 22.96430206298828, 75.23944091796875, 13.023849487304688, 2.3415603637695312, 120.30636596679688, 39.293392181396484, 94.2459945678711, 83.36712646484375, 97.98304748535156, -30.693328857421875, 45.156734466552734, -7.962532043457031, 20.61834716796875, -33.98821258544922, 93.13004302978516, 3.5638980865478516, 47.20637130737305, -40.154510498046875, -55.749755859375, -26.77442169189453, -61.56440734863281, -46.212127685546875, 57.82693099975586, 13.503585815429688, 124.26396942138672, -1.3683624267578125, 9.790813446044922, 6.2892303466796875, 35.73371124267578, 7.255424499511719, -11.839981079101562, 33.35691452026367, 27.93521499633789, 127.12077331542969, 57.40052032470703, 1.69451904296875, -9.554136276245117, 84.049560546875, 54.46480941772461, -0.5318183898925781, 62.500823974609375, -25.817264556884766, -12.651824951171875, -2.6025390625, -33.053497314453125, 111.44546508789062, 1.3650054931640625, -7.573709487915039, -15.86456298828125, 6.335943222045898, 15.426025390625, 63.826622009277344, -64.38873291015625, -18.553558349609375, 53.16041564941406, 13.430122375488281, 74.74571228027344, 82.67320251464844, -116.16806030273438, -87.99848937988281, 66.55941772460938, 5.4411468505859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000527.npy"}
{"epoch": 0.7966742252456538, "step": 528, "batch_size": 64, "mean": 20.59501838684082, "std": 53.99073028564453, "min": -103.30219268798828, "p10": -29.233149719238273, "median": 18.99056911468506, "p90": 92.5818084716797, "max": 127.88771057128906, "pos_frac": 0.671875, "sample": [87.68064880371094, -102.57110595703125, 110.2327651977539, -13.606081008911133, 45.71198272705078, 66.92996978759766, 34.84007263183594, -3.7050628662109375, 10.742683410644531, 6.215404510498047, -1.8700714111328125, -17.934982299804688, 85.95732116699219, -102.24779510498047, -9.01081657409668, -5.0084228515625, 99.9718017578125, 5.168661117553711, 18.573911666870117, -16.94202423095703, -9.690277099609375, 92.82186889648438, 0.6975860595703125, -90.23739624023438, -15.498268127441406, 29.1878662109375, -32.35448455810547, -21.950035095214844, 28.423215866088867, 88.02332305908203, 54.472164154052734, 0.0769195556640625, 19.997509002685547, 125.37047576904297, 54.22438049316406, 0.6603755950927734, 127.88771057128906, 24.103553771972656, 22.271377563476562, 23.001312255859375, 77.32787322998047, 102.4857177734375, 1.0951061248779297, 25.9915828704834, -74.78701782226562, 42.41912841796875, 14.013965606689453, -8.486480712890625, 69.93199157714844, 5.4843597412109375, 21.591567993164062, -103.30219268798828, -82.14506530761719, 102.05963134765625, 19.4072265625, 33.2012825012207, -11.397144317626953, -2.4072952270507812, 92.02166748046875, -6.013162612915039, 60.03858947753906, 29.77893829345703, 73.19806671142578, 15.954757690429688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000528.npy"}
{"epoch": 0.7981859410430839, "step": 529, "batch_size": 64, "mean": 52.53473663330078, "std": 53.15156936645508, "min": -111.63087463378906, "p10": -9.944218826293945, "median": 64.72685241699219, "p90": 117.02172164916993, "max": 137.07522583007812, "pos_frac": 0.796875, "sample": [77.39580535888672, -10.636470794677734, 80.06690979003906, 13.831932067871094, 75.76235961914062, 10.942047119140625, 137.07522583007812, 125.13764190673828, -17.256603240966797, 5.256141662597656, -2.3135852813720703, 0.32204246520996094, 14.000640869140625, 42.29710388183594, 74.04913330078125, 58.77409362792969, -30.205421447753906, 93.81485748291016, -8.328964233398438, 89.79532623291016, 103.96363830566406, -4.414825439453125, -23.223846435546875, 73.12030792236328, 70.62561798095703, 113.25605773925781, -26.297744750976562, 11.289848327636719, 65.31710815429688, 38.77796173095703, 114.37683868408203, 100.82308959960938, 92.04681396484375, 123.97462463378906, 44.29328155517578, 68.94318389892578, 106.84065246582031, 113.98845672607422, -3.2343101501464844, 101.87765502929688, 75.76583862304688, 103.84092712402344, 133.6946258544922, 13.538772583007812, 68.05902862548828, 107.50448608398438, 36.30555725097656, -1.1975479125976562, 81.26016998291016, 45.003814697265625, 13.953521728515625, 80.20592498779297, 28.652162551879883, 120.42250061035156, -43.89998245239258, 122.3387680053711, 41.84741973876953, 113.2867431640625, 64.1365966796875, 29.87103271484375, -111.63087463378906, 5.421470642089844, -0.43770599365234375, 118.15524291992188], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000529.npy"}
{"epoch": 0.799697656840514, "step": 530, "batch_size": 64, "mean": 31.08292007446289, "std": 45.830379486083984, "min": -45.28374099731445, "p10": -20.691079711914057, "median": 19.830982208251953, "p90": 102.53315658569336, "max": 119.0496597290039, "pos_frac": 0.71875, "sample": [116.21527099609375, 60.976627349853516, 13.774589538574219, 7.414386749267578, 14.37078857421875, -3.8084945678710938, 116.3363037109375, -10.594108581542969, 96.66151428222656, 77.21856689453125, 14.611774444580078, 103.23348236083984, -8.594491958618164, 25.112579345703125, 49.639137268066406, 37.99932098388672, 24.913970947265625, 21.83111572265625, 113.9383316040039, -15.662506103515625, 0.1761760711669922, -22.81653594970703, -41.64886474609375, 71.48442077636719, 69.05584716796875, -0.33577537536621094, 36.181640625, 46.661441802978516, 13.743698120117188, 119.0496597290039, 32.121910095214844, 7.952186584472656, 113.04470825195312, 114.45144653320312, 0.271270751953125, 8.296957015991211, 26.011959075927734, 89.43277740478516, -45.28374099731445, 4.4798583984375, 21.489730834960938, 29.559555053710938, -0.32170867919921875, 42.63755798339844, -30.024276733398438, -15.731681823730469, 97.1611328125, -4.989589691162109, -34.79914855957031, -27.159423828125, -5.399110794067383, -31.38018798828125, 93.7724838256836, 2.354114532470703, 100.89906311035156, 44.983795166015625, 62.0399284362793, -9.68634033203125, 18.17223358154297, -2.4002532958984375, 93.96067810058594, 10.722503662109375, 2.698894500732422, 32.82759094238281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000530.npy"}
{"epoch": 0.8012093726379441, "step": 531, "batch_size": 64, "mean": 31.139102935791016, "std": 51.2233772277832, "min": -111.67546081542969, "p10": -23.370223999023427, "median": 23.654003143310547, "p90": 107.7132537841797, "max": 125.69010925292969, "pos_frac": 0.75, "sample": [-54.48170471191406, 82.22139739990234, 98.8026123046875, 56.77899169921875, 23.80469512939453, 53.064361572265625, 3.8862380981445312, 53.1439208984375, 12.833061218261719, 0.19894027709960938, 21.123092651367188, 0.4587688446044922, 102.24911499023438, -4.7618865966796875, 91.5813217163086, 108.83678436279297, -73.72028350830078, -27.801193237304688, 5.540374755859375, 106.04438781738281, 21.576309204101562, 111.77314758300781, -41.77397918701172, -12.450920104980469, -13.031295776367188, 29.095720291137695, 45.45575714111328, 30.823997497558594, 23.503311157226562, 3.7481460571289062, 36.55757141113281, -2.150754928588867, 0.21625709533691406, 12.711517333984375, -111.67546081542969, -1.1209068298339844, 32.199981689453125, 77.75782012939453, 24.893760681152344, -41.71484375, 66.0457763671875, 47.14105987548828, 108.42848205566406, 101.48006439208984, -34.52262878417969, -8.285789489746094, 117.77912902832031, 110.17318725585938, 36.34535217285156, 11.195663452148438, -2.697174072265625, 24.466472625732422, 48.08192443847656, 90.06510925292969, 18.975845336914062, -3.177570343017578, 23.86883544921875, 2.6741905212402344, 93.47836303710938, 125.69010925292969, 1.2762260437011719, -1.6123027801513672, 124.34536743164062, 5.488624572753906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000531.npy"}
{"epoch": 0.8027210884353742, "step": 532, "batch_size": 64, "mean": 35.395713806152344, "std": 54.614070892333984, "min": -85.01068878173828, "p10": -35.37934951782226, "median": 30.502687454223633, "p90": 109.95324401855468, "max": 131.5936279296875, "pos_frac": 0.734375, "sample": [66.51960754394531, 62.43982696533203, 86.78543853759766, 104.5115737915039, -30.668838500976562, -2.7760391235351562, 37.064414978027344, 129.46316528320312, 13.28219223022461, 28.241836547851562, -46.476539611816406, 77.23492431640625, 110.182373046875, 21.05596923828125, 74.4844970703125, 21.61297607421875, 54.857234954833984, 82.06816101074219, 9.97230339050293, -66.85737609863281, -9.229059219360352, 10.047296524047852, 100.72438049316406, 19.67869758605957, 7.46685791015625, 49.67335510253906, 32.7635383605957, 119.82594299316406, 42.01687240600586, 108.6601333618164, 88.53489685058594, 22.475811004638672, 36.176513671875, 131.5936279296875, 5.523139953613281, 26.835891723632812, 44.76573181152344, -43.15095520019531, -12.0880126953125, 19.418115615844727, -37.39813995361328, -23.308929443359375, 13.269603729248047, -9.860767364501953, 127.89490509033203, 17.97826385498047, 119.97775268554688, -13.337577819824219, 35.28147888183594, 66.68231201171875, 73.25009155273438, 66.1545181274414, -15.763938903808594, 122.96940612792969, -85.01068878173828, 63.993282318115234, -69.8658676147461, 16.021163940429688, 109.41860961914062, -73.1431655883789, -0.6245937347412109, -16.712121963500977, 84.13626098632812, 58.6133918762207], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000532.npy"}
{"epoch": 0.8042328042328042, "step": 533, "batch_size": 64, "mean": 31.54303741455078, "std": 48.473854064941406, "min": -115.62010192871094, "p10": -12.673028373718262, "median": 21.923529624938965, "p90": 93.55602951049805, "max": 128.13125610351562, "pos_frac": 0.765625, "sample": [13.749977111816406, 93.30883026123047, -15.519401550292969, 26.697677612304688, 55.431514739990234, 57.472938537597656, 0.46158599853515625, -9.388813018798828, 91.27680969238281, 68.51470947265625, 6.727691650390625, 4.742481231689453, 7.810146331787109, -2.7532119750976562, 77.74454498291016, 27.57049560546875, 9.644769668579102, 74.4223403930664, 4.12030029296875, -4.522712707519531, 7.183815002441406, 8.133180618286133, 33.99403381347656, 7.188262939453125, -50.78797149658203, 81.03308868408203, 76.43148803710938, 1.7104339599609375, 25.005470275878906, -90.05011749267578, 9.204475402832031, -0.054576873779296875, 115.94538879394531, 101.39080047607422, -12.19917106628418, 93.66197204589844, -19.472900390625, 128.13125610351562, 22.43641471862793, -11.253726959228516, 21.41064453125, 72.86876678466797, 58.0454216003418, 59.87071990966797, 100.05899047851562, 123.56128692626953, 25.789138793945312, -25.466785430908203, 18.902908325195312, -12.876110076904297, -0.9589672088623047, 83.86126708984375, 56.13855743408203, -1.5213775634765625, 4.802766799926758, 5.829216003417969, 45.80720520019531, 44.680938720703125, 21.217620849609375, 121.00790405273438, 78.56915283203125, 83.04704284667969, -115.62010192871094, 34.584075927734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000533.npy"}
{"epoch": 0.8057445200302343, "step": 534, "batch_size": 64, "mean": 36.74319839477539, "std": 55.13877868652344, "min": -102.59679412841797, "p10": -24.32723121643066, "median": 28.41054344177246, "p90": 113.83728637695313, "max": 137.998291015625, "pos_frac": 0.6875, "sample": [24.119461059570312, -9.627885818481445, 81.97610473632812, 24.458728790283203, -0.3966197967529297, 56.012657165527344, 10.75726318359375, -99.277587890625, 28.903827667236328, -32.173858642578125, -1.9406261444091797, 20.02313232421875, 97.49224853515625, 68.85697174072266, -26.035537719726562, 137.998291015625, 70.87809753417969, 72.22647094726562, 46.75359344482422, -19.553558349609375, 114.0645980834961, 127.74172973632812, 30.357982635498047, 35.41926574707031, -8.934680938720703, -3.05511474609375, -102.59679412841797, -4.179037094116211, 113.30689239501953, 22.166419982910156, 24.578781127929688, -20.341182708740234, 116.8377685546875, 117.47793579101562, 120.96829223632812, 82.40908813476562, -30.341739654541016, 33.26685333251953, 127.16401672363281, 102.69257354736328, 41.664215087890625, 105.25112915039062, -33.87139129638672, 0.297393798828125, 95.88890075683594, 77.1444091796875, -6.433860778808594, 93.63981628417969, -2.825469970703125, -7.028717041015625, 17.224319458007812, 110.54957580566406, 33.34941101074219, -15.24026870727539, 0.49495697021484375, 8.228347778320312, 77.18746948242188, -26.3560791015625, 41.65993881225586, 11.950584411621094, 68.22425842285156, 27.917259216308594, 102.23574829101562, -20.04214096069336], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000534.npy"}
{"epoch": 0.8072562358276644, "step": 535, "batch_size": 64, "mean": 28.552780151367188, "std": 52.91854476928711, "min": -120.54976654052734, "p10": -25.57030029296875, "median": 23.86011791229248, "p90": 96.61653137207034, "max": 137.19708251953125, "pos_frac": 0.71875, "sample": [5.0509185791015625, -38.318546295166016, 76.50831604003906, 27.706241607666016, 2.6804580688476562, 32.19004440307617, -90.67756652832031, -29.070072174072266, -49.616233825683594, 30.658203125, 40.17474365234375, -20.57740592956543, 26.208532333374023, 62.855445861816406, -17.808467864990234, 0.7522811889648438, -24.032684326171875, 20.540565490722656, -0.18638992309570312, -26.229278564453125, 21.511703491210938, 68.509033203125, 71.76591491699219, -10.06951904296875, 85.56614685058594, -5.112205505371094, 119.36019897460938, -64.77142333984375, -14.943321228027344, 11.487367630004883, 91.12770080566406, 104.046142578125, 50.470272064208984, -4.455375671386719, 33.07472610473633, 84.95384979248047, 6.452735900878906, 132.1451873779297, -22.369552612304688, 70.60881805419922, 31.46430206298828, 3.416046142578125, 137.19708251953125, 67.91877746582031, 0.9434967041015625, -120.54976654052734, -12.109298706054688, 69.16136932373047, 3.848949432373047, 55.15874481201172, 70.6503677368164, -16.70440673828125, 19.694454193115234, 127.68878173828125, 46.17863082885742, 52.42848587036133, 10.890861511230469, 69.35589599609375, 98.96888732910156, 11.473655700683594, 31.377452850341797, 4.039985656738281, 119.43891906738281, 87.27874755859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000535.npy"}
{"epoch": 0.8087679516250945, "step": 536, "batch_size": 64, "mean": 42.34637451171875, "std": 44.542667388916016, "min": -29.861984252929688, "p10": -3.6067392349243153, "median": 30.999116897583008, "p90": 105.6855224609375, "max": 139.3666534423828, "pos_frac": 0.828125, "sample": [32.193180084228516, 4.5898895263671875, -29.861984252929688, 112.4798355102539, 41.841156005859375, 9.584884643554688, 14.067481994628906, -4.9791259765625, 36.40064239501953, 12.056650161743164, -28.4716796875, 32.87825012207031, 64.23628997802734, 101.2378921508789, 89.61325073242188, 70.52804565429688, 15.150726318359375, 104.8912353515625, 139.3666534423828, 33.01371765136719, 18.2291316986084, -1.7169628143310547, 100.9211196899414, 49.67921447753906, 7.9956512451171875, 126.59329986572266, 65.85540008544922, 2.1949329376220703, 50.818199157714844, 38.74985885620117, 121.85140228271484, -4.267589569091797, 3.9469146728515625, 53.13522720336914, -4.68560791015625, 7.612403869628906, 96.78626251220703, 6.400215148925781, 66.17826843261719, 1.104207992553711, 105.29283142089844, 105.42594909667969, 36.00340270996094, -4.111396789550781, 3.7833595275878906, 26.596633911132812, 13.717720031738281, 42.928062438964844, 26.379661560058594, 92.25732421875, -0.8273754119873047, 100.73426818847656, 11.712905883789062, 110.72000885009766, 105.79676818847656, 29.8050537109375, 23.50989532470703, 124.83515930175781, -2.4292049407958984, -13.963882446289062, 22.41475486755371, 80.27056884765625, -0.317138671875, 11.434249877929688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000536.npy"}
{"epoch": 0.8102796674225246, "step": 537, "batch_size": 64, "mean": 33.01459503173828, "std": 51.95868682861328, "min": -60.8657341003418, "p10": -37.454315948486325, "median": 21.541325569152832, "p90": 108.16305999755862, "max": 140.2962188720703, "pos_frac": 0.703125, "sample": [66.67729187011719, 44.084068298339844, 14.328018188476562, 42.5545654296875, 114.98919677734375, 89.17210388183594, 91.47421264648438, -0.7799968719482422, 33.968780517578125, 47.0090217590332, 19.72727394104004, 9.850122451782227, 115.18740844726562, 98.02377319335938, 95.58512115478516, 102.6925048828125, 110.55406951904297, 10.760168075561523, -5.495613098144531, -18.93842887878418, -1.1432456970214844, 18.21600341796875, 9.943374633789062, 1.4030838012695312, -6.396829605102539, 101.85003662109375, 39.409576416015625, 3.6923751831054688, -52.268959045410156, -3.2680187225341797, 28.982986450195312, 33.739200592041016, 34.56957244873047, -7.8596649169921875, -52.537208557128906, 4.500247955322266, 130.15151977539062, 55.11912155151367, 18.393112182617188, -3.2321739196777344, 50.963844299316406, 111.3741455078125, -39.95756530761719, 9.665573120117188, -13.757108688354492, -7.110601425170898, 19.567764282226562, -24.90338134765625, 42.61297607421875, 23.355377197265625, 7.7289581298828125, 92.60918426513672, -40.238990783691406, 110.50758361816406, 100.90202331542969, 140.2962188720703, -46.41083908081055, 25.673019409179688, -39.12467956542969, 51.41554260253906, 100.62471008300781, 96.8752212524414, -60.8657341003418, -33.556800842285156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000537.npy"}
{"epoch": 0.8117913832199547, "step": 538, "batch_size": 64, "mean": 37.45811462402344, "std": 50.67782211303711, "min": -100.08073425292969, "p10": -28.059236145019533, "median": 30.52304458618164, "p90": 108.47163848876954, "max": 146.59275817871094, "pos_frac": 0.765625, "sample": [-37.90874481201172, 78.03397369384766, 19.80304718017578, 12.987422943115234, 40.10799789428711, -100.08073425292969, 36.25840377807617, -6.1300506591796875, 61.045658111572266, 88.28541564941406, 40.546539306640625, 104.77970886230469, 11.260772705078125, 100.765869140625, 27.048288345336914, 21.453231811523438, 84.72370910644531, 124.61915588378906, 13.841285705566406, 43.644775390625, 13.484909057617188, -0.053558349609375, 118.11264038085938, 33.656402587890625, 27.389686584472656, -34.23207092285156, 53.86656188964844, 26.299072265625, 131.5902099609375, 12.397621154785156, -28.973190307617188, 71.01134490966797, -20.087440490722656, -43.660797119140625, -6.744468688964844, 16.755821228027344, 26.527664184570312, 66.5394058227539, 2.402374267578125, 26.75, 112.81867218017578, -27.803070068359375, 110.05389404296875, 65.6893310546875, -46.55354309082031, 64.54348754882812, -6.886466979980469, 104.35696411132812, 57.97246551513672, 146.59275817871094, 77.35479736328125, 38.222930908203125, -28.169021606445312, -8.826629638671875, 81.72952270507812, 44.568626403808594, 122.33338928222656, 2.997180938720703, 77.67414855957031, 9.946014404296875, -0.09938812255859375, 64.30638885498047, 76.07530975341797, 0.3036079406738281], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000538.npy"}
{"epoch": 0.8133030990173847, "step": 539, "batch_size": 64, "mean": 29.9753475189209, "std": 45.5098762512207, "min": -78.9300537109375, "p10": -15.860599136352535, "median": 15.410223960876465, "p90": 103.2127151489258, "max": 131.91482543945312, "pos_frac": 0.796875, "sample": [39.339988708496094, 81.7452392578125, 1.3833484649658203, 118.72877502441406, 43.907737731933594, 6.244880676269531, 104.21739196777344, 91.94793701171875, 63.59173583984375, 4.807884216308594, 59.37471008300781, 39.83673095703125, 32.52674865722656, 7.582618713378906, -20.152420043945312, -4.730846405029297, 5.34466552734375, -8.618659973144531, 124.6771240234375, 41.13777160644531, 8.56199836730957, -17.73757553100586, 41.26432800292969, -78.9300537109375, 6.111732482910156, -32.33434295654297, 4.351617813110352, 35.08805847167969, 131.91482543945312, 25.074066162109375, 11.376289367675781, 3.2217254638671875, -5.7440948486328125, 10.421333312988281, 112.85739135742188, 34.307029724121094, 7.167396545410156, 8.886260986328125, 105.2643814086914, 7.220293045043945, -11.480987548828125, 29.651145935058594, 23.25625228881836, 2.2237472534179688, 47.12763977050781, 37.547027587890625, -43.776390075683594, 107.21880340576172, 13.8780517578125, -21.675460815429688, 75.052001953125, 69.5355453491211, 9.920074462890625, 5.984825134277344, 65.3360595703125, 28.106353759765625, -4.3345794677734375, 16.14278793334961, -66.16398620605469, 89.0953598022461, 14.67765998840332, 100.86846923828125, 79.58978271484375, -0.5939178466796875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000539.npy"}
{"epoch": 0.8148148148148148, "step": 540, "batch_size": 64, "mean": 40.096641540527344, "std": 50.97816848754883, "min": -82.601806640625, "p10": -12.610888671875, "median": 27.340994834899902, "p90": 116.23126678466797, "max": 129.2545166015625, "pos_frac": 0.796875, "sample": [-47.097747802734375, 8.741302490234375, 54.621002197265625, 110.05793762207031, 5.453805923461914, -82.601806640625, 71.58619689941406, 23.8778076171875, 31.222984313964844, 19.26126480102539, 26.270732879638672, 52.40980529785156, 116.56185913085938, 67.64954376220703, -40.950050354003906, 28.411256790161133, 118.23365783691406, 58.116188049316406, 0.20238876342773438, 63.45628356933594, 32.132301330566406, -18.375015258789062, 44.86956787109375, 15.904191970825195, 124.19259643554688, 110.63935089111328, -5.315862655639648, 38.197967529296875, -13.173004150390625, 17.4932861328125, -10.49212646484375, 124.17753601074219, 113.92581176757812, 43.71717834472656, 108.96640014648438, -2.3890419006347656, -1.2442359924316406, 90.96353912353516, -18.690185546875, 1.2009162902832031, -0.17495346069335938, 75.18045043945312, 100.24236297607422, 129.2545166015625, 125.03366088867188, 4.8681182861328125, 113.76863098144531, 24.234554290771484, 18.435165405273438, -45.23136901855469, 21.794227600097656, 25.560623168945312, 13.572845458984375, 4.793865203857422, 10.080951690673828, 29.079620361328125, -11.299285888671875, 127.13423156738281, 9.39300537109375, 50.63301086425781, 82.56168365478516, 5.945112228393555, 53.678565979003906, 115.45988464355469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000540.npy"}
{"epoch": 0.8163265306122449, "step": 541, "batch_size": 64, "mean": 42.50551986694336, "std": 50.12174987792969, "min": -80.77650451660156, "p10": -13.537638854980468, "median": 37.91822814941406, "p90": 112.75454406738281, "max": 125.81709289550781, "pos_frac": 0.734375, "sample": [105.92219543457031, 58.013671875, 63.32546615600586, 118.71195983886719, -3.3952713012695312, 118.80471801757812, -13.933059692382812, 88.80195617675781, 112.77996826171875, -13.666595458984375, -8.356948852539062, 32.637725830078125, 29.45130157470703, 45.765846252441406, -0.5516357421875, -80.77650451660156, 72.6494140625, 113.4169921875, 21.60394287109375, 109.47163391113281, -32.09386444091797, 37.87354278564453, 125.81709289550781, 110.40819549560547, 98.3914566040039, 71.42601013183594, 67.19476318359375, -9.6025390625, -31.520992279052734, -2.9866714477539062, 73.39090728759766, 16.33111000061035, 125.02963256835938, 112.69522094726562, 1.7426681518554688, 70.84788513183594, 27.79909324645996, -20.633819580078125, 79.59373474121094, 4.387014389038086, 4.914724349975586, -44.05086135864258, 79.13656616210938, 100.890625, 17.359594345092773, 42.2070426940918, 97.38238525390625, 68.76663970947266, -13.236740112304688, -7.233772277832031, 1.5423259735107422, 92.80899047851562, 36.732261657714844, 25.372955322265625, -1.43505859375, 25.44549560546875, 47.29856872558594, 52.18320846557617, -8.229806900024414, 16.491737365722656, 38.03472137451172, -11.098148345947266, 124.33971405029297, 37.962913513183594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000541.npy"}
{"epoch": 0.817838246409675, "step": 542, "batch_size": 64, "mean": 43.837928771972656, "std": 55.85661315917969, "min": -85.37631225585938, "p10": -23.775904464721673, "median": 39.44317626953125, "p90": 116.35946121215821, "max": 129.61279296875, "pos_frac": 0.8125, "sample": [89.40345764160156, 88.81979370117188, 10.280967712402344, 115.5697021484375, 35.37928771972656, 10.831634521484375, 30.9329833984375, 46.80912780761719, 42.972259521484375, 40.310546875, 35.51053237915039, 1.4422988891601562, -26.32250213623047, 86.33418273925781, 15.393926620483398, 79.91386413574219, 39.50554656982422, 123.2841796875, -63.791290283203125, 116.53599548339844, 115.94754791259766, 114.02616119384766, 8.949413299560547, 6.625511169433594, -85.37631225585938, -80.89134216308594, 71.748291015625, 40.71342468261719, 36.178550720214844, -32.087921142578125, 115.13740539550781, 111.00245666503906, 1.4538383483886719, 105.21847534179688, -32.72016906738281, -11.54254150390625, -2.166473388671875, 43.214263916015625, -4.2122039794921875, 70.57357025146484, 34.07923889160156, 6.931232452392578, 121.66781616210938, -17.833843231201172, 46.72846603393555, -0.05242156982421875, 115.58901977539062, 10.047691345214844, 29.91112518310547, 128.3208770751953, 106.55057525634766, 129.61279296875, -54.91397476196289, 46.09075927734375, 2.787534713745117, 3.446247100830078, 8.227020263671875, 124.92500305175781, 104.24042510986328, 34.095924377441406, 43.252227783203125, 112.39309692382812, 39.38080596923828, 119.24150085449219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000542.npy"}
{"epoch": 0.8193499622071051, "step": 543, "batch_size": 64, "mean": 38.306365966796875, "std": 50.58341598510742, "min": -124.52849578857422, "p10": -11.355885314941405, "median": 32.412240982055664, "p90": 106.2343032836914, "max": 128.421875, "pos_frac": 0.828125, "sample": [35.91011047363281, 18.873981475830078, 36.48186111450195, -0.05527687072753906, 59.770477294921875, 20.245773315429688, 50.40583801269531, 105.3395004272461, -100.09239196777344, 74.91351318359375, 42.65571594238281, 92.3425521850586, 69.9754638671875, 100.29209899902344, 128.421875, 37.9755859375, 122.25755310058594, -60.52931594848633, 26.672761917114258, 102.2659912109375, 19.89765739440918, 7.7717742919921875, 1.8494949340820312, 24.847930908203125, 14.903331756591797, 97.47669219970703, -124.52849578857422, 0.7084121704101562, 16.28116226196289, -19.35728645324707, 27.797531127929688, 110.38276672363281, 106.61779022216797, 30.597728729248047, 89.7499771118164, 17.770263671875, 122.48470306396484, -2.519319534301758, -19.17278289794922, -3.717374801635742, 67.51829528808594, 60.449485778808594, 35.100921630859375, 29.35052490234375, 108.23454284667969, 16.690345764160156, 70.68697357177734, 103.42484283447266, 10.107852935791016, 34.22675323486328, -10.308208465576172, 39.40049743652344, 26.892311096191406, 3.8969497680664062, 45.02256774902344, 5.18670654296875, 43.9298095703125, 29.46619987487793, 5.4476165771484375, -16.06704330444336, 81.52073669433594, 116.47974395751953, -11.804889678955078, 72.78822326660156], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000543.npy"}
{"epoch": 0.8208616780045351, "step": 544, "batch_size": 64, "mean": 43.14006805419922, "std": 54.01602554321289, "min": -83.48876953125, "p10": -31.15019073486328, "median": 37.12882614135742, "p90": 120.0373504638672, "max": 143.1499481201172, "pos_frac": 0.8125, "sample": [20.410202026367188, -55.74566650390625, 22.485580444335938, 27.297548294067383, 90.7741928100586, 25.675140380859375, -36.69520568847656, 36.760826110839844, 16.731536865234375, 32.52977752685547, 42.79090881347656, 37.6810188293457, 32.60992431640625, 94.07331085205078, 56.42301940917969, 4.639318466186523, -83.48876953125, 63.07975769042969, 50.30066680908203, 25.035140991210938, 56.75054931640625, 37.496826171875, 67.47669982910156, 120.51641845703125, 111.34817504882812, 53.906959533691406, 25.36205291748047, -9.830123901367188, -0.31029701232910156, -67.79241943359375, 139.09829711914062, -56.72980499267578, 78.58758544921875, -1.3796882629394531, 123.63174438476562, 43.537384033203125, 107.9415283203125, -32.509552001953125, 29.79322052001953, 28.594566345214844, 92.71617126464844, 17.00324249267578, 129.90882873535156, 15.667470932006836, 33.27391815185547, 43.45436096191406, 88.30685424804688, 41.60863494873047, 127.194091796875, -27.978347778320312, 1.8546085357666016, 129.2975311279297, 108.87713623046875, 143.1499481201172, -39.85059356689453, 12.172714233398438, 75.79421997070312, 9.550537109375, 118.91952514648438, 115.25908660888672, -16.662315368652344, 31.839752197265625, 55.860267639160156, 94.88845825195312], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000544.npy"}
{"epoch": 0.8223733938019653, "step": 545, "batch_size": 64, "mean": 29.575918197631836, "std": 53.671810150146484, "min": -121.57628631591797, "p10": -29.46519222259521, "median": 23.392131805419922, "p90": 113.58298492431642, "max": 129.04019165039062, "pos_frac": 0.734375, "sample": [55.247657775878906, 76.74105072021484, -31.779525756835938, 123.56974792480469, 31.386669158935547, 14.391380310058594, -21.933387756347656, -24.065080642700195, -13.800935745239258, 30.29733657836914, 24.659759521484375, -1.723907470703125, 114.60392761230469, 19.077951431274414, 15.414083480834961, 29.72759246826172, 28.953872680664062, 87.32492065429688, -36.338470458984375, 61.38970947265625, 10.194511413574219, 101.24163818359375, -3.7216720581054688, 63.91728210449219, 34.40901184082031, 10.959999084472656, 31.79467010498047, 6.980936050415039, 5.1151580810546875, 21.55976676940918, 1.6192131042480469, -59.576255798339844, -20.346435546875, 62.761356353759766, -60.67486572265625, -22.3409481048584, 111.33540344238281, 121.40379333496094, 101.15609741210938, -7.196037292480469, -1.4354820251464844, 114.54623413085938, 90.46464538574219, 22.63451385498047, 3.6739273071289062, 62.30835723876953, -17.314353942871094, 67.36771392822266, 29.673004150390625, -121.57628631591797, 4.6168212890625, 81.79397583007812, 24.149749755859375, 129.04019165039062, 74.57864379882812, -59.94054412841797, 2.077045440673828, 60.919654846191406, -59.43687438964844, 1.1772098541259766, 45.36328887939453, 119.03652954101562, 115.20198059082031, 10.201786041259766], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000545.npy"}
{"epoch": 0.8238851095993953, "step": 546, "batch_size": 64, "mean": 36.10563659667969, "std": 49.98921203613281, "min": -79.07884216308594, "p10": -21.003422546386716, "median": 33.09170913696289, "p90": 103.93974685668947, "max": 124.5067138671875, "pos_frac": 0.765625, "sample": [111.0208740234375, -10.027313232421875, 112.54817199707031, 16.659461975097656, 75.46180725097656, 99.19771575927734, 13.34075927734375, -8.353652954101562, 0.896942138671875, -18.352598190307617, -71.23272705078125, 69.65253448486328, 19.22092056274414, 16.671478271484375, -79.07884216308594, 96.68069458007812, 12.452150344848633, 8.604473114013672, -22.032608032226562, 33.72871398925781, -50.43125915527344, 39.86834716796875, 57.44816589355469, 77.54655456542969, 25.476577758789062, 40.692962646484375, 76.93336486816406, -15.718353271484375, 88.34086608886719, 120.1825180053711, 57.16210174560547, 21.54754638671875, 94.5946273803711, 9.156814575195312, 52.68342208862305, -40.314300537109375, 15.303489685058594, 105.9720458984375, 30.501585006713867, 43.38435363769531, 13.286884307861328, 42.42931365966797, -45.36891555786133, 36.319114685058594, 32.45470428466797, 57.002357482910156, -15.616622924804688, 22.560443878173828, 63.63200378417969, 86.31134033203125, 93.78739929199219, 18.872337341308594, 119.98509979248047, -18.60198974609375, -2.5461807250976562, 31.716154098510742, 69.65342712402344, 45.85002136230469, 124.5067138671875, 54.26447296142578, 115.97808074951172, -58.0186653137207, 95.9438705444336, -1.0309581756591797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000546.npy"}
|
||||
{"epoch": 0.8253968253968254, "step": 547, "batch_size": 64, "mean": 39.75334930419922, "std": 52.437843322753906, "min": -118.11235809326172, "p10": -10.537597656249995, "median": 39.02982139587402, "p90": 109.25448760986329, "max": 129.52951049804688, "pos_frac": 0.796875, "sample": [2.0127792358398438, 15.073577880859375, 19.694738388061523, 22.141799926757812, 103.87457275390625, 18.178133010864258, -54.15113067626953, 129.52951049804688, 39.67081832885742, 93.57960510253906, 50.072265625, 74.09960174560547, 38.230628967285156, 42.925880432128906, 44.63229751586914, -1.079437255859375, 49.41453552246094, 118.18373107910156, 5.175312042236328, 27.130935668945312, 79.5504150390625, 93.17591857910156, 122.0189437866211, -1.78546142578125, 87.50003051757812, 68.04122161865234, 35.890342712402344, 58.788917541503906, 48.045257568359375, 14.144905090332031, 27.831130981445312, -2.164957046508789, -118.11235809326172, 67.1468505859375, -44.10509490966797, 38.97799301147461, 110.5272445678711, 10.859230041503906, 92.12147521972656, 39.08164978027344, 55.45915985107422, 118.84294128417969, 73.19001770019531, 84.80516052246094, -58.98826599121094, 38.95303726196289, -4.331779479980469, -101.19991302490234, -5.5444793701171875, 28.074615478515625, 123.738037109375, 81.53974151611328, 124.60060119628906, -12.677505493164062, 61.95403289794922, -26.735107421875, 25.844844818115234, 87.44944763183594, 4.957193374633789, 13.303550720214844, 42.213951110839844, 16.595809936523438, 106.28472137451172, -0.03922271728515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000547.npy"}
|
||||
{"epoch": 0.8269085411942555, "step": 548, "batch_size": 64, "mean": 27.277132034301758, "std": 51.91350555419922, "min": -68.89671325683594, "p10": -37.679895019531244, "median": 13.969711303710938, "p90": 94.87633361816407, "max": 135.623046875, "pos_frac": 0.71875, "sample": [60.04362487792969, 82.0115737915039, 2.9593505859375, -35.469024658203125, -52.551307678222656, 27.981903076171875, -21.837356567382812, 135.623046875, 92.62985229492188, 60.98094940185547, -17.925430297851562, 124.85511016845703, 0.5269317626953125, 14.091606140136719, 13.505889892578125, 4.243776321411133, -38.627410888671875, 21.979965209960938, 2.5045700073242188, 19.783035278320312, 95.839111328125, 37.884193420410156, -40.62208938598633, 11.287445068359375, 74.77667236328125, 21.071094512939453, 122.07859802246094, 88.57655334472656, 11.676025390625, 87.89590454101562, -21.365924835205078, 43.75239944458008, -1.2533721923828125, 81.03565979003906, 8.246795654296875, -16.698867797851562, 129.61741638183594, 28.302902221679688, -26.544631958007812, 40.902183532714844, 43.89031219482422, 1.540212631225586, 71.80561065673828, -7.2093658447265625, 112.38671875, 3.0274887084960938, -20.13489532470703, 46.89707946777344, -30.43549156188965, 46.5582275390625, 5.38427734375, -68.89671325683594, 35.476585388183594, 84.26779174804688, 10.628349304199219, -0.21277618408203125, 65.46875, -42.06896209716797, 13.847816467285156, -63.16496276855469, 126.48258972167969, 90.85200500488281, -59.62445831298828, 5.20147705078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000548.npy"}
|
||||
{"epoch": 0.8284202569916855, "step": 549, "batch_size": 64, "mean": 29.4459228515625, "std": 52.558956146240234, "min": -120.66725158691406, "p10": -37.57565231323241, "median": 25.71888256072998, "p90": 101.71096954345707, "max": 126.86231994628906, "pos_frac": 0.6875, "sample": [19.529266357421875, 19.25647735595703, -47.49925994873047, 62.28955078125, -13.950736999511719, -17.175735473632812, 42.377418518066406, 2.994935989379883, 40.936981201171875, 126.86231994628906, 108.31648254394531, 12.28216552734375, 51.531837463378906, 64.01704406738281, -10.030227661132812, -6.734291076660156, -74.96589660644531, 59.48190689086914, 55.732940673828125, 85.30375671386719, 90.90440368652344, 92.04330444335938, 27.356170654296875, -42.005435943603516, 28.276641845703125, -12.93890380859375, 5.957820892333984, 121.3382797241211, 47.59105682373047, -43.24608612060547, 105.74788665771484, 87.61199188232422, 37.72772216796875, -8.541337966918945, 81.4261703491211, 111.11055755615234, -1.7919483184814453, -43.026763916015625, -14.726486206054688, 82.6178207397461, -68.89950561523438, 12.662643432617188, 3.057971954345703, 11.472946166992188, 43.25669479370117, 79.61418151855469, -17.6383056640625, 71.69329833984375, 107.61343383789062, 1.5241775512695312, -4.825172424316406, 64.84573364257812, 67.92385864257812, 34.68621826171875, 47.102691650390625, 16.956947326660156, 24.081594467163086, 6.790470123291016, -2.1022911071777344, 92.29149627685547, 111.16621398925781, -120.66725158691406, -27.239490509033203, -4.81927490234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000549.npy"}
|
||||
{"epoch": 0.8299319727891157, "step": 550, "batch_size": 64, "mean": 22.392438888549805, "std": 47.683937072753906, "min": -90.68247985839844, "p10": -29.40070877075195, "median": 18.35225009918213, "p90": 84.78286285400391, "max": 126.3128662109375, "pos_frac": 0.734375, "sample": [21.783958435058594, 126.29261779785156, 121.5655517578125, 19.841049194335938, -75.8378677368164, 4.024497985839844, 0.1453990936279297, -13.904182434082031, -27.928199768066406, 37.41178894042969, 62.36096954345703, 13.992950439453125, 33.12477493286133, 3.423633575439453, 16.86345100402832, 108.6789321899414, 126.3128662109375, 66.63784790039062, 7.366546630859375, 41.3370246887207, 81.1902084350586, 0.9710731506347656, 49.25121307373047, 45.243560791015625, 83.89599609375, -27.480361938476562, 117.05886840820312, -0.43590545654296875, 9.215110778808594, -30.031784057617188, -48.29340362548828, 0.7573394775390625, -16.179306030273438, 31.577880859375, 5.0611419677734375, 32.09358215332031, 23.253868103027344, -13.53179931640625, -22.039138793945312, 4.4058685302734375, -45.18331527709961, 63.1363525390625, 7.662136077880859, 14.396167755126953, 28.157821655273438, 54.43897247314453, 23.615142822265625, 29.056291580200195, 40.44010925292969, 66.79194641113281, -51.8592529296875, 76.14215087890625, 29.340606689453125, 23.959299087524414, 98.5184326171875, 9.551870346069336, -90.68247985839844, -3.0720996856689453, 85.16294860839844, -68.7003402709961, 36.93760681152344, 2.1889495849609375, -11.76706314086914, -4.593715667724609], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000550.npy"}
|
||||
{"epoch": 0.8314436885865457, "step": 551, "batch_size": 64, "mean": 27.502866744995117, "std": 44.385250091552734, "min": -91.166015625, "p10": -10.510474014282224, "median": 20.233516693115234, "p90": 99.57105560302735, "max": 135.46282958984375, "pos_frac": 0.71875, "sample": [17.078330993652344, 54.748046875, 65.75411987304688, 13.359664916992188, 87.89117431640625, 58.34480285644531, 135.46282958984375, -7.862506866455078, 24.910764694213867, 29.34387969970703, -27.12118911743164, 8.249061584472656, 91.64896392822266, 7.705156326293945, 38.369232177734375, 23.08742904663086, 20.474029541015625, 30.84949493408203, -91.166015625, -5.659358978271484, 6.421873092651367, 9.580741882324219, 65.16307830810547, 27.9140625, 34.671234130859375, 126.78013610839844, 5.6136016845703125, 52.406959533691406, 29.54534912109375, 92.98970031738281, 26.618677139282227, -11.645317077636719, 1.1820259094238281, 6.03192138671875, -12.830970764160156, 104.38189697265625, -5.692863464355469, 98.71163940429688, -1.4926166534423828, 23.373092651367188, -3.119312286376953, 25.756420135498047, -5.7596893310546875, 9.47769546508789, 92.87459564208984, 102.02265930175781, 99.93937683105469, 26.588642120361328, -5.28656005859375, -2.095226287841797, 22.41240692138672, 19.993003845214844, 4.76080322265625, -27.431747436523438, -5.057365417480469, -7.8378448486328125, -7.810771942138672, 106.08892822265625, -25.031692504882812, 2.830615997314453, -50.12089538574219, 2.4784622192382812, 100.34341430664062, 28.9754638671875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000551.npy"}
|
||||
{"epoch": 0.8329554043839759, "step": 552, "batch_size": 64, "mean": 35.340240478515625, "std": 47.56452941894531, "min": -104.43655395507812, "p10": -14.245242500305174, "median": 33.921051025390625, "p90": 84.66341400146486, "max": 127.05453491210938, "pos_frac": 0.828125, "sample": [-15.102628707885742, 27.873611450195312, 12.248451232910156, 9.925132751464844, 86.64435577392578, 30.53921127319336, 1.0836238861083984, 7.3138885498046875, 30.3946533203125, 74.05905151367188, 74.19638061523438, 127.05453491210938, 81.87985229492188, 30.20690155029297, 0.5223579406738281, -25.331064224243164, -2.1779708862304688, 74.4525375366211, 29.477569580078125, -12.244674682617188, 32.52100372314453, 74.38685607910156, 57.54917907714844, 74.48596954345703, 85.85636901855469, 15.346454620361328, 36.37799835205078, 94.37702178955078, 81.73777770996094, -104.43655395507812, 14.518152236938477, 32.489532470703125, 107.87884521484375, -91.89801025390625, 53.686012268066406, -0.06716537475585938, 39.669464111328125, 41.02330017089844, 5.0283966064453125, -0.09201812744140625, 103.62841033935547, 11.890792846679688, -24.98908233642578, 50.82011413574219, 69.22500610351562, 15.211044311523438, 65.91380310058594, 40.918785095214844, -98.16590881347656, 72.02433776855469, 126.73649597167969, 59.418312072753906, 14.388845443725586, -57.666053771972656, 66.05876159667969, 53.05268096923828, 35.32109832763672, 27.193891525268555, 74.84268188476562, 32.012451171875, 54.331443786621094, 73.30290985107422, 76.11695861816406, 26.733461380004883], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000552.npy"}
|
||||
{"epoch": 0.8344671201814059, "step": 553, "batch_size": 64, "mean": 30.97012710571289, "std": 56.85546875, "min": -122.2243423461914, "p10": -25.585744476318357, "median": 17.58858871459961, "p90": 116.20071411132814, "max": 140.09429931640625, "pos_frac": 0.640625, "sample": [135.06903076171875, -22.288734436035156, 10.06512451171875, 49.54148864746094, 2.9431304931640625, 4.806646347045898, 101.26712036132812, 59.708099365234375, -15.104209899902344, -2.1284561157226562, 20.971336364746094, -0.465850830078125, 117.02688598632812, 120.42422485351562, -30.525779724121094, 42.563507080078125, 0.8878345489501953, 69.53128814697266, 24.613540649414062, -122.2243423461914, -7.501241683959961, 69.68560791015625, -9.543907165527344, 87.33984375, 99.82206726074219, 40.851768493652344, -7.027549743652344, 8.52386474609375, -6.8530426025390625, -35.01972961425781, 134.1463623046875, 104.78504943847656, 75.15424346923828, 140.09429931640625, 13.39093017578125, 73.80803680419922, 114.27297973632812, 48.573699951171875, 14.205841064453125, 71.87677764892578, 10.070297241210938, 37.17800521850586, -1.759897232055664, 4.430812835693359, -26.998748779296875, 28.76702880859375, -10.32391357421875, 131.06008911132812, -98.1908187866211, -48.84552764892578, 34.534088134765625, 74.26615905761719, 83.06002807617188, -9.259025573730469, -16.843162536621094, 23.99810791015625, -7.161571502685547, -48.70616149902344, 60.62872314453125, 76.25511932373047, -13.060676574707031, -19.95598602294922, 124.05458068847656, -2.377105712890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000553.npy"}
|
||||
{"epoch": 0.8359788359788359, "step": 554, "batch_size": 64, "mean": 36.556236267089844, "std": 51.7978630065918, "min": -113.83558654785156, "p10": -15.59427261352539, "median": 26.530553817749023, "p90": 114.82102737426759, "max": 135.94374084472656, "pos_frac": 0.796875, "sample": [31.477798461914062, 116.3321533203125, 124.09890747070312, 108.90431213378906, 3.6436996459960938, 24.54428482055664, 17.782533645629883, 18.521137237548828, 15.881278991699219, 86.63318634033203, -7.338253021240234, -15.901786804199219, 0.4473724365234375, 81.86725616455078, -113.83558654785156, 14.432220458984375, 79.35551452636719, -14.876739501953125, 55.56690216064453, 6.790094375610352, 0.9367828369140625, 55.14478302001953, 25.125795364379883, -46.584747314453125, 57.95524597167969, 87.83129119873047, 119.04251861572266, 14.901103973388672, -8.192913055419922, 12.053901672363281, 122.06954193115234, 13.433181762695312, 49.01945495605469, 9.86197280883789, 1.3321075439453125, -16.33591079711914, 95.68754577636719, 19.50956916809082, 40.458335876464844, -22.745010375976562, -55.61515808105469, 62.306365966796875, 28.146514892578125, 31.785083770751953, -2.6061859130859375, 51.69590377807617, 8.527624130249023, 72.5578842163086, 27.935312271118164, -2.66424560546875, 92.13546752929688, 8.540794372558594, -13.83687973022461, 111.2950668334961, 132.87461853027344, 11.722152709960938, 56.40675354003906, 74.81471252441406, 47.63478088378906, 102.14729309082031, 135.94374084472656, -58.096717834472656, 121.7125015258789, 29.404815673828125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000554.npy"}
|
||||
{"epoch": 0.8374905517762661, "step": 555, "batch_size": 64, "mean": 33.95972442626953, "std": 56.68629837036133, "min": -121.61972045898438, "p10": -43.33028221130371, "median": 33.98156452178955, "p90": 107.91308593750001, "max": 122.95355224609375, "pos_frac": 0.78125, "sample": [2.4052486419677734, -116.9039077758789, 25.29777717590332, 40.40727233886719, 68.16754913330078, 19.978004455566406, -57.19746398925781, -11.94464111328125, -44.25760269165039, 40.335113525390625, 115.23294830322266, -18.191192626953125, 44.55120849609375, 10.421867370605469, 91.59361267089844, 122.95355224609375, 24.47259521484375, 2.3295059204101562, 15.355178833007812, 109.85041046142578, 46.158973693847656, -1.9293975830078125, 117.66325378417969, 36.79200744628906, 105.30133056640625, 13.068588256835938, 69.14138793945312, 109.03240966796875, 27.140579223632812, 111.14559173583984, 73.46549987792969, 73.18547058105469, 66.89393615722656, 96.54027557373047, -45.62303161621094, 93.28948211669922, -121.61972045898438, 58.54754638671875, -45.011146545410156, 21.641239166259766, 93.98614501953125, 46.731910705566406, -3.0880203247070312, 6.536354064941406, -14.586990356445312, 29.684436798095703, 62.97540283203125, 37.940818786621094, 96.651123046875, -18.953601837158203, 66.73933410644531, 0.5611114501953125, -84.17027282714844, 0.5119857788085938, 98.25590515136719, 7.6591796875, 72.83964538574219, 9.028579711914062, 62.388763427734375, -41.166534423828125, 120.38729858398438, 31.17112159729004, 99.97196197509766, 1.6855831146240234], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000555.npy"}
|
||||
{"epoch": 0.8390022675736961, "step": 556, "batch_size": 64, "mean": 30.785144805908203, "std": 56.95640182495117, "min": -108.32099914550781, "p10": -32.51471786499023, "median": 17.505990982055664, "p90": 111.12736206054687, "max": 143.7752685546875, "pos_frac": 0.671875, "sample": [72.12161254882812, 14.533493041992188, 21.279754638671875, -5.157199859619141, -7.9785308837890625, 99.72034454345703, 128.20938110351562, 32.54024124145508, -5.796743392944336, -17.821533203125, 5.73130989074707, 3.9424209594726562, 69.1948013305664, 92.11257934570312, 8.343807220458984, -15.8192138671875, 24.491409301757812, 0.5450363159179688, 77.07444763183594, 110.00132751464844, -4.104209899902344, 71.22048950195312, 91.06514739990234, 12.155845642089844, -18.203454971313477, 6.854835510253906, -87.68711853027344, 28.75510597229004, 79.32953643798828, -47.67320251464844, -0.5941677093505859, -108.32099914550781, 60.10966491699219, -0.6973037719726562, -0.305755615234375, 8.250570297241211, 83.12925720214844, 107.62437438964844, 3.2667083740234375, 20.71894645690918, 120.69210052490234, 143.7752685546875, 22.721256256103516, 32.88383483886719, 110.30284118652344, 104.11022186279297, -20.60809326171875, 9.255126953125, -69.19083404541016, -4.821632385253906, 124.12528991699219, -32.68915557861328, -12.00543212890625, 78.73200988769531, 20.47848892211914, -32.107696533203125, 12.94488525390625, -39.301666259765625, 49.894012451171875, 111.48072814941406, 122.79744720458984, 116.39126586914062, 26.884756088256836, -38.658870697021484], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000556.npy"}
|
||||
{"epoch": 0.8405139833711263, "step": 557, "batch_size": 64, "mean": 41.473854064941406, "std": 49.0832633972168, "min": -45.043243408203125, "p10": -18.963616180419923, "median": 39.56513786315918, "p90": 115.63680496215824, "max": 131.726318359375, "pos_frac": 0.75, "sample": [126.63690185546875, 69.46553039550781, 52.1336784362793, -4.889747619628906, -8.237773895263672, -19.25354766845703, 126.73867797851562, 5.510871887207031, 97.525634765625, 44.76353454589844, 7.49591064453125, -9.613388061523438, 1.0745391845703125, 77.10323333740234, 16.803272247314453, 22.02303695678711, 74.06326293945312, 10.096946716308594, 102.41343688964844, 81.0322494506836, -2.203876495361328, -12.423004150390625, 106.97525024414062, -45.043243408203125, -37.433990478515625, -18.287109375, 119.3488998413086, 131.726318359375, 22.686084747314453, -8.052955627441406, 79.52603149414062, 53.135231018066406, 84.67530059814453, -19.838459014892578, 8.793701171875, 121.87486267089844, 46.697654724121094, 99.93155670166016, 44.6268310546875, 124.04679107666016, 5.4062042236328125, 12.061494827270508, -4.206031799316406, -20.129974365234375, 1.6653499603271484, -20.997161865234375, 48.22947692871094, 83.32901000976562, 91.36335754394531, 17.33837127685547, 2.6412124633789062, 47.074424743652344, 20.784896850585938, 83.81797790527344, 82.37834167480469, 29.111236572265625, 71.15969848632812, 52.268497467041016, 34.50344467163086, 63.77024841308594, -28.146881103515625, -10.707321166992188, 123.10757446289062, 94.85501098632812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000557.npy"}
|
||||
{"epoch": 0.8420256991685563, "step": 558, "batch_size": 64, "mean": 44.0766716003418, "std": 50.92596435546875, "min": -79.4477310180664, "p10": -11.112223815917966, "median": 32.7145938873291, "p90": 115.51904220581055, "max": 131.3845672607422, "pos_frac": 0.75, "sample": [70.74964141845703, 109.37303161621094, 119.14889526367188, 0.9999237060546875, 110.7327880859375, 8.395824432373047, 117.66009521484375, -12.402523040771484, 1.2352066040039062, 23.509109497070312, -3.2867813110351562, 99.7224349975586, -4.2115325927734375, 15.944541931152344, -3.4875030517578125, 103.23890686035156, 92.85015869140625, 7.62744140625, 55.25679397583008, 35.132171630859375, 122.70608520507812, 95.88310241699219, 33.257572174072266, 110.35078430175781, 70.34634399414062, -15.210311889648438, 79.55319213867188, 8.555366516113281, 85.93439483642578, 39.591163635253906, 116.30751037597656, 3.94476318359375, 22.240264892578125, 131.3845672607422, -0.548828125, -2.306711196899414, 57.16387176513672, 7.2944488525390625, 32.17161560058594, 128.59481811523438, 77.07554626464844, -12.470108032226562, 127.66844177246094, -12.142105102539062, 113.67928314208984, 110.85181427001953, -7.565837860107422, -18.43487548828125, 98.34941101074219, 15.730987548828125, 27.504196166992188, -79.4477310180664, 44.69403076171875, -12.6427001953125, 6.1605072021484375, -3.2462692260742188, 84.25651550292969, 0.4207572937011719, -8.511550903320312, -8.70916748046875, 97.96090698242188, 40.37273406982422, 43.66698455810547, 20.282470703125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000558.npy"}
|
||||
{"epoch": 0.8435374149659864, "step": 559, "batch_size": 64, "mean": 37.217071533203125, "std": 55.01764678955078, "min": -73.453125, "p10": -22.755068016052242, "median": 22.334659576416016, "p90": 115.57797164916992, "max": 142.648681640625, "pos_frac": 0.75, "sample": [0.6034393310546875, -11.288154602050781, 11.118770599365234, -6.143836975097656, 75.72258758544922, 142.648681640625, 114.63959503173828, 115.98013305664062, 18.847145080566406, 0.7815132141113281, -65.95353698730469, 59.380218505859375, 24.233253479003906, 45.53459167480469, 7.638395309448242, 120.80242919921875, 83.90137481689453, 24.658180236816406, 93.42303466796875, 71.31248474121094, 100.91915130615234, 24.24340057373047, -9.909744262695312, 79.99905395507812, 2.1302566528320312, 16.35356903076172, -54.83522033691406, 98.36434173583984, 20.436065673828125, 132.29745483398438, 15.083595275878906, 25.390670776367188, 17.713363647460938, 97.78216552734375, 119.35343933105469, -20.17527198791504, 66.48213195800781, -0.6660556793212891, -3.9744491577148438, -11.063804626464844, -1.1804428100585938, -23.860694885253906, -14.16000747680664, 31.8673095703125, 48.28480529785156, 111.83999633789062, 68.92261505126953, -29.819313049316406, 102.19593811035156, 3.613922119140625, 13.443243026733398, 127.25816345214844, -73.453125, 2.169189453125, 90.65641784667969, 130.34417724609375, 6.603099822998047, 53.673973083496094, 9.897289276123047, 14.782670974731445, 77.8267822265625, 94.19757080078125, -34.80064392089844, -72.1746826171875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000559.npy"}
|
||||
{"epoch": 0.8450491307634165, "step": 560, "batch_size": 64, "mean": 35.50212097167969, "std": 48.470401763916016, "min": -46.71095275878906, "p10": -20.56209487915039, "median": 27.52404022216797, "p90": 103.60194854736329, "max": 126.40963745117188, "pos_frac": 0.703125, "sample": [41.9678955078125, 40.519775390625, 104.64921569824219, 27.950416564941406, -35.262908935546875, -17.44599151611328, 119.0728530883789, 22.943374633789062, 77.08901977539062, 3.3904285430908203, 46.55012512207031, 17.086036682128906, 23.0689697265625, 33.181175231933594, -4.7072296142578125, -24.45013427734375, -0.83367919921875, -7.017621994018555, 13.66012954711914, 87.87614440917969, -21.348663330078125, 82.67561340332031, 27.09766387939453, 1.9521903991699219, 94.15884399414062, 84.17015075683594, -7.426063537597656, 97.48169708251953, 97.10081481933594, 19.31208610534668, 31.9215087890625, -4.427772521972656, -18.53589630126953, 126.40963745117188, 99.65023040771484, -46.71095275878906, 11.987388610839844, -15.238372802734375, 123.7928237915039, -37.094200134277344, 124.66058349609375, 86.5993881225586, 29.119956970214844, 55.69503402709961, 5.924949645996094, 81.49161529541016, -4.799522399902344, 119.93095397949219, 17.184032440185547, -4.6405029296875, -18.726768493652344, 39.7835693359375, -29.100830078125, -5.2125091552734375, 28.08966064453125, 73.4350814819336, -40.23902893066406, 32.46950149536133, 101.1583251953125, 43.265071868896484, 111.01731872558594, 78.55772399902344, 17.691242218017578, 12.564140319824219], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000560.npy"}
|
||||
{"epoch": 0.8465608465608465, "step": 561, "batch_size": 64, "mean": 37.81717300415039, "std": 54.14962387084961, "min": -69.43487548828125, "p10": -19.388770294189452, "median": 18.396292686462402, "p90": 112.79295654296875, "max": 133.85781860351562, "pos_frac": 0.765625, "sample": [3.2006607055664062, 2.8805465698242188, 8.279525756835938, 2.2746334075927734, 9.534242630004883, 83.60297393798828, 38.16214370727539, -16.461036682128906, 103.65655517578125, 15.812850952148438, 9.568450927734375, 77.94967651367188, 4.342060089111328, 85.81304931640625, 7.7793121337890625, -57.33591079711914, 18.716415405273438, 108.71548461914062, 133.85781860351562, 5.5009307861328125, -2.635711669921875, 88.97689819335938, 70.09941101074219, -20.469402313232422, 75.68135070800781, -4.098426818847656, 68.07238006591797, -37.38276290893555, -69.43487548828125, 127.2364501953125, -19.39295196533203, 37.24610900878906, 98.65997314453125, 112.45205688476562, 25.318939208984375, 11.610244750976562, 31.112998962402344, 24.389074325561523, 11.366884231567383, 128.45538330078125, 124.54640197753906, 124.53935241699219, 105.89779663085938, 131.4668731689453, 93.2969970703125, -59.006919860839844, -16.673362731933594, 39.420799255371094, 45.80909729003906, 18.076169967651367, -8.118072509765625, 9.255378723144531, 1.5131759643554688, -0.10628318786621094, 78.22412109375, 108.46453857421875, 12.608606338500977, 76.80926513671875, 83.93190002441406, -52.215057373046875, -5.588319778442383, 11.472185134887695, 112.93905639648438, -19.379013061523438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000561.npy"}
|
||||
{"epoch": 0.8480725623582767, "step": 562, "batch_size": 64, "mean": 41.324989318847656, "std": 52.74250030517578, "min": -68.18370819091797, "p10": -19.098839569091798, "median": 36.26303482055664, "p90": 113.78890380859376, "max": 136.4716796875, "pos_frac": 0.71875, "sample": [8.926105499267578, 112.5734634399414, -50.950138092041016, 76.512939453125, 83.51066589355469, 94.6860580444336, -3.6434707641601562, 136.4716796875, 121.44099426269531, -19.245628356933594, 44.489166259765625, 37.190216064453125, -51.036865234375, 8.49746322631836, 34.41192626953125, 0.6355781555175781, -18.4036865234375, 48.252166748046875, 64.54379272460938, -12.816986083984375, 60.823875427246094, 112.12881469726562, -19.687889099121094, -11.388603210449219, -10.402793884277344, 121.86079406738281, -1.6291561126708984, 1.0762557983398438, 23.28564453125, 18.945865631103516, 97.75918579101562, -20.548141479492188, 114.30980682373047, 96.76809692382812, 36.59764862060547, 126.03103637695312, -35.42654800415039, 1.0975494384765625, -1.2879257202148438, 119.33306121826172, 12.129871368408203, -4.337394714355469, 61.52178955078125, 66.35655212402344, 59.41314697265625, 121.00737762451172, 104.60220336914062, 84.63497924804688, 94.48379516601562, 2.3927001953125, 111.49654388427734, 12.890045166015625, -1.8984012603759766, 30.06768035888672, 71.53057861328125, 45.202476501464844, 101.14427947998047, 94.99484252929688, 66.1633529663086, -68.18370819091797, -0.199310302734375, -18.756332397460938, 16.52190399169922, 35.92842102050781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000562.npy"}
|
||||
{"epoch": 0.8495842781557067, "step": 563, "batch_size": 64, "mean": 37.51539993286133, "std": 58.296592712402344, "min": -108.1363296508789, "p10": -37.83309173583984, "median": 43.82720756530762, "p90": 115.72788696289062, "max": 130.28048706054688, "pos_frac": 0.734375, "sample": [21.66785430908203, 81.07349395751953, 29.525936126708984, 0.9644851684570312, 109.73329162597656, 66.23078918457031, -108.1363296508789, 117.42889404296875, 68.62120056152344, -19.297080993652344, 2.8468246459960938, 83.37666320800781, 0.636260986328125, -40.7188720703125, 111.95670318603516, 49.30389404296875, -77.6253662109375, 53.09711456298828, -21.17304229736328, 123.80595397949219, 31.409637451171875, 116.06124877929688, -33.66071319580078, 90.3510513305664, 94.75284576416016, 47.738128662109375, 73.46311950683594, 8.983680725097656, 54.86228942871094, -66.2366714477539, -4.2951812744140625, -19.97378158569336, -3.2350730895996094, 15.15582275390625, -10.72616195678711, 84.70123291015625, 17.908061981201172, -73.72259521484375, 60.485198974609375, 7.762828826904297, 124.57820892333984, 123.18865203857422, 104.93949127197266, -39.621253967285156, -28.434494018554688, 30.41082763671875, 42.54008483886719, -20.52904510498047, 118.04141998291016, 58.66564178466797, -62.298004150390625, 45.11433029174805, 52.567562103271484, 90.37776947021484, 79.44354248046875, 38.804115295410156, 130.28048706054688, 114.95004272460938, 8.298965454101562, 98.79208374023438, 59.67931365966797, -0.888671875, 78.96421813964844, 8.016632080078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000563.npy"}
|
||||
{"epoch": 0.8510959939531368, "step": 564, "batch_size": 64, "mean": 45.69166564941406, "std": 56.98165512084961, "min": -111.473876953125, "p10": -19.997921562194815, "median": 44.76517677307129, "p90": 125.53953475952149, "max": 135.49447631835938, "pos_frac": 0.8125, "sample": [105.1666259765625, -0.24523162841796875, 126.6437759399414, 86.86725616455078, 80.72079467773438, 127.48892974853516, 92.18923950195312, -111.473876953125, 1.1981658935546875, -0.644378662109375, 130.0819091796875, 106.58094024658203, 28.579341888427734, -23.73906898498535, 14.22991943359375, 4.205863952636719, -11.268577575683594, 73.34089660644531, 62.843894958496094, 20.990821838378906, 22.490375518798828, 1.360311508178711, 74.12092590332031, 49.918914794921875, 106.47171020507812, 69.15548706054688, 125.77804565429688, -39.63865661621094, 3.983213424682617, 61.168426513671875, 126.40056610107422, 135.49447631835938, 130.85934448242188, 41.5937385559082, 17.307052612304688, -49.94908905029297, 75.15190124511719, 26.714298248291016, 36.59583282470703, 2.1097488403320312, 119.05535125732422, 40.527549743652344, 27.072113037109375, 0.4685192108154297, -41.515506744384766, 21.714569091796875, 41.35033416748047, 61.0535888671875, 58.601707458496094, -39.65541076660156, 76.55569458007812, 78.87080383300781, 63.4486083984375, -6.99726676940918, 98.48773193359375, 8.344612121582031, 104.07717895507812, 124.9830093383789, -95.28614807128906, 119.08358001708984, -9.13885498046875, 14.14157485961914, 47.936614990234375, 80.24287414550781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000564.npy"}
|
||||
{"epoch": 0.8526077097505669, "step": 565, "batch_size": 64, "mean": 29.57483673095703, "std": 49.636539459228516, "min": -84.73301696777344, "p10": -30.508055496215814, "median": 21.66120719909668, "p90": 101.7699966430664, "max": 136.46475219726562, "pos_frac": 0.703125, "sample": [15.446277618408203, -22.509899139404297, -13.892555236816406, 24.784393310546875, -4.897636413574219, 13.484527587890625, 14.803375244140625, 15.332145690917969, -10.584075927734375, 8.195472717285156, 10.038078308105469, 55.19728088378906, 107.84918975830078, -84.73301696777344, 5.9446868896484375, 82.52796936035156, 36.12814712524414, 87.10594940185547, 136.46475219726562, 127.46001434326172, 96.96196746826172, -39.53974914550781, -1.5823745727539062, 5.103233337402344, 47.2914924621582, 50.064300537109375, 12.524559020996094, 23.117332458496094, 31.18915557861328, 48.870994567871094, 101.9132080078125, 2.835672378540039, -3.5069808959960938, -33.504085540771484, 66.02108764648438, -34.52241897583008, 15.561525344848633, 79.33399963378906, -55.114463806152344, 26.854080200195312, 47.544273376464844, -22.503482818603516, -23.517318725585938, 55.10044860839844, -9.566558837890625, 63.082828521728516, 10.873321533203125, 101.43583679199219, 127.72489929199219, -8.013755798339844, 72.60818481445312, -17.091934204101562, -47.021697998046875, -4.686397552490234, 24.639812469482422, 22.728374481201172, -40.2003173828125, 85.30081176757812, 20.594039916992188, 56.477264404296875, 126.16383361816406, 67.81700897216797, 33.00501251220703, 106.27742767333984], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000565.npy"}
|
||||
{"epoch": 0.854119425547997, "step": 566, "batch_size": 64, "mean": 32.56593322753906, "std": 60.351539611816406, "min": -124.20600891113281, "p10": -39.6252456665039, "median": 30.813118934631348, "p90": 112.42254791259766, "max": 134.04148864746094, "pos_frac": 0.703125, "sample": [71.38095092773438, 22.991416931152344, -22.317771911621094, 42.908721923828125, 5.5841064453125, 51.1852912902832, 123.51312255859375, 66.71263885498047, 4.687894821166992, 85.80003356933594, 36.41628646850586, 17.079330444335938, 29.34421730041504, 26.562862396240234, 20.679283142089844, 98.69268798828125, -124.20600891113281, 99.33844757080078, 102.36570739746094, 76.68389129638672, 111.4554443359375, -3.922985076904297, -28.28862762451172, 72.97917938232422, 1.4707317352294922, -20.285507202148438, 77.24803161621094, 113.01411437988281, 107.2734375, -11.564361572265625, 90.73493957519531, 81.94701385498047, 92.86531066894531, 35.015106201171875, 23.0692138671875, -113.47888946533203, -47.80530548095703, 45.8923454284668, 119.26063537597656, -54.79241180419922, 20.395339965820312, -25.9439697265625, 134.04148864746094, -28.791770935058594, 112.83702087402344, 18.932769775390625, -40.66436004638672, -75.79271697998047, 32.282020568847656, 0.5030059814453125, -13.594413757324219, -4.025276184082031, -16.292343139648438, 96.96656799316406, -12.711254119873047, 32.378360748291016, -37.200645446777344, 117.87476348876953, 42.018836975097656, -67.6256103515625, 79.06950378417969, 34.64232635498047, 28.058258056640625, 129.37139892578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000566.npy"}
{"epoch": 0.8556311413454271, "step": 567, "batch_size": 64, "mean": 37.026668548583984, "std": 50.090728759765625, "min": -65.40704345703125, "p10": -31.638343429565428, "median": 34.958927154541016, "p90": 109.00868301391603, "max": 127.4762954711914, "pos_frac": 0.765625, "sample": [63.41490173339844, 8.559459686279297, -6.97308349609375, 117.99871063232422, -9.64666748046875, 66.70355987548828, 127.4762954711914, 92.19740295410156, 120.48080444335938, -2.3006210327148438, 89.38795471191406, 41.73841857910156, 32.72285461425781, -40.19944763183594, 41.3546142578125, 2.6041488647460938, -4.4585723876953125, 9.974761962890625, 1.3210067749023438, 44.312435150146484, 28.22918701171875, 3.2021255493164062, 0.7704753875732422, 91.92093658447266, -7.669715881347656, 0.5482540130615234, -65.40704345703125, -36.824562072753906, 45.63634490966797, -37.20787811279297, 100.53810119628906, 27.803634643554688, 110.60292053222656, 115.08343505859375, 105.2887954711914, 1.8707389831542969, 51.606285095214844, 14.982528686523438, 37.22279357910156, 125.99720764160156, -48.953399658203125, 59.128883361816406, 2.379119873046875, 27.282546997070312, 85.79522705078125, 83.76628112792969, -4.024559020996094, 10.53564453125, 100.93048858642578, 79.95484924316406, 50.66847229003906, 4.262287139892578, 102.13134765625, -0.6348876953125, 7.7054901123046875, -32.889625549316406, -28.718685150146484, 57.901824951171875, 123.55931091308594, -33.678802490234375, 47.041561126708984, 37.19499969482422, 60.612937927246094, 66.89181518554688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000567.npy"}
{"epoch": 0.8571428571428571, "step": 568, "batch_size": 64, "mean": 42.83213806152344, "std": 50.041664123535156, "min": -49.612457275390625, "p10": -14.13071670532226, "median": 34.210975646972656, "p90": 112.7650520324707, "max": 158.33969116210938, "pos_frac": 0.8125, "sample": [54.520164489746094, 2.519441604614258, 68.64016723632812, 5.3024749755859375, 52.693416595458984, 70.971435546875, 67.9934310913086, -43.07499694824219, 19.446990966796875, -22.710433959960938, -49.612457275390625, 104.59029388427734, 13.424173355102539, 98.02920532226562, -16.342788696289062, -42.43467712402344, 22.25267791748047, 8.130271911621094, 47.24712371826172, 69.56842041015625, -35.06781005859375, 3.45513916015625, 6.6175031661987305, 102.3445816040039, 110.77041625976562, 98.85250091552734, 77.24933624267578, 131.15902709960938, -40.8164176940918, 14.278656005859375, 46.60072326660156, 42.13831329345703, 38.34429931640625, -8.969215393066406, 80.16230773925781, 16.59404945373535, -1.826812744140625, 27.04882049560547, -2.5088024139404297, 84.59778594970703, 30.077651977539062, 114.4283218383789, 48.45985412597656, 49.4393310546875, 99.14105987548828, 12.026374816894531, 20.717056274414062, -2.42132568359375, 17.795425415039062, 13.587783813476562, 51.22483825683594, 90.43475341796875, -2.022125244140625, 100.1921157836914, 108.70671081542969, 24.15717887878418, 0.92205810546875, 115.85342407226562, 113.6198959350586, 113.75027465820312, 0.0921783447265625, 16.52924346923828, 124.02621459960938, 158.33969116210938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000568.npy"}
{"epoch": 0.8586545729402872, "step": 569, "batch_size": 64, "mean": 27.106124877929688, "std": 52.758731842041016, "min": -90.76409912109375, "p10": -30.572161102294913, "median": 20.598179817199707, "p90": 113.0834327697754, "max": 132.3479766845703, "pos_frac": 0.71875, "sample": [109.61680603027344, 2.1800994873046875, 12.38304328918457, 114.56912994384766, 28.508407592773438, 119.01109313964844, -8.634017944335938, 36.293975830078125, 34.5704345703125, 125.85684967041016, 71.693115234375, -8.143745422363281, 7.351482391357422, 29.385488510131836, -90.76409912109375, 44.24884796142578, 11.79110336303711, 115.54925537109375, -2.984273910522461, 55.978981018066406, 106.67390441894531, 27.06585693359375, 14.572551727294922, 125.76288604736328, -4.7043914794921875, 102.15220642089844, 46.57777404785156, 11.184268951416016, 6.3194732666015625, -72.10904693603516, -85.01642608642578, 22.12112808227539, -6.5738525390625, 18.806316375732422, -13.251495361328125, 3.8849945068359375, 132.3479766845703, 9.924272537231445, 24.363250732421875, 12.850435256958008, 35.549102783203125, -18.727203369140625, -40.3262939453125, -2.1295242309570312, 45.477996826171875, -69.5138168334961, 20.707931518554688, -22.349441528320312, 25.77889633178711, 63.052452087402344, 27.627960205078125, 17.935897827148438, 104.17002868652344, -15.235000610351562, -48.027137756347656, 20.488428115844727, 55.897499084472656, 14.622726440429688, 104.07036590576172, -7.8921661376953125, -34.09618377685547, 21.46044921875, 115.70899963378906, 29.126014709472656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000569.npy"}
{"epoch": 0.8601662887377173, "step": 570, "batch_size": 64, "mean": 49.55414962768555, "std": 52.08034896850586, "min": -75.24099731445312, "p10": -15.125679206848135, "median": 43.219635009765625, "p90": 121.4651092529297, "max": 150.27200317382812, "pos_frac": 0.796875, "sample": [111.13813018798828, -47.656837463378906, 86.47802734375, 20.446788787841797, 72.56818389892578, 30.811912536621094, -25.235610961914062, -5.3408050537109375, 43.25659942626953, -1.88299560546875, 27.427207946777344, 31.88922119140625, 21.980283737182617, 84.50115966796875, -0.16747283935546875, 95.36546325683594, -2.046476364135742, -0.3541069030761719, 38.556854248046875, 89.42669677734375, 71.21690368652344, -36.6750373840332, 17.98003387451172, 101.24578857421875, 129.42352294921875, 0.6370906829833984, 63.77976608276367, 75.58872985839844, 121.37216186523438, 15.143714904785156, 17.985618591308594, 86.13367462158203, 89.16146850585938, -38.49391174316406, 26.437984466552734, 43.18267059326172, 99.57176971435547, 69.78736877441406, 121.77281951904297, 41.89753723144531, 124.85726928710938, 85.94891357421875, 121.50494384765625, -6.047109603881836, 19.755813598632812, 48.02105712890625, 86.5079345703125, 107.04847717285156, 56.23741149902344, -75.24099731445312, 107.40955352783203, -28.48175811767578, 18.21490478515625, 92.66658782958984, 125.43348693847656, 15.42132568359375, 37.137168884277344, 2.202180862426758, 52.0296630859375, 111.33995819091797, 150.27200317382812, 21.29526138305664, 128.63616943359375, -19.016494750976562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000570.npy"}
{"epoch": 0.8616780045351474, "step": 571, "batch_size": 64, "mean": 28.562530517578125, "std": 51.14894104003906, "min": -106.42486572265625, "p10": -37.255176544189446, "median": 34.363956451416016, "p90": 88.71677246093752, "max": 115.51457977294922, "pos_frac": 0.765625, "sample": [71.91828155517578, 33.37006378173828, 22.045700073242188, 14.821554183959961, 1.8476104736328125, 91.14553833007812, 22.786788940429688, 54.543243408203125, 71.66449737548828, 79.58294677734375, 46.80339050292969, 68.6233139038086, -24.144729614257812, 8.56778335571289, -2.7666854858398438, 70.75648498535156, 16.5096435546875, 76.27639770507812, 1.86944580078125, 79.95904541015625, 100.04707336425781, 0.3875732421875, 62.50665283203125, 48.88523864746094, 93.99617767333984, 81.93368530273438, 58.92637634277344, 34.565818786621094, -106.42486572265625, 44.10009002685547, -2.0408077239990234, -62.25770568847656, 1.0455398559570312, -54.44628143310547, 34.16209411621094, 0.2819194793701172, -39.875221252441406, -1.0701408386230469, -31.141738891601562, 1.02239990234375, 75.2104721069336, 47.79707336425781, 83.04965209960938, 42.55921936035156, 24.118846893310547, 115.51457977294922, 94.86271667480469, -89.93497467041016, 14.385673522949219, 7.147186279296875, -26.065948486328125, -82.56930541992188, 36.545692443847656, 59.317222595214844, 51.722408294677734, -11.345115661621094, 52.060577392578125, 102.36259460449219, 69.44730377197266, 66.11143493652344, -20.650768280029297, 2.139190673828125, -70.82267761230469, 114.25465393066406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000571.npy"}
{"epoch": 0.8631897203325775, "step": 572, "batch_size": 64, "mean": 36.385189056396484, "std": 44.53242874145508, "min": -38.50425720214844, "p10": -14.020532226562498, "median": 33.03816795349121, "p90": 105.256990814209, "max": 130.221435546875, "pos_frac": 0.75, "sample": [8.289924621582031, -33.639060974121094, 6.635047912597656, 64.79573059082031, 66.92745208740234, 27.90740966796875, 111.53499603271484, 24.41527557373047, -12.274606704711914, 130.221435546875, 7.0462493896484375, 79.35309600830078, 38.0360107421875, 106.64266204833984, -4.237689971923828, 99.15320587158203, -11.2869873046875, 21.333974838256836, 11.270597457885742, 71.86146545410156, 52.750308990478516, 73.2645263671875, 32.92951965332031, 102.02375793457031, 89.95301055908203, 47.25944519042969, -6.941032409667969, 58.184791564941406, -10.527763366699219, -1.9609222412109375, 114.16854095458984, 114.9009780883789, 112.44462585449219, -32.829132080078125, 75.23733520507812, 33.14681625366211, 29.703628540039062, -38.47203063964844, 2.141937255859375, 47.21546173095703, 56.368446350097656, 25.298852920532227, 89.02310943603516, 44.0578727722168, 33.20782470703125, 44.93250274658203, 15.90576171875, -38.50425720214844, 124.2786865234375, 41.2286376953125, 47.38304901123047, 56.10955810546875, -14.634979248046875, 21.417572021484375, -32.2200927734375, -28.195510864257812, 4.627265930175781, 63.8111572265625, -5.531345367431641, -12.586822509765625, 51.739349365234375, -0.46903419494628906, 15.825384140014648, 16.999116897583008], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000572.npy"}
{"epoch": 0.8647014361300076, "step": 573, "batch_size": 64, "mean": 31.338947296142578, "std": 59.70273208618164, "min": -105.8660888671875, "p10": -46.83993301391601, "median": 28.399023056030273, "p90": 121.1054069519043, "max": 146.7135467529297, "pos_frac": 0.734375, "sample": [23.041549682617188, 61.753501892089844, -101.97135925292969, 101.58787536621094, -21.751693725585938, 17.138458251953125, -6.512716293334961, 23.70691680908203, 17.979537963867188, -50.228668212890625, -105.8660888671875, -71.7884292602539, -6.36488151550293, 8.327835083007812, -0.5760345458984375, 34.906219482421875, 16.173141479492188, 44.41963195800781, 128.74765014648438, -90.39964294433594, 31.58965301513672, 32.22118377685547, 120.94670104980469, 29.92499542236328, -21.5843505859375, 2.138965606689453, 146.7135467529297, 12.229621887207031, 95.27220916748047, 17.680679321289062, 29.33734130859375, 8.261016845703125, 37.81303405761719, 46.608154296875, 64.6055908203125, 27.460704803466797, 31.3939208984375, 21.216175079345703, 27.28400421142578, 136.8612060546875, 109.01609802246094, 92.9604721069336, 73.82035827636719, -0.41889381408691406, -11.847640991210938, 34.25965118408203, 134.9199676513672, 7.9718780517578125, 27.42998504638672, 31.65306854248047, 121.17342376708984, 122.09481811523438, 58.268218994140625, 69.76058197021484, -5.954273223876953, -86.16986083984375, 132.10955810546875, -38.932884216308594, 59.69229507446289, 70.31394958496094, 46.05003356933594, -7.669342041015625, -62.03746032714844, 106.93145751953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000573.npy"}
{"epoch": 0.8662131519274376, "step": 574, "batch_size": 64, "mean": 40.49908447265625, "std": 46.57517623901367, "min": -72.09241485595703, "p10": -7.720802307128906, "median": 34.689022064208984, "p90": 116.24143447875977, "max": 131.11328125, "pos_frac": 0.8125, "sample": [40.575714111328125, -30.265792846679688, -5.452964782714844, -72.09241485595703, 27.529495239257812, 71.23041534423828, 16.30419921875, 0.6312942504882812, 4.982263565063477, -13.421409606933594, -7.4310760498046875, 38.00468444824219, 11.332626342773438, 10.205987930297852, -4.536262512207031, -4.6124725341796875, -10.935070037841797, 16.322921752929688, 25.397048950195312, 63.26654815673828, 104.73095703125, 34.659996032714844, 65.81007385253906, 78.18585205078125, -0.9108657836914062, 131.11328125, 21.050086975097656, 90.22799682617188, 0.7667446136474609, 116.64948272705078, 37.06549072265625, 0.2653083801269531, 54.712650299072266, 21.6021728515625, -7.844970703125, 69.03376770019531, -30.007766723632812, 67.57415771484375, 122.30818176269531, 3.575288772583008, 63.757080078125, 21.01544189453125, 125.17935180664062, 64.02337646484375, 91.36949157714844, 126.65589141845703, 57.32695007324219, 34.718048095703125, 44.55873107910156, 21.474327087402344, 8.385604858398438, 125.1849365234375, 20.483291625976562, 40.81639862060547, 77.55976104736328, -13.230857849121094, 41.669734954833984, 121.43208312988281, 98.35237884521484, 115.28932189941406, 0.7913131713867188, 45.22157287597656, 98.36691284179688, 3.9365386962890625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000574.npy"}
{"epoch": 0.8677248677248677, "step": 575, "batch_size": 64, "mean": 27.361236572265625, "std": 58.25798797607422, "min": -104.75616455078125, "p10": -52.02212638854979, "median": 18.843873977661133, "p90": 115.30205841064456, "max": 136.94277954101562, "pos_frac": 0.671875, "sample": [41.472557067871094, -73.139404296875, -80.78288269042969, -6.871789932250977, 118.75808715820312, 6.111175537109375, -0.6739425659179688, 52.68122100830078, -6.153694152832031, -14.374378204345703, 84.04783630371094, -5.844978332519531, 131.2635955810547, -30.883258819580078, -70.85783386230469, 49.47972106933594, -70.70551300048828, 32.62681579589844, 8.430992126464844, 19.29027557373047, 37.486183166503906, 103.837158203125, 77.87934875488281, 20.737380981445312, 4.373908996582031, 8.240890502929688, -57.442665100097656, 62.36609649658203, 84.42440795898438, -0.36017608642578125, 22.84467887878418, 36.46739959716797, 119.09063720703125, 135.37474060058594, 100.65988159179688, -57.80535125732422, 136.94277954101562, -18.26251220703125, 131.622802734375, 91.77195739746094, -6.489288330078125, 19.07514190673828, 63.549034118652344, 5.71722412109375, -1.9531707763671875, 3.1238021850585938, 10.446538925170898, -104.75616455078125, 34.64564514160156, 134.60665893554688, 107.23799133300781, 36.86082458496094, 8.361557006835938, 23.747783660888672, -39.374202728271484, 18.113380432128906, 18.612606048583984, -12.327587127685547, 18.520883560180664, 68.61103820800781, -14.428077697753906, 76.11051940917969, 83.15165710449219, -24.168888092041016], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000575.npy"}
{"epoch": 0.8692365835222978, "step": 576, "batch_size": 64, "mean": 40.50676345825195, "std": 55.97388458251953, "min": -102.35017395019531, "p10": -22.796404838562008, "median": 32.683135986328125, "p90": 117.4791748046875, "max": 135.5491180419922, "pos_frac": 0.6875, "sample": [40.675628662109375, 116.63786315917969, -2.4985809326171875, 23.93040657043457, -3.87371826171875, 32.62200927734375, 127.10823059082031, 24.957910537719727, 35.162933349609375, 57.217308044433594, -14.547239303588867, -28.181549072265625, 88.78819274902344, -55.49433898925781, 64.8785171508789, -62.42897033691406, -32.521751403808594, 95.89313507080078, 135.5491180419922, -5.80059814453125, 101.25562286376953, 124.3543701171875, 74.02891540527344, 0.2000408172607422, 99.20294952392578, 61.594871520996094, 32.7442626953125, -27.262453079223633, 126.34587860107422, -20.78484344482422, 10.412940979003906, 1.6058731079101562, 21.377044677734375, 101.62142944335938, 56.585479736328125, 9.670417785644531, 133.5034942626953, -1.4442901611328125, -1.031768798828125, 10.216484069824219, 102.94059753417969, -1.0351524353027344, 117.83973693847656, -6.648256301879883, 114.84962463378906, 2.7718353271484375, 134.13136291503906, 84.14447021484375, -0.4602069854736328, 113.78214263916016, 87.56239318847656, -102.35017395019531, 95.57090759277344, 12.053985595703125, 36.24744415283203, 80.83755493164062, -23.65850257873535, -2.8585243225097656, 50.96160125732422, 70.52278900146484, -0.9463291168212891, 15.273370742797852, -4.710289001464844, 63.33927917480469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000576.npy"}
{"epoch": 0.8707482993197279, "step": 577, "batch_size": 64, "mean": 30.416513442993164, "std": 55.22623825073242, "min": -113.75109100341797, "p10": -18.528007316589353, "median": 29.729833602905273, "p90": 113.56436157226562, "max": 127.59387969970703, "pos_frac": 0.734375, "sample": [30.379886627197266, 4.002296447753906, 29.07978057861328, 34.077880859375, -8.391014099121094, 75.70516967773438, 2.9179115295410156, -14.107938766479492, 116.67147064208984, -14.868659973144531, -6.669380187988281, 99.72880554199219, 64.93116760253906, -13.455718994140625, -36.90446853637695, 123.96564483642578, 64.78994750976562, 113.83049774169922, -113.75109100341797, 49.305198669433594, 113.07591247558594, 56.045021057128906, 6.9256744384765625, 104.3900146484375, -4.436759948730469, -44.684112548828125, -76.23641967773438, 78.18952178955078, 77.59180450439453, 5.8319091796875, 15.6572265625, 59.538875579833984, 1.05914306640625, 31.062837600708008, 10.4603271484375, -13.445648193359375, 30.560649871826172, 19.171661376953125, 32.296531677246094, -1.1160774230957031, 26.75921630859375, 27.465316772460938, 113.77369689941406, 8.692121505737305, 126.98762512207031, 122.53272247314453, -102.35369110107422, 55.10130310058594, 40.19378662109375, 42.187591552734375, -3.3330535888671875, 57.77813720703125, 127.59387969970703, 103.05033874511719, 36.96051025390625, 4.208316802978516, -95.00569152832031, -19.200597763061523, -16.958629608154297, 61.14151382446289, 14.1524658203125, 60.0291748046875, 48.016326904296875, 3.709075927734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000577.npy"}
{"epoch": 0.872260015117158, "step": 578, "batch_size": 64, "mean": 24.863304138183594, "std": 51.31336975097656, "min": -90.51600646972656, "p10": -23.40001220703125, "median": 11.921247482299805, "p90": 102.46069030761721, "max": 119.4476318359375, "pos_frac": 0.625, "sample": [40.9097900390625, -22.50176239013672, 19.9708251953125, 16.9046630859375, 13.165031433105469, -18.124666213989258, 27.685665130615234, 5.019889831542969, 79.04696655273438, -1.2276897430419922, -7.594146728515625, 90.30645751953125, -7.078823089599609, 53.58015060424805, 106.83877563476562, -90.51600646972656, 105.36572265625, 85.13471984863281, 68.49120330810547, 95.68228149414062, 111.98207092285156, 31.073707580566406, 5.65507698059082, 10.67746353149414, -16.49907875061035, 54.94525909423828, 89.48819732666016, -2.2262954711914062, 35.366607666015625, -1.0342216491699219, 92.97069549560547, 74.96884155273438, 119.4476318359375, -8.264495849609375, -19.208526611328125, 5.3472747802734375, 9.135690689086914, 68.55369567871094, 18.516769409179688, 31.219797134399414, 94.25437927246094, -82.64413452148438, 19.196609497070312, -24.289583206176758, -11.420242309570312, -11.771110534667969, -4.884422302246094, 115.30606842041016, -23.697711944580078, 113.75077819824219, 26.956459045410156, -51.155662536621094, 112.45048522949219, -2.2091617584228516, 1.3811111450195312, -38.4737548828125, 5.203071594238281, 17.659141540527344, -20.088722229003906, 2.945680618286133, -22.705379486083984, -3.2770309448242188, -71.02296447753906, 76.61235046386719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000578.npy"}
{"epoch": 0.873771730914588, "step": 579, "batch_size": 64, "mean": 45.14439392089844, "std": 49.6462516784668, "min": -79.97087097167969, "p10": -10.49599151611328, "median": 37.539424896240234, "p90": 118.9173683166504, "max": 135.75082397460938, "pos_frac": 0.796875, "sample": [57.70532989501953, 44.180885314941406, 47.713836669921875, 23.924827575683594, 36.87718963623047, -45.100921630859375, 98.69685363769531, 34.7332763671875, 73.06146240234375, 82.85050964355469, 97.52571868896484, 114.89555358886719, 78.65260314941406, 132.8712158203125, 86.58888244628906, 26.7491455078125, 119.75373840332031, 3.88970947265625, -22.022361755371094, 26.419418334960938, 135.75082397460938, -48.38193893432617, 121.54983520507812, -0.3035717010498047, 18.129547119140625, 56.22664260864258, 126.01264953613281, 22.964630126953125, 119.53587341308594, 88.27064514160156, -19.266990661621094, 26.72246551513672, 117.47418975830078, 16.88688087463379, 57.174476623535156, 71.65357971191406, 67.93266296386719, 12.390459060668945, -2.770477294921875, 102.875732421875, -10.993782043457031, 38.1917724609375, -2.2789154052734375, -19.88275909423828, 30.580745697021484, 5.136293411254883, -5.090152740478516, -79.97087097167969, 128.40908813476562, 36.88707733154297, 21.10205078125, 42.185211181640625, -3.5793609619140625, 98.55713653564453, 49.123477935791016, 9.777631759643555, 98.66796875, 86.51432800292969, 18.11372947692871, 45.315521240234375, -9.334480285644531, 8.918872833251953, 69.0866470336914, 23.008869171142578], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000579.npy"}
{"epoch": 0.8752834467120182, "step": 580, "batch_size": 64, "mean": 32.1077766418457, "std": 57.12873458862305, "min": -99.20489501953125, "p10": -32.84502906799316, "median": 22.740050315856934, "p90": 115.3553421020508, "max": 133.88131713867188, "pos_frac": 0.703125, "sample": [-73.43930053710938, -46.927284240722656, -10.400299072265625, 21.79139518737793, -12.334304809570312, 29.994428634643555, 101.51188659667969, 59.08953857421875, 43.0172119140625, 1.7665863037109375, 112.27104187011719, 63.04743194580078, -8.267013549804688, 7.561092376708984, -17.941078186035156, 24.405370712280273, 82.43433380126953, 59.289710998535156, -0.33501243591308594, 21.614608764648438, -2.8604164123535156, 63.02510070800781, 133.88131713867188, 30.307388305664062, 89.97260284423828, -4.0361328125, 116.67718505859375, -0.6286392211914062, 5.462099075317383, 125.55563354492188, 119.40389251708984, 123.28352355957031, 4.455772399902344, 68.66991424560547, -30.357391357421875, 133.0423583984375, -33.91115951538086, -99.20489501953125, 5.890987396240234, 90.94833374023438, -90.2601318359375, 13.302337646484375, -48.01342010498047, 31.336517333984375, 125.23683166503906, -14.587831497192383, 12.789876937866211, 18.609079360961914, 105.9019775390625, 39.139801025390625, 91.75732421875, 88.55217742919922, -5.132560729980469, -88.38312530517578, 9.992237091064453, -17.096555709838867, 71.45533752441406, 85.9376449584961, 63.387733459472656, 46.410736083984375, 23.688705444335938, 7.5476531982421875, 21.3629150390625, 64.23463439941406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000580.npy"}
{"epoch": 0.8767951625094482, "step": 581, "batch_size": 64, "mean": 27.898914337158203, "std": 52.04502868652344, "min": -82.2335205078125, "p10": -35.54132690429687, "median": 23.528608322143555, "p90": 90.87611160278321, "max": 141.14462280273438, "pos_frac": 0.640625, "sample": [81.24176025390625, 69.11780548095703, -0.18418121337890625, 8.116729736328125, 47.00474548339844, -3.5581817626953125, -17.635787963867188, -49.677276611328125, 42.41316223144531, 44.277687072753906, -28.702545166015625, 9.452278137207031, 11.138654708862305, 29.52107810974121, 60.852272033691406, -50.31388854980469, 89.42779541015625, 13.115402221679688, 77.06336212158203, 4.767908096313477, 114.86134338378906, 8.929794311523438, 122.93365478515625, -26.886962890625, 132.53797912597656, 130.84353637695312, 77.13026428222656, -4.917388916015625, 19.08617401123047, 41.947601318359375, -14.824554443359375, 55.45069122314453, 11.709590911865234, 59.03643035888672, 61.855712890625, 67.23811340332031, 33.27851104736328, -82.2335205078125, 141.14462280273438, 91.49681854248047, -0.774932861328125, -16.712799072265625, 42.235076904296875, -12.280105590820312, 34.09367752075195, -68.40879821777344, 40.098114013671875, -38.099639892578125, -2.870403289794922, -71.53659057617188, 87.4570541381836, -1.6188125610351562, 62.74397277832031, 76.27090454101562, 10.893543243408203, 39.82124328613281, -1.3546142578125, -44.383785247802734, -29.571929931640625, 27.97104263305664, -2.113983154296875, 76.43685913085938, 111.34500122070312, -12.16677474975586], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000581.npy"}
{"epoch": 0.8783068783068783, "step": 582, "batch_size": 64, "mean": 36.49109649658203, "std": 53.84037780761719, "min": -93.63992309570312, "p10": -19.14667892456054, "median": 28.241047859191895, "p90": 106.26834106445312, "max": 146.0777587890625, "pos_frac": 0.78125, "sample": [89.90935516357422, 127.01133728027344, 12.674842834472656, 50.95301818847656, 16.58875274658203, 4.267993927001953, 104.81011962890625, 57.657535552978516, 74.55792999267578, 108.08805847167969, 29.513504028320312, 29.63983917236328, 89.55326843261719, 146.0777587890625, -0.8065338134765625, 105.01747131347656, -2.23944091796875, -13.720840454101562, 54.374168395996094, 70.44503021240234, 6.0942840576171875, 81.3853988647461, 21.232563018798828, 113.09333801269531, 79.25216674804688, 99.71971130371094, 26.968591690063477, 23.59861183166504, -81.50186157226562, -84.11030578613281, 2.9710636138916016, 30.89093017578125, 72.4872055053711, -21.950340270996094, -9.044565200805664, 80.91424560546875, 89.80029296875, 66.80049133300781, 113.45142364501953, 0.8783454895019531, 106.80442810058594, 52.423439025878906, 20.43635368347168, 11.034303665161133, -4.805295944213867, -13.240150451660156, -25.25566864013672, 52.75946807861328, 8.060615539550781, -93.63992309570312, 3.751668930053711, 7.734951019287109, 0.5537910461425781, 0.7888259887695312, -62.812782287597656, 61.402862548828125, 91.63156127929688, -11.702384948730469, 118.8952407836914, -21.47203826904297, 0.096038818359375, 2.9208450317382812, 62.67754364013672, 99.0816650390625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000582.npy"}
{"epoch": 0.8798185941043084, "step": 583, "batch_size": 64, "mean": 36.420982360839844, "std": 61.32170104980469, "min": -120.07902526855469, "p10": -30.205366134643555, "median": 29.301576614379883, "p90": 113.07957992553712, "max": 136.1431121826172, "pos_frac": 0.671875, "sample": [22.271427154541016, -4.876861572265625, -30.789878845214844, -13.020366668701172, 132.02169799804688, 48.38223648071289, -55.816978454589844, -22.380859375, -75.23839569091797, -16.226089477539062, 85.67457580566406, -28.841503143310547, -4.369937896728516, 97.07479858398438, 51.3008918762207, -19.46172332763672, 73.46647644042969, -4.322931289672852, 136.1431121826172, 114.70917510986328, 61.56907653808594, -82.73655700683594, 84.88374328613281, -0.7625350952148438, 26.566055297851562, 85.37615966796875, 104.1162109375, -84.390869140625, 22.887561798095703, 97.25138092041016, 49.43813705444336, 24.640729904174805, 8.002822875976562, 27.942657470703125, 29.127197265625, 130.9469757080078, -36.12144470214844, -18.85022735595703, 89.97026824951172, 124.38673400878906, 29.475955963134766, 126.73806762695312, -120.07902526855469, 109.27719116210938, 16.844636917114258, 101.56430053710938, 84.47941589355469, 38.84912109375, 70.5444107055664, -27.40399932861328, 95.44781494140625, 30.821121215820312, 100.94226837158203, 4.299554824829102, 40.82896041870117, 10.8182373046875, -20.10673713684082, 62.249359130859375, -0.19934844970703125, 107.63978576660156, 133.995849609375, 108.48441314697266, -7.7345428466796875, 3.2232589721679688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000583.npy"}
{"epoch": 0.8813303099017384, "step": 584, "batch_size": 64, "mean": 34.65182876586914, "std": 60.765785217285156, "min": -103.60335540771484, "p10": -34.093933486938475, "median": 26.351157188415527, "p90": 110.79853515625, "max": 149.57342529296875, "pos_frac": 0.671875, "sample": [132.33291625976562, 91.32797241210938, 62.6430549621582, -2.8828964233398438, 30.38623046875, -5.949457168579102, 137.89865112304688, 110.9076156616211, 86.74816131591797, 87.53910827636719, 35.62789535522461, -11.799772262573242, -2.966867446899414, -32.667510986328125, -56.776424407958984, 82.5763931274414, 69.36846160888672, 115.83357238769531, 100.50230407714844, -11.932247161865234, 96.91130828857422, 107.47135925292969, -3.6847763061523438, 85.13507080078125, 73.79214477539062, 90.97578430175781, 22.451391220092773, 7.027885437011719, 5.864768981933594, 13.105911254882812, 110.54401397705078, -49.089599609375, 78.24836730957031, 14.49212646484375, -34.705257415771484, 95.34209442138672, -22.473724365234375, -31.372421264648438, -8.009590148925781, -10.22193717956543, 9.966381072998047, 64.20038604736328, -55.33294677734375, 35.15779113769531, 75.78998565673828, 88.25775146484375, -0.9808292388916016, 46.89591979980469, 50.96107482910156, -30.52672576904297, -99.90090942382812, 23.865428924560547, -29.17304229736328, 116.25872802734375, 2.192058563232422, 149.57342529296875, 136.51593017578125, 4.318948745727539, 26.849348068237305, 25.85296630859375, 22.362525939941406, 79.35188293457031, -81.65774536132812, -103.60335540771484], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000584.npy"}
{"epoch": 0.8828420256991686, "step": 585, "batch_size": 64, "mean": 25.791934967041016, "std": 52.77466583251953, "min": -134.00579833984375, "p10": -22.747693634033194, "median": 19.141655921936035, "p90": 104.50587158203128, "max": 129.68296813964844, "pos_frac": 0.78125, "sample": [20.352771759033203, 26.619400024414062, 31.035064697265625, 7.1259307861328125, 45.1650390625, 45.614471435546875, 15.638160705566406, 107.16421508789062, 4.718109130859375, 3.2515792846679688, -111.90910339355469, 4.595645904541016, 47.701019287109375, 9.02288818359375, 0.70355224609375, 98.30307006835938, -71.28572082519531, -4.046257019042969, 21.606796264648438, 48.550689697265625, 80.76193237304688, 8.710184097290039, -0.3725414276123047, -38.49424743652344, 28.889245986938477, 53.88935089111328, 11.39935302734375, -0.4899749755859375, 88.48390197753906, 49.691932678222656, -3.832225799560547, -0.238525390625, 49.63473892211914, -0.4282989501953125, -84.87474060058594, 3.000030517578125, 6.733726501464844, 77.96150970458984, 109.10891723632812, 61.697879791259766, 15.531726837158203, 83.5491714477539, 29.29273223876953, -55.009483337402344, -134.00579833984375, 125.60700988769531, 129.68296813964844, 124.13375854492188, -26.96289825439453, -12.912216186523438, 20.3236026763916, 10.839012145996094, 6.7387237548828125, 19.59544563293457, 17.44048309326172, 10.747306823730469, 18.6878662109375, 118.044189453125, 126.37772369384766, 56.82524108886719, 32.71570587158203, 15.342559814453125, 42.78843688964844, 24.151206970214844], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000585.npy"}
{"epoch": 0.8843537414965986, "step": 586, "batch_size": 64, "mean": 38.363807678222656, "std": 45.646156311035156, "min": -79.66997528076172, "p10": -6.2433317184448205, "median": 29.72132968902588, "p90": 107.23784942626953, "max": 129.9061279296875, "pos_frac": 0.8125, "sample": [113.07537841796875, 3.1961746215820312, 68.5326919555664, 44.71662139892578, 13.2410888671875, 111.03759765625, 29.38228416442871, 5.4293060302734375, 6.247528076171875, -2.8932323455810547, 56.71337127685547, -8.063392639160156, 10.484237670898438, 129.9061279296875, 116.78102111816406, -10.35536003112793, 45.342857360839844, 72.33978271484375, 20.248830795288086, 91.98784637451172, 11.400798797607422, 74.07848358154297, 20.689746856689453, 107.9788818359375, 5.229682922363281, -0.6911373138427734, 7.046211242675781, 3.549072265625, 93.11270141601562, 42.28825759887695, 85.06478881835938, 84.98497009277344, 32.381004333496094, 30.060375213623047, 99.88311767578125, 15.087814331054688, -59.573909759521484, 128.92459106445312, -2.876415252685547, 6.687030792236328, 15.043008804321289, 41.085052490234375, 5.275543212890625, 50.192657470703125, 61.7432861328125, -7.679088592529297, -0.15479278564453125, 45.59251403808594, -0.5286712646484375, 60.68983459472656, 105.50877380371094, -35.60923767089844, 16.103836059570312, 22.93366050720215, 101.74950408935547, 14.241859436035156, -79.66997528076172, 112.66088104248047, -8.530853271484375, 36.327362060546875, 80.64241027832031, 38.91059494018555, 46.9777946472168, 29.120849609375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000586.npy"}
{"epoch": 0.8858654572940288, "step": 587, "batch_size": 64, "mean": 37.58617401123047, "std": 61.924442291259766, "min": -149.28829956054688, "p10": -26.475698471069336, "median": 41.820777893066406, "p90": 122.66763000488282, "max": 135.71307373046875, "pos_frac": 0.71875, "sample": [13.830371856689453, -90.62174987792969, 3.9465560913085938, 41.71851348876953, 72.43034362792969, -49.4053840637207, 51.09807205200195, -8.596389770507812, 104.40705871582031, 49.503570556640625, 5.523368835449219, 41.92304229736328, -26.713157653808594, -2.7624053955078125, -0.19293975830078125, 48.65607452392578, 62.043373107910156, -2.153278350830078, -0.84832763671875, 130.8833770751953, 91.90618896484375, 64.18161010742188, -0.6447219848632812, 126.85232543945312, 83.38421630859375, 27.928443908691406, 135.71307373046875, 3.9772109985351562, -149.28829956054688, -9.094558715820312, 82.50382232666016, 51.78388595581055, 52.46153259277344, 71.84239196777344, 17.088947296142578, 42.02928161621094, 65.4509506225586, -120.28141784667969, -55.126953125, 126.23031616210938, 112.48123931884766, -25.921627044677734, 101.85938262939453, 84.16992950439453, -2.7916431427001953, 83.87690734863281, 6.068778991699219, 105.23638153076172, 93.53327941894531, 74.37748718261719, 127.85395812988281, -71.173583984375, 89.2789077758789, 4.5240631103515625, 25.664939880371094, 123.89088439941406, 24.558319091796875, -7.51409912109375, -5.879924774169922, 19.731224060058594, 38.83338165283203, 126.97787475585938, 119.81336975097656, 2.4974822998046875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000587.npy"}
{"epoch": 0.8873771730914588, "step": 588, "batch_size": 64, "mean": 38.221168518066406, "std": 54.1992073059082, "min": -94.5723876953125, "p10": -23.30777130126953, "median": 36.7147216796875, "p90": 113.554061126709, "max": 140.37081909179688, "pos_frac": 0.765625, "sample": [53.06315612792969, -53.36009216308594, -11.628395080566406, 52.36564636230469, -18.1392822265625, 100.93980407714844, 120.86380767822266, 21.693710327148438, 103.58382415771484, 75.64556884765625, 122.80679321289062, 31.47447395324707, 20.08525276184082, 44.151145935058594, 10.529365539550781, 69.20999145507812, 135.27947998046875, 140.37081909179688, 19.06739616394043, 120.83161926269531, 40.40868377685547, 67.62325286865234, -42.52432632446289, 126.28160095214844, 19.196779251098633, 62.14387512207031, 34.768028259277344, -13.819816589355469, 0.08191680908203125, -2.040283203125, 3.561410903930664, 58.899803161621094, -87.85479736328125, 43.23796081542969, 85.20893096923828, 115.58499145507812, -12.533170700073242, 61.67833709716797, 41.75262451171875, -18.330482482910156, -20.955482482910156, -39.20220947265625, 48.340049743652344, 98.85372924804688, 73.88755798339844, 31.299758911132812, 99.2386245727539, 2.282276153564453, 9.03558349609375, 88.79779052734375, 9.938133239746094, -14.972381591796875, 16.041297912597656, 28.753334045410156, 38.86749267578125, 31.51428985595703, -24.315895080566406, 38.661415100097656, 12.248260498046875, -32.13909912109375, 85.7345962524414, 107.84341430664062, 108.81522369384766, -94.5723876953125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000588.npy"}
{"epoch": 0.8888888888888888, "step": 589, "batch_size": 64, "mean": 40.45512008666992, "std": 46.469024658203125, "min": -45.82087326049805, "p10": -9.195801162719725, "median": 32.86814498901367, "p90": 106.59288177490237, "max": 145.3842010498047, "pos_frac": 0.828125, "sample": [8.791093826293945, 44.16358184814453, -29.85354232788086, 122.02239227294922, 33.56037902832031, 5.879356384277344, 6.707252502441406, 2.440460205078125, -3.0050735473632812, 2.336681365966797, -4.1939697265625, 6.924821853637695, 145.3842010498047, 10.725788116455078, 55.36811828613281, 34.221641540527344, 46.88890838623047, 99.16860961914062, 112.15669250488281, 88.00554656982422, 31.903488159179688, 57.49784851074219, 90.62725067138672, -11.82855224609375, 91.71827697753906, 66.24046325683594, -6.872749328613281, 78.6012954711914, 90.00789642333984, 62.762882232666016, 52.481109619140625, -11.80282974243164, 102.67057037353516, 15.363494873046875, -3.2089691162109375, 41.28185272216797, 16.61803436279297, 15.784202575683594, 32.17591094970703, 128.2273712158203, 17.68560028076172, 40.568199157714844, 94.70243072509766, 55.380332946777344, 46.78639221191406, 3.850677490234375, 54.893211364746094, -45.82087326049805, -10.191394805908203, 5.426856994628906, -35.370391845703125, 69.18241119384766, 117.36843872070312, -41.367881774902344, 13.107215881347656, 108.27387237548828, 100.02259826660156, 43.296058654785156, 8.456939697265625, 7.35150146484375, 27.2568359375, 20.42818832397461, 143.41012573242188, 16.4883975982666], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000589.npy"}
{"epoch": 0.890400604686319, "step": 590, "batch_size": 64, "mean": 43.478607177734375, "std": 53.66924285888672, "min": -93.50251770019531, "p10": -16.691635131835934, "median": 42.735862731933594, "p90": 117.3926498413086, "max": 133.1403350830078, "pos_frac": 0.78125, "sample": [94.35210418701172, 22.063560485839844, 57.88343048095703, -0.39534759521484375, 94.267578125, 105.92340087890625, 26.310760498046875, -93.50251770019531, 15.304054260253906, -17.64604949951172, 29.115982055664062, 5.744915008544922, 0.6787109375, 19.030189514160156, -76.45068359375, 76.9434814453125, 99.18494415283203, 100.56964111328125, -23.333507537841797, -4.2766571044921875, -14.464668273925781, 44.893829345703125, 4.116188049316406, 115.04865264892578, 127.59577941894531, 7.326377868652344, 0.3139190673828125, 54.55775451660156, 133.1403350830078, 25.987258911132812, -12.386062622070312, 105.4437255859375, 11.068964004516602, 43.133514404296875, 68.95417785644531, 80.07247924804688, 71.7883071899414, 18.831619262695312, 42.33821105957031, 13.489212036132812, 117.12443542480469, 48.02997589111328, 129.95687866210938, 89.66899108886719, 121.54151153564453, 76.93045043945312, 75.32321166992188, 47.64530944824219, -11.817001342773438, 61.31071472167969, -26.736328125, 117.50759887695312, 77.310546875, -1.729705810546875, 3.9004058837890625, 105.59908294677734, 48.686256408691406, 31.41571807861328, 127.64452362060547, -3.534423828125, 23.594913482666016, -18.838333129882812, 123.33428192138672, -54.25569534301758], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000590.npy"}
{"epoch": 0.891912320483749, "step": 591, "batch_size": 64, "mean": 44.440185546875, "std": 62.184410095214844, "min": -109.57157897949219, "p10": -33.96871795654297, "median": 42.699527740478516, "p90": 123.4548210144043, "max": 136.4790802001953, "pos_frac": 0.75, "sample": [0.7382450103759766, 109.2550048828125, 78.47745513916016, 95.7183837890625, 78.89434051513672, 42.147727966308594, 124.5587158203125, 23.453506469726562, 53.191261291503906, 90.35020446777344, 16.973342895507812, -30.43115234375, 81.89236450195312, -72.29161834716797, 58.64442443847656, -19.617599487304688, 43.25132751464844, 31.149139404296875, 63.85723876953125, 13.884449005126953, 92.94497680664062, 37.18675231933594, 125.07443237304688, -68.87553405761719, 110.63296508789062, -109.57157897949219, 51.786712646484375, 50.19499206542969, 32.657806396484375, 39.22393798828125, -78.13992309570312, -65.99591827392578, 128.23532104492188, 136.4790802001953, 3.790966033935547, 118.48973083496094, -15.627490997314453, 24.25860595703125, 112.42510223388672, 13.023731231689453, -0.2310333251953125, -13.367198944091797, 30.485794067382812, 102.25080871582031, -2.2898120880126953, 134.14035034179688, -12.948110580444336, -73.08181762695312, -13.071479797363281, 98.69451904296875, -35.48481750488281, -4.952873229980469, 134.60775756835938, 40.08424377441406, 22.394100189208984, 120.87906646728516, 101.27477264404297, 62.97588348388672, 93.48066711425781, 71.54771423339844, 126.42231750488281, 111.57640075683594, 7.693780899047852, 118.79922485351562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000591.npy"}
{"epoch": 0.8934240362811792, "step": 592, "batch_size": 64, "mean": 36.251068115234375, "std": 52.486427307128906, "min": -52.27635192871094, "p10": -22.98765220642089, "median": 25.188501358032227, "p90": 125.69466857910157, "max": 146.77484130859375, "pos_frac": 0.765625, "sample": [48.89436340332031, 130.55609130859375, 108.95732116699219, 26.28367042541504, 98.42784881591797, 126.71968078613281, 117.66659545898438, 9.636211395263672, 2.8165969848632812, 30.71202850341797, 2.5211105346679688, 55.39203643798828, 26.419998168945312, 7.667854309082031, -15.381301879882812, 53.429466247558594, 124.57374572753906, 72.29570007324219, 11.1279296875, 5.119762420654297, -52.27635192871094, 92.56180572509766, 52.96805953979492, 126.17506408691406, 15.678546905517578, 72.51052856445312, 59.54340362548828, 128.006591796875, 92.15772247314453, 26.218978881835938, -50.47313690185547, 65.34294128417969, 0.7920684814453125, 66.81671142578125, 23.16448211669922, 39.01475143432617, -51.38142395019531, 49.161590576171875, -46.13646697998047, -12.550714492797852, -1.3123741149902344, 32.112709045410156, -0.7975234985351562, -26.16547393798828, -26.628753662109375, 12.406509399414062, -14.999420166015625, 132.0349884033203, 136.94259643554688, -4.200531005859375, 146.77484130859375, -1.4095230102539062, 7.529048919677734, 0.37029266357421875, 19.63604736328125, 2.322000503540039, -43.147308349609375, 9.596611022949219, -15.572734832763672, 24.158023834228516, 53.32563400268555, 10.895210266113281, 56.58403015136719, 68.48168182373047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000592.npy"}
{"epoch": 0.8949357520786092, "step": 593, "batch_size": 64, "mean": 34.608680725097656, "std": 59.20881652832031, "min": -92.92000579833984, "p10": -39.13551635742187, "median": 29.90060043334961, "p90": 111.00615005493165, "max": 134.62704467773438, "pos_frac": 0.734375, "sample": [60.02095031738281, 6.689788818359375, 96.7222900390625, -17.517181396484375, 108.4328842163086, 2.2362213134765625, 29.084922790527344, -41.440704345703125, 134.24952697753906, 33.09889221191406, 126.4836196899414, 13.14279556274414, 107.05642700195312, 108.25065612792969, 8.123817443847656, -15.163249969482422, 50.22187042236328, 53.982666015625, 100.989013671875, 64.67980194091797, 10.913640975952148, 30.662315368652344, 127.82015228271484, 101.00443267822266, 13.5101318359375, 10.849754333496094, -13.403131484985352, 96.26043701171875, 38.55732727050781, -91.1807861328125, 48.66664123535156, -66.74055480957031, -6.0055084228515625, 44.43534851074219, 54.02111053466797, 93.83677673339844, -2.6980361938476562, -33.756744384765625, 100.69276428222656, -43.076454162597656, -79.93598937988281, -11.137447357177734, 128.51644897460938, -86.53023529052734, 118.114501953125, 82.2529296875, 112.10897827148438, 14.349348068237305, -8.568927764892578, 134.62704467773438, 10.65997314453125, -2.173738479614258, 74.48416900634766, 47.743534088134766, 35.532188415527344, -92.92000579833984, 12.902963638305664, 103.39668273925781, 1.0211372375488281, -6.415802001953125, 0.8525848388671875, 29.138885498046875, 7.33074951171875, 45.891021728515625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000593.npy"}
{"epoch": 0.8964474678760394, "step": 594, "batch_size": 64, "mean": 31.851009368896484, "std": 49.5014762878418, "min": -108.37396240234375, "p10": -24.80753936767578, "median": 27.798049926757812, "p90": 93.2167236328125, "max": 129.65750122070312, "pos_frac": 0.6875, "sample": [16.867473602294922, 48.77983093261719, -19.409076690673828, 30.23389434814453, 47.77302932739258, 84.0075454711914, 35.036354064941406, 22.412200927734375, 13.706268310546875, -11.207954406738281, -24.43560028076172, -4.673549652099609, -17.297988891601562, 29.783065795898438, 33.67212677001953, -0.7121353149414062, -15.65478515625, -1.4143791198730469, 79.47871398925781, 2.587982177734375, 86.2900390625, 26.07646942138672, 1.7451534271240234, 82.03367614746094, 128.26771545410156, 68.39007568359375, 68.47843933105469, -24.966941833496094, 93.48692321777344, -1.7791671752929688, 63.89044952392578, 21.27008056640625, 127.46600341796875, 99.01872253417969, 4.3774261474609375, -27.179428100585938, -26.170799255371094, 66.75230407714844, 89.96015167236328, -34.607086181640625, 92.58625793457031, -3.6197032928466797, 74.01527404785156, 111.62130737304688, 1.4496536254882812, 21.802358627319336, 38.758697509765625, 105.11341857910156, -8.82765007019043, 91.19082641601562, 53.92253112792969, -108.37396240234375, 75.4759521484375, 129.65750122070312, 60.94171142578125, 54.55583572387695, 5.126026153564453, 29.300933837890625, 26.295166015625, -53.00519561767578, 44.28803253173828, -53.311744689941406, -3.4460525512695312, -9.385833740234375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000594.npy"}
{"epoch": 0.8979591836734694, "step": 595, "batch_size": 64, "mean": 44.979042053222656, "std": 59.21854019165039, "min": -123.4466781616211, "p10": -6.646003913879395, "median": 38.710044860839844, "p90": 121.94169158935547, "max": 137.1292266845703, "pos_frac": 0.84375, "sample": [-6.665672302246094, 46.345069885253906, 118.61282348632812, 46.59386444091797, 61.026161193847656, 20.901948928833008, 50.56670379638672, 22.240596771240234, -5.899871826171875, 120.02836608886719, 110.02652740478516, 11.489089965820312, 31.81351089477539, -41.003543853759766, 124.13642883300781, 22.564804077148438, 7.647642135620117, 0.43206787109375, 114.25520324707031, 32.378273010253906, 35.97930145263672, 58.955020904541016, 137.1292266845703, 16.85900115966797, 104.65557861328125, 17.29596710205078, 41.44078826904297, 50.76105499267578, 81.56501770019531, 95.57250213623047, 3.7045669555664062, -45.81874084472656, 26.309799194335938, 6.549530029296875, 106.45582580566406, 117.45359802246094, 122.76168823242188, -74.97257995605469, -123.4466781616211, 135.33477783203125, 128.6376495361328, 122.83744812011719, 33.09661865234375, 50.25270080566406, -61.02997589111328, 114.19544982910156, 66.71549987792969, 81.1261215209961, 0.1746063232421875, 7.388675689697266, -6.60011100769043, 104.38452911376953, 58.95606231689453, 5.596809387207031, -1.9681396484375, 28.048233032226562, 125.23725891113281, 97.78012084960938, 6.49462890625, 0.224212646484375, 3.5838871002197266, -88.15170288085938, 105.19358825683594, 94.4494400024414], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000595.npy"}
{"epoch": 0.8994708994708994, "step": 596, "batch_size": 64, "mean": 24.183456420898438, "std": 53.03311538696289, "min": -103.34027099609375, "p10": -26.355972862243647, "median": 15.62765884399414, "p90": 118.46272354125976, "max": 136.171875, "pos_frac": 0.625, "sample": [-42.86448669433594, 36.392127990722656, 128.85874938964844, 26.96428680419922, 49.03192138671875, -29.519485473632812, -3.320873260498047, 9.348007202148438, -9.752685546875, 113.43171691894531, -55.44981384277344, 47.980960845947266, -75.57720947265625, -4.8571014404296875, 125.22743225097656, 123.43340301513672, 73.01525115966797, 16.623271942138672, -62.08209991455078, 26.942567825317383, -4.543212890625, 106.23841857910156, 30.261734008789062, 62.39684295654297, 123.03837585449219, 118.48921203613281, 28.710350036621094, -18.800275802612305, 4.262275695800781, 12.22784423828125, -10.755008697509766, 18.37615966796875, 15.263519287109375, -7.648199081420898, 36.44001007080078, -13.03824234008789, 15.991798400878906, 136.171875, -7.660709381103516, -18.762779235839844, 30.40283966064453, 12.982002258300781, -6.760112762451172, -103.34027099609375, -5.2217864990234375, 34.00423049926758, -3.73046875, -5.363506317138672, 118.40091705322266, 31.563720703125, 80.87806701660156, 2.225921630859375, -9.873130798339844, 25.457054138183594, -17.32379913330078, 119.1822509765625, 2.053802490234375, 66.72376251220703, -39.19854736328125, 28.380815505981445, 14.366573333740234, 52.41648864746094, 18.002899169921875, -18.974443435668945], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000596.npy"}
{"epoch": 0.9009826152683296, "step": 597, "batch_size": 64, "mean": 34.191078186035156, "std": 53.50724792480469, "min": -66.39578247070312, "p10": -29.90546245574951, "median": 18.974717140197754, "p90": 122.02435760498048, "max": 135.55984497070312, "pos_frac": 0.71875, "sample": [-5.654041290283203, -40.362754821777344, -31.842586517333984, 2.7693023681640625, 124.59271240234375, 2.9185104370117188, 22.954410552978516, -36.75434112548828, -25.866819381713867, 130.53428649902344, 1.5589866638183594, 83.9734878540039, 4.391946792602539, 64.431396484375, -28.116378784179688, 7.433872222900391, 135.55984497070312, 109.84248352050781, 106.78998565673828, -25.2193546295166, 63.48561096191406, 7.5457000732421875, -31.038009643554688, -0.540008544921875, 77.3898696899414, 81.43135070800781, 19.502607345581055, -13.12370491027832, 92.46904754638672, -0.09329414367675781, 52.05059814453125, 14.680992126464844, 126.58187103271484, 20.43675994873047, 37.779808044433594, 18.446826934814453, -3.7017822265625, 56.759857177734375, 23.1004581451416, 120.29402160644531, -7.725711822509766, -40.521339416503906, 4.427391052246094, 1.176706314086914, -30.672212600708008, 0.39972686767578125, -2.293973922729492, 5.754053115844727, 61.54857635498047, 20.831993103027344, 132.87228393554688, 6.382610321044922, 103.84127044677734, -5.2156524658203125, -66.39578247070312, 66.50064086914062, 3.0724220275878906, 45.4078369140625, 129.11495971679688, 122.76593017578125, 21.886831283569336, 100.67945098876953, 78.86023712158203, 68.13725280761719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000597.npy"}
{"epoch": 0.9024943310657596, "step": 598, "batch_size": 64, "mean": 22.515592575073242, "std": 53.28733444213867, "min": -114.30091857910156, "p10": -42.80532684326172, "median": 13.171669006347656, "p90": 109.06584320068364, "max": 131.45469665527344, "pos_frac": 0.640625, "sample": [2.814260482788086, -54.76646423339844, 9.342964172363281, 9.082923889160156, 34.86084747314453, -1.0896797180175781, 13.467010498046875, 126.06956481933594, 76.59522247314453, -114.30091857910156, 13.527151107788086, 38.673736572265625, 11.251483917236328, -64.57365417480469, 115.47865295410156, -21.930320739746094, 131.45469665527344, -41.6282958984375, 89.90939331054688, 48.574283599853516, -25.941675186157227, -72.86115264892578, -24.325515747070312, 51.16334533691406, 28.718978881835938, 17.6248779296875, 25.349979400634766, 13.424819946289062, -9.83102035522461, -29.785938262939453, -14.329370498657227, -9.611551284790039, 27.745649337768555, 55.21269226074219, 82.43795776367188, -6.321895599365234, 21.569564819335938, -7.172229766845703, 47.930633544921875, 2.3065185546875, 78.64601135253906, 12.91851806640625, -0.5867137908935547, 127.91043853759766, 3.9150009155273438, 67.45692443847656, 13.609176635742188, -1.1260147094726562, 81.00257110595703, 43.67583465576172, 2.663726806640625, -47.30120086669922, -3.1959457397460938, 10.354522705078125, 29.797571182250977, -43.30976867675781, 119.80712890625, 65.07121276855469, -0.2845458984375, -1.4098281860351562, 123.54531860351562, 114.8100357055664, -48.75224304199219, 95.66272735595703], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000598.npy"}
{"epoch": 0.9040060468631897, "step": 599, "batch_size": 64, "mean": 26.235389709472656, "std": 53.59726333618164, "min": -102.4260025024414, "p10": -43.68916854858398, "median": 21.636171340942383, "p90": 102.33823852539065, "max": 138.91770935058594, "pos_frac": 0.671875, "sample": [-19.72613525390625, -13.684709548950195, 41.29827117919922, 35.065284729003906, -35.026588439941406, 59.39384460449219, -2.7994041442871094, 116.2952880859375, -47.401702880859375, -59.23345947265625, -22.65912628173828, -10.69389533996582, 1.5373687744140625, 53.394474029541016, 22.55516815185547, -27.49382781982422, -48.65232849121094, -11.689517974853516, 3.465738296508789, -56.38551330566406, 105.57441711425781, 64.5587387084961, 26.329681396484375, -102.4260025024414, 2.5491104125976562, 20.9696044921875, 3.9010772705078125, 124.19046020507812, -53.39911651611328, 8.631362915039062, 124.87185668945312, 93.95402526855469, -24.83209991455078, -14.894874572753906, 36.5168571472168, 15.331153869628906, 23.61897087097168, -2.9764862060546875, 14.495965957641602, 94.47528076171875, 94.78715515136719, 1.4000091552734375, 17.13908576965332, -15.454015731811523, 113.07256317138672, 50.85824966430664, 89.50421142578125, -13.893798828125, -56.17461013793945, 22.302738189697266, 124.88009643554688, 70.3585205078125, 44.00373077392578, 61.332889556884766, 86.43333435058594, 34.139366149902344, 138.91770935058594, 60.70233917236328, 60.03181457519531, 2.868886947631836, -6.7284698486328125, 66.14842987060547, 69.73439025878906, 23.701141357421875], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000599.npy"}
{"epoch": 0.9055177626606198, "step": 600, "batch_size": 64, "mean": 29.23377799987793, "std": 53.65687942504883, "min": -104.14949035644531, "p10": -39.13670845031737, "median": 20.393115997314453, "p90": 108.36401901245117, "max": 134.86215209960938, "pos_frac": 0.71875, "sample": [82.84587097167969, 35.13787078857422, -15.691953659057617, 4.761631011962891, 117.52664184570312, 24.484329223632812, 18.30819320678711, -45.06536102294922, 113.47295379638672, 127.89430236816406, -20.859596252441406, 19.121051788330078, 87.23770904541016, 89.11955261230469, -4.6962890625, -5.753240585327148, -47.77934265136719, -50.317413330078125, -8.50341796875, 74.31623840332031, 4.505558013916016, 45.648773193359375, 108.05863952636719, 85.95242309570312, -26.913040161132812, 113.21476745605469, 34.675819396972656, 108.4948959350586, -2.3653182983398438, 92.97965240478516, 96.03643798828125, 18.415489196777344, -104.14949035644531, -10.027107238769531, 49.18797302246094, -44.375423431396484, -26.377273559570312, 65.35408782958984, 1.5811424255371094, 60.402557373046875, 1.1057510375976562, 31.010711669921875, 22.53968048095703, 83.2762451171875, -10.355123519897461, 16.762939453125, -46.60423278808594, -26.308027267456055, -84.85541534423828, 46.403011322021484, 0.7548294067382812, 49.603485107421875, 15.330915451049805, 13.781646728515625, 43.67488098144531, 3.132568359375, 19.348190307617188, 38.06098556518555, 64.72418212890625, 126.40556335449219, 2.99261474609375, 21.43804168701172, 38.015899658203125, 134.86215209960938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000600.npy"}
{"epoch": 0.9070294784580499, "step": 601, "batch_size": 64, "mean": 41.00107192993164, "std": 57.342529296875, "min": -91.73149108886719, "p10": -22.571045684814443, "median": 22.109451293945312, "p90": 113.61121292114258, "max": 144.6077880859375, "pos_frac": 0.734375, "sample": [82.52254486083984, 20.512290954589844, 99.2593765258789, -37.74467849731445, 113.54564666748047, 87.88716125488281, 144.6077880859375, -6.816131591796875, 19.93682861328125, -10.03701400756836, 139.45968627929688, -91.73149108886719, 115.74168395996094, 19.961633682250977, -9.740541458129883, 23.70661163330078, 95.55121612548828, 99.14300537109375, -27.618309020996094, 73.77599334716797, 118.66323852539062, 6.322452545166016, -0.29296875, 113.63931274414062, 130.30572509765625, 54.35079574584961, -3.3744258880615234, 10.807804107666016, 83.8927230834961, 15.739501953125, 11.074310302734375, 76.92939758300781, 50.0369873046875, -3.6501083374023438, -0.8094539642333984, 98.82109069824219, 4.740425109863281, 45.00520324707031, 26.15636444091797, 9.383096694946289, -46.50776672363281, 37.071807861328125, 4.555980682373047, 102.79898071289062, 4.212860107421875, 55.89678955078125, 19.520915985107422, -10.794097900390625, -29.85532569885254, 19.386791229248047, 18.00629425048828, 129.04981994628906, -4.465448379516602, 1.0079059600830078, 85.25624084472656, 90.92266082763672, 110.2938232421875, 89.10953521728516, 112.79265594482422, -0.099945068359375, -72.09770965576172, 101.47444152832031, 88.67213439941406, -81.80567932128906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000601.npy"}
{"epoch": 0.90854119425548, "step": 602, "batch_size": 64, "mean": 38.61279296875, "std": 50.101036071777344, "min": -93.80927276611328, "p10": -11.899263000488281, "median": 24.83704948425293, "p90": 111.7910743713379, "max": 138.76419067382812, "pos_frac": 0.78125, "sample": [-11.468299865722656, 77.87274932861328, 111.36243438720703, 138.76419067382812, 35.022125244140625, 24.615875244140625, 33.08689880371094, 4.712491989135742, 79.62051391601562, 14.60195541381836, 128.18463134765625, -15.785079956054688, 114.77156066894531, 113.09996795654297, -57.35611343383789, 107.46207427978516, -13.965499877929688, 104.43206787109375, 19.910537719726562, 3.8850059509277344, 3.6596832275390625, -0.5709991455078125, 3.4252471923828125, -4.017490386962891, 3.0273590087890625, 78.19107818603516, 21.45426368713379, 82.91268157958984, 25.180095672607422, 48.14707946777344, -2.419109344482422, 25.058223724365234, 65.35411834716797, 26.679058074951172, 24.174449920654297, 3.5020484924316406, 11.271305084228516, 19.63671875, 110.45602416992188, 33.545005798339844, -0.3545875549316406, 73.79365539550781, 83.3177261352539, 110.67697143554688, -12.11956787109375, -93.80927276611328, 63.592506408691406, 15.313728332519531, 23.121379852294922, 5.8953857421875, -3.3548927307128906, 3.9143218994140625, 98.6647720336914, 41.43829345703125, -1.4164962768554688, 111.97477722167969, 41.979209899902344, 35.78310012817383, 127.07609558105469, 119.76956176757812, -38.159523010253906, 74.27886962890625, 10.429885864257812, -12.083961486816406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000602.npy"}
{"epoch": 0.91005291005291, "step": 603, "batch_size": 64, "mean": 38.59679412841797, "std": 53.32042694091797, "min": -88.46334838867188, "p10": -27.701049041748046, "median": 32.66318321228027, "p90": 111.59701080322267, "max": 147.94635009765625, "pos_frac": 0.765625, "sample": [88.18411254882812, 80.67439270019531, 83.30429077148438, -26.77538299560547, -26.123071670532227, 6.8521881103515625, 37.64727020263672, 28.84124755859375, -6.475860595703125, 40.24433898925781, 117.28013610839844, 15.529556274414062, 31.17749786376953, -32.177406311035156, 17.27159881591797, 67.17449188232422, -34.98014450073242, 29.00713348388672, 3.5782394409179688, 147.94635009765625, 35.29612731933594, 17.790252685546875, 108.37185668945312, 19.038734436035156, -1.5190277099609375, 34.741180419921875, 84.0615234375, -30.599227905273438, 96.89927673339844, -79.75819396972656, 9.468719482421875, 9.22308349609375, 50.433021545410156, 71.17497253417969, -38.71942901611328, 88.61245727539062, 69.64485168457031, 15.24216079711914, -0.8051738739013672, 123.0147933959961, 51.85798645019531, 82.2375259399414, 41.47314453125, 54.92444610595703, 92.27510833740234, 107.85204315185547, 10.5054931640625, -21.985010147094727, 34.148868560791016, 0.30430030822753906, -9.088638305664062, 8.939638137817383, 93.68277740478516, 112.91844177246094, 72.32320404052734, -1.1885528564453125, 26.018383026123047, -88.46334838867188, 113.32674407958984, 108.513671875, 120.22923278808594, 1.4282474517822266, 136.26609802246094, -28.097763061523438], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000603.npy"}
{"epoch": 0.9115646258503401, "step": 604, "batch_size": 64, "mean": 35.52631759643555, "std": 58.36043930053711, "min": -124.87803649902344, "p10": -25.651498413085935, "median": 27.690218925476074, "p90": 122.81401977539063, "max": 145.13832092285156, "pos_frac": 0.734375, "sample": [22.279449462890625, 93.22420501708984, -2.8360443115234375, -55.36351013183594, 33.58457565307617, 63.087249755859375, -8.362682342529297, 6.464532852172852, 123.55183410644531, -23.967315673828125, 3.803863525390625, -7.588203430175781, 1.9759578704833984, 85.40717315673828, -16.994171142578125, 4.138526916503906, 115.05952453613281, 26.598501205444336, 38.20513916015625, 85.59142303466797, -87.88562774658203, -8.479400634765625, -11.938446044921875, -46.564971923828125, 33.835838317871094, 3.2768115997314453, 7.619035720825195, 43.7646484375, 28.781936645507812, -124.87803649902344, 47.01071548461914, -8.055931091308594, 42.59510803222656, -12.6549072265625, 121.09245300292969, -8.510765075683594, 4.498291015625, 47.493896484375, 134.64285278320312, 93.35710906982422, 67.9903564453125, 130.00900268554688, 145.13832092285156, 15.143722534179688, 81.74283599853516, 107.13546752929688, 21.73619842529297, 113.1049575805664, 130.77487182617188, -26.373291015625, 123.95790100097656, 10.536300659179688, 50.219581604003906, 5.066642761230469, 75.68321990966797, 32.66858673095703, 69.99569702148438, 7.015291213989258, 2.1840381622314453, 135.4791717529297, 80.11046600341797, -49.896331787109375, -28.484390258789062, 85.88497924804688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000604.npy"}
{"epoch": 0.9130763416477702, "step": 605, "batch_size": 64, "mean": 34.185150146484375, "std": 55.78700256347656, "min": -98.99556732177734, "p10": -38.57764053344725, "median": 32.944908142089844, "p90": 106.31343460083008, "max": 139.64947509765625, "pos_frac": 0.78125, "sample": [42.91608428955078, 31.385498046875, -67.51354217529297, 139.64947509765625, 39.142738342285156, 105.3095932006836, 18.709945678710938, 61.563568115234375, 65.11396026611328, -44.823116302490234, 67.28740692138672, -15.139312744140625, 7.031867980957031, -18.57160186767578, 17.877582550048828, -56.84303665161133, 48.63188171386719, 95.92487335205078, 63.05971908569336, 39.458126068115234, -1.0223159790039062, 4.970634460449219, 51.689002990722656, -95.52401733398438, 1.6345367431640625, 70.37835693359375, 123.32613372802734, 93.56085205078125, 18.030059814453125, 10.230766296386719, 65.09001922607422, 7.855781555175781, -75.482177734375, 114.95381164550781, 0.310272216796875, 94.1793212890625, 108.92091369628906, 48.09136962890625, 19.000091552734375, 81.85287475585938, -15.741348266601562, 7.956611633300781, 89.84764862060547, 71.49540710449219, 132.57098388671875, 8.486213684082031, 95.11099243164062, 106.74365234375, 85.90166473388672, -14.96783447265625, 131.1436767578125, 45.12823486328125, 70.43650817871094, 28.54271697998047, -51.70643615722656, 10.049890518188477, 14.230371475219727, -98.99556732177734, 12.798187255859375, -24.004863739013672, 3.3225154876708984, 34.50431823730469, -6.670295715332031, 69.44816589355469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000605.npy"}
{"epoch": 0.9145880574452003, "step": 606, "batch_size": 64, "mean": 38.93024444580078, "std": 52.17251968383789, "min": -70.87691497802734, "p10": -27.741882514953613, "median": 29.390426635742188, "p90": 113.20435791015626, "max": 137.8748321533203, "pos_frac": 0.703125, "sample": [-28.141502380371094, 100.03125, 8.202323913574219, 121.947021484375, -34.95214080810547, 2.7273406982421875, 110.05096435546875, -6.4940032958984375, 0.13701820373535156, 89.99817657470703, 115.63026428222656, -35.17955017089844, 64.9696044921875, 82.58419799804688, 81.95235443115234, -70.87691497802734, 72.69844818115234, 24.7130126953125, -13.194276809692383, 53.69621276855469, 125.85507202148438, 33.49494934082031, 24.234310150146484, 80.76428985595703, -29.323585510253906, -22.65888214111328, 97.64013671875, -0.7051067352294922, 42.96760940551758, 86.45194244384766, 39.905418395996094, -7.84063720703125, -49.447227478027344, 119.54517364501953, 37.56031799316406, 3.430379867553711, 17.633544921875, -6.572549819946289, 70.96864318847656, 22.496917724609375, 108.83634948730469, 127.5583724975586, 52.12047576904297, 91.56724548339844, 13.310111999511719, 137.8748321533203, -4.7831878662109375, 13.024993896484375, -27.083051681518555, -4.375812530517578, 114.00459289550781, -11.118331909179688, -28.02423858642578, 62.631439208984375, -8.908607482910156, 62.53087615966797, 25.285903930664062, 13.885196685791016, 60.858551025390625, 111.33714294433594, 52.21636962890625, 78.81839752197266, 24.351638793945312, -1.2842350006103516], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000606.npy"}
{"epoch": 0.9160997732426304, "step": 607, "batch_size": 64, "mean": 37.99995803833008, "std": 54.35776138305664, "min": -102.16242980957031, "p10": -24.192006683349607, "median": 26.834747314453125, "p90": 112.41958007812501, "max": 126.77513122558594, "pos_frac": 0.734375, "sample": [9.995445251464844, 10.104095458984375, -38.5499267578125, 28.334228515625, 57.03453826904297, 21.359169006347656, 77.02889251708984, 21.377296447753906, -102.16242980957031, -24.391868591308594, 6.345394134521484, 2.9221572875976562, 112.95016479492188, 116.68019104003906, 25.33526611328125, 5.416534423828125, -7.623435974121094, 11.029943466186523, 30.951759338378906, 122.83705139160156, 28.354904174804688, 116.95089721679688, 92.92410278320312, 87.73202514648438, 58.72834777832031, 16.157081604003906, -12.624221801757812, 85.81767272949219, -14.467605590820312, -4.2353515625, 111.18154907226562, 53.58934020996094, 85.21240234375, 81.48336029052734, -47.14862823486328, 121.7501449584961, 126.77513122558594, 0.3507652282714844, 85.40408325195312, -4.28558349609375, -30.226364135742188, 6.586902618408203, 7.799375534057617, 87.6579818725586, 68.41734313964844, -23.725662231445312, -11.181543350219727, 86.5405502319336, 121.50003051757812, 101.09517669677734, -3.345958709716797, -3.2356719970703125, 71.35852813720703, 106.7555923461914, 8.148426055908203, 104.39031982421875, -1.3403434753417969, 19.925186157226562, 63.718963623046875, 108.41674041748047, 34.37351989746094, -63.15312957763672, -52.67676544189453, 67.57316589355469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000607.npy"}
{"epoch": 0.9176114890400605, "step": 608, "batch_size": 64, "mean": 28.957908630371094, "std": 50.852821350097656, "min": -82.60035705566406, "p10": -20.77267436981201, "median": 12.128786087036133, "p90": 96.14005126953126, "max": 129.88270568847656, "pos_frac": 0.765625, "sample": [65.70338439941406, -27.662155151367188, 3.1396484375, 93.94654846191406, 0.021627426147460938, -2.367767333984375, 83.01888275146484, -0.7954940795898438, 19.77753448486328, -56.2474365234375, 60.458030700683594, 22.72606658935547, 78.98597717285156, 82.38824462890625, 27.878908157348633, 72.18763732910156, 127.65800476074219, -82.60035705566406, -21.59288215637207, 1.5755081176757812, 8.544731140136719, -12.444477081298828, -0.5155143737792969, 129.88270568847656, 97.08012390136719, 8.992015838623047, -4.437400817871094, 36.22435760498047, 10.984344482421875, 119.49403381347656, 20.65521240234375, -23.43354034423828, -79.10040283203125, 59.45648193359375, 1.8572845458984375, 86.89045715332031, -18.858856201171875, -13.648857116699219, 9.547012329101562, 5.7471771240234375, 21.72228240966797, 12.205272674560547, 7.78240966796875, 6.296730041503906, 18.87896728515625, 61.467071533203125, -17.989410400390625, 107.47111511230469, 129.36965942382812, 84.85858917236328, 34.1490478515625, 7.583595275878906, 21.377418518066406, 85.75932312011719, 1.2425880432128906, 89.80790710449219, -76.95875549316406, 122.98084259033203, 7.61541748046875, 0.5134353637695312, 12.052299499511719, 80.01362609863281, 9.433670043945312, 34.556304931640625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000608.npy"}
{"epoch": 0.9191232048374905, "step": 609, "batch_size": 64, "mean": 42.56886291503906, "std": 47.018985748291016, "min": -103.69699096679688, "p10": -14.030491638183594, "median": 51.66179275512695, "p90": 99.45670623779297, "max": 128.0591583251953, "pos_frac": 0.796875, "sample": [118.09202575683594, 51.87353515625, 61.68042755126953, 7.551536560058594, 11.4652099609375, 13.742759704589844, 95.36518859863281, 52.25982666015625, -0.471221923828125, -6.509422302246094, -13.918243408203125, 66.1237564086914, -8.800201416015625, 31.977764129638672, 95.30061340332031, 86.89584350585938, 128.0591583251953, 60.61881637573242, 52.547332763671875, 126.18476104736328, -31.964637756347656, 68.50482177734375, 58.919342041015625, 60.493988037109375, -20.49327850341797, 118.2745590209961, 54.11151123046875, 78.51091766357422, -33.134613037109375, 59.85948181152344, 34.257850646972656, -0.11589813232421875, 119.10899353027344, 39.37052917480469, 13.665008544921875, 65.41075134277344, 45.29551315307617, 0.37117576599121094, 97.46145629882812, 58.45912170410156, 38.930999755859375, -20.921375274658203, 13.380195617675781, 116.84772491455078, 100.31181335449219, -103.69699096679688, 21.810176849365234, 67.02731323242188, 47.38966369628906, -6.21843147277832, 18.910938262939453, 51.450050354003906, 58.29804992675781, 25.890993118286133, 9.656251907348633, 2.3199615478515625, 85.44456481933594, 60.607093811035156, 96.00755310058594, -39.9775390625, 95.64205932617188, 60.80620574951172, 22.162506103515625, -14.078598022460938], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000609.npy"}
{"epoch": 0.9206349206349206, "step": 610, "batch_size": 64, "mean": 39.86622619628906, "std": 53.45286560058594, "min": -68.2522201538086, "p10": -21.541852951049805, "median": 31.833555221557617, "p90": 122.55570602416994, "max": 145.5753173828125, "pos_frac": 0.734375, "sample": [43.43995666503906, 133.02101135253906, 75.0876235961914, 81.91360473632812, 16.296493530273438, 99.51033020019531, 19.708168029785156, 8.238256454467773, -23.053665161132812, 10.829086303710938, 31.40886688232422, -68.2522201538086, 56.62288284301758, 32.10829162597656, -0.5385589599609375, 49.555335998535156, 118.21176147460938, -14.433769226074219, -44.439178466796875, -9.421012878417969, 22.54290008544922, -26.82105255126953, 16.42150115966797, 54.5142936706543, -16.4056396484375, -11.9334716796875, 1.2319183349609375, 131.48495483398438, 38.64280700683594, 124.41739654541016, 33.788818359375, 145.5753173828125, 111.584716796875, 104.9918441772461, -22.157150268554688, 88.33241271972656, 110.98131561279297, 3.355772018432617, 91.29828643798828, 16.20184326171875, 20.000640869140625, 41.96940994262695, 31.91362762451172, -46.888702392578125, 100.9247055053711, -3.492206573486328, 13.012908935546875, -20.106159210205078, 115.59611511230469, 128.07034301757812, 39.36286926269531, 42.5247688293457, 29.3564453125, 0.7264080047607422, -1.1994190216064453, -2.6649246215820312, -40.91658020019531, 63.678306579589844, 127.6796875, 31.753482818603516, 48.72538757324219, -10.104965209960938, 130.2926025390625, 77.36158752441406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000610.npy"}
{"epoch": 0.9221466364323507, "step": 611, "batch_size": 64, "mean": 30.310468673706055, "std": 47.584938049316406, "min": -82.79765319824219, "p10": -23.535417175292967, "median": 25.198819160461426, "p90": 98.50867996215821, "max": 135.29937744140625, "pos_frac": 0.671875, "sample": [61.61077880859375, 58.931739807128906, -8.94012451171875, 34.607357025146484, 97.97776794433594, -0.15331268310546875, -3.582489013671875, 57.30741882324219, -3.2326011657714844, 20.991744995117188, -7.815402984619141, 115.879150390625, 29.405893325805664, 52.52344512939453, -3.0498504638671875, -11.446712493896484, -40.81126022338867, 100.41670227050781, -24.716712951660156, 135.29937744140625, 82.2587661743164, 82.9093246459961, 42.06791687011719, -15.129558563232422, -27.95834732055664, -9.193702697753906, 101.66954040527344, 38.606964111328125, 47.13672637939453, 78.23794555664062, 5.355049133300781, -20.77906036376953, -9.32421875, 126.87206268310547, -11.034454345703125, 13.864974975585938, -82.79765319824219, 71.66419982910156, 12.060539245605469, 38.768890380859375, 53.560577392578125, 49.603546142578125, -58.73853302001953, 70.49981689453125, 9.534652709960938, 18.82219696044922, 55.88783645629883, 17.3321533203125, 47.2583122253418, 66.45285034179688, 77.23158264160156, 98.73621368408203, 11.632394790649414, 43.49445343017578, 8.217416763305664, 35.560821533203125, 106.13461303710938, -64.3862075805664, 82.1434097290039, 7.275505065917969, -28.684194564819336, -3.4287071228027344, -5.372762680053711, 14.64312744140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000611.npy"}
{"epoch": 0.9236583522297808, "step": 612, "batch_size": 64, "mean": 38.89222717285156, "std": 55.51833724975586, "min": -124.68136596679688, "p10": -19.302223014831544, "median": 35.21982955932617, "p90": 114.44961395263675, "max": 147.06900024414062, "pos_frac": 0.796875, "sample": [-0.0055694580078125, -22.621551513671875, 40.341552734375, -12.232742309570312, -49.78874206542969, 5.491996765136719, 92.36106872558594, 76.19041442871094, -1.519989013671875, 5.002647399902344, 93.17577362060547, 28.994155883789062, 55.25318145751953, 39.73722839355469, 72.14983367919922, 22.070098876953125, 30.720584869384766, 19.700439453125, 48.70294189453125, 47.475608825683594, 98.17535400390625, -68.82567596435547, 66.00548553466797, 30.48749542236328, 14.344688415527344, 21.496707916259766, 128.1800079345703, -19.50485610961914, -16.28847885131836, 5.434013366699219, 33.484840393066406, 93.38671875, 0.3917045593261719, 2.567758560180664, -12.82025146484375, 117.64085388183594, 20.058433532714844, 53.06566619873047, 147.06900024414062, -90.46450805664062, 130.6913604736328, 22.321880340576172, 82.34413146972656, -51.60612487792969, 131.3558349609375, 57.72251892089844, 66.32766723632812, -18.82941246032715, 95.35031127929688, 135.48631286621094, 48.11249542236328, 4.363182067871094, 135.38644409179688, 36.35389709472656, 32.36426544189453, 107.00338745117188, 47.77925109863281, 24.377267837524414, 34.972747802734375, -124.68136596679688, 35.46691131591797, 84.25941467285156, 69.80619812011719, 87.28994750976562], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000612.npy"}
{"epoch": 0.9251700680272109, "step": 613, "batch_size": 64, "mean": 46.94964599609375, "std": 52.55015563964844, "min": -100.7337875366211, "p10": -7.73803081512451, "median": 36.787532806396484, "p90": 124.86111221313476, "max": 137.80661010742188, "pos_frac": 0.796875, "sample": [-5.090396881103516, 53.72875213623047, -6.14520263671875, 21.65802001953125, 91.14441680908203, 70.66309356689453, 67.0509033203125, 36.87129211425781, -30.153892517089844, 0.23912620544433594, 22.44184112548828, 39.949405670166016, -100.7337875366211, 61.42189025878906, 79.17212677001953, 24.79742431640625, 9.34547233581543, 16.52301597595215, 128.218505859375, 80.72505950927734, 12.156600952148438, -8.420671463012695, -31.080718994140625, 14.659547805786133, 124.95255279541016, 55.33363342285156, 98.52874755859375, 23.528844833374023, 4.628486633300781, 21.44194793701172, 137.80661010742188, 49.12324523925781, 69.15966796875, -1.7888412475585938, -0.17198944091796875, 35.20211410522461, -40.58092498779297, 72.74117279052734, 65.66791534423828, 127.94420623779297, 118.44434356689453, 120.80654907226562, 36.703773498535156, 124.0694580078125, 117.67010498046875, -3.8480377197265625, 128.1378173828125, 14.252981185913086, -11.209991455078125, 110.17144775390625, -0.7810516357421875, 6.6529541015625, 104.33201599121094, 128.8128662109375, 52.69261169433594, 55.34944152832031, 29.485187530517578, 124.64775085449219, 71.3001937866211, 125.64413452148438, 26.783348083496094, 35.593135833740234, 25.579490661621094, -29.1722412109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000613.npy"}
{"epoch": 0.926681783824641, "step": 614, "batch_size": 64, "mean": 38.80487060546875, "std": 47.93080520629883, "min": -97.05477905273438, "p10": -16.23686065673828, "median": 42.03079414367676, "p90": 94.7982734680176, "max": 122.896728515625, "pos_frac": 0.78125, "sample": [97.19541931152344, 77.38445281982422, 19.226421356201172, 72.60123443603516, 61.21758270263672, 90.60640716552734, 30.359760284423828, 0.92462158203125, 41.422298431396484, 29.08639144897461, 100.668212890625, 70.75553131103516, 85.23643493652344, 89.0521011352539, 41.44639587402344, 49.762168884277344, 89.6624755859375, 78.82642364501953, 74.64325714111328, 61.282249450683594, -22.81681251525879, 14.971343994140625, -15.537651062011719, -82.33096313476562, 116.68881225585938, -17.14995574951172, 2.953897476196289, -60.75498962402344, 87.38427734375, 15.791534423828125, -16.536521911621094, 4.694438934326172, -4.851551055908203, 88.7370376586914, 15.44356918334961, 17.596471786499023, 96.59478759765625, 54.73686218261719, -97.05477905273438, 51.24665069580078, -11.210182189941406, 122.896728515625, 66.20598602294922, 82.17610168457031, 44.42963409423828, 66.05459594726562, 21.467491149902344, 26.27642822265625, 86.6943359375, 30.24346923828125, 17.362743377685547, 27.612014770507812, -11.116622924804688, -1.97137451171875, -11.51980209350586, 114.84825897216797, 53.68882751464844, -11.992408752441406, 79.48884582519531, 46.39739990234375, 24.921049118041992, 42.61519241333008, -35.31004333496094, 102.08667755126953], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000614.npy"}
{"epoch": 0.9281934996220711, "step": 615, "batch_size": 64, "mean": 38.415618896484375, "std": 60.81072998046875, "min": -104.46458435058594, "p10": -22.91350936889648, "median": 28.914786338806152, "p90": 124.72429122924805, "max": 138.28915405273438, "pos_frac": 0.765625, "sample": [21.74981689453125, 46.87092590332031, 94.03974151611328, 25.931358337402344, 48.73420715332031, 101.06249237060547, 10.616523742675781, -104.46458435058594, 124.31803894042969, 55.72196960449219, -71.19440460205078, 41.25074005126953, -65.3785400390625, 41.47322082519531, 24.182771682739258, -44.56298828125, 74.7529067993164, 131.6306610107422, 32.770179748535156, -3.218994140625, 108.67493438720703, -1.6511611938476562, 116.6788101196289, -3.523212432861328, 86.4469985961914, 19.0421142578125, -20.315452575683594, 43.71620178222656, 2.230712890625, 129.29917907714844, 106.82241821289062, 26.915800094604492, 6.717460632324219, -0.5723838806152344, 16.72797393798828, 119.9249038696289, 52.2861328125, -12.73712158203125, 1.5539093017578125, 135.2064208984375, 124.89839935302734, 128.83236694335938, 108.29020690917969, 30.913772583007812, 87.57533264160156, 9.403079986572266, 12.456260681152344, -24.026962280273438, 37.99412536621094, 125.70555114746094, 13.70355224609375, 38.80645751953125, -6.8080291748046875, 68.49958801269531, -2.5494232177734375, 138.28915405273438, 21.864242553710938, 67.23881530761719, 123.73231506347656, 12.635383605957031, 20.255176544189453, 1.2033538818359375, -102.44074249267578, -97.60302734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000615.npy"}
{"epoch": 0.9297052154195011, "step": 616, "batch_size": 64, "mean": 34.402427673339844, "std": 52.51332473754883, "min": -88.8834228515625, "p10": -16.64931545257568, "median": 18.642322540283203, "p90": 114.39511108398442, "max": 140.15185546875, "pos_frac": 0.703125, "sample": [123.29668426513672, 2.7746200561523438, 14.89453125, -6.1177520751953125, 83.75971984863281, -14.487958908081055, 140.15185546875, 53.10145568847656, 3.0414466857910156, 51.042747497558594, 35.876564025878906, 3.297393798828125, 25.39488983154297, -85.23949432373047, 27.563915252685547, -38.322452545166016, 0.69268798828125, -11.103458404541016, 6.639625549316406, 128.94320678710938, 18.200531005859375, 127.1304931640625, -8.089170455932617, -11.31871223449707, 130.253662109375, -2.3543262481689453, -1.9311809539794922, -12.471717834472656, -2.957265853881836, 119.00785827636719, 85.81523132324219, 8.401046752929688, 48.953487396240234, 88.8970947265625, 128.3894500732422, 91.5655517578125, -19.011669158935547, 62.64596176147461, -1.945220947265625, 103.63203430175781, 92.79161834716797, 51.60000228881836, 17.119802474975586, 68.99440002441406, 94.97928619384766, 55.38092041015625, -4.094060897827148, 7.544384002685547, -88.8834228515625, 4.997001647949219, -17.575611114501953, -42.21897888183594, 39.53814697265625, 35.82588195800781, 35.54468536376953, 88.22163391113281, 18.297767639160156, 10.426029205322266, 63.191707611083984, -0.12621307373046875, 91.28925323486328, 81.7809066772461, 18.98687744140625, -19.869985580444336], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000616.npy"}
{"epoch": 0.9312169312169312, "step": 617, "batch_size": 64, "mean": 28.701948165893555, "std": 50.68163299560547, "min": -89.03002166748047, "p10": -28.552845001220703, "median": 21.170490264892578, "p90": 98.92219314575196, "max": 144.4329833984375, "pos_frac": 0.78125, "sample": [20.543907165527344, 28.23311996459961, -28.907943725585938, 35.53827667236328, 107.63182067871094, 65.74276733398438, 36.98492431640625, 100.4529800415039, 5.515903472900391, -41.73242950439453, 17.142852783203125, 16.96282196044922, -7.5109710693359375, 16.19584083557129, 3.303539276123047, 76.91474914550781, 9.248603820800781, 68.16223907470703, 29.072647094726562, 7.568450927734375, 144.4329833984375, 0.020854949951171875, -0.7337989807128906, 92.52938079833984, 50.15028381347656, -1.5401458740234375, -0.5679054260253906, 91.97833251953125, 17.420440673828125, 25.178730010986328, 28.217735290527344, -4.272136688232422, 21.589385986328125, 108.23040771484375, 47.59307861328125, 139.3095703125, 15.606605529785156, -76.6914291381836, -27.724281311035156, 95.35035705566406, 54.03925323486328, 9.037765502929688, -35.34815979003906, 4.261837005615234, 119.12342071533203, 87.68486785888672, -89.03002166748047, -81.99713134765625, -86.31853485107422, 9.205162048339844, 37.416168212890625, 55.67167663574219, 41.41557312011719, 4.281606674194336, 120.45964050292969, 69.93395233154297, 36.47795867919922, 8.776420593261719, 40.28021240234375, 16.12152099609375, 20.75159454345703, 30.580101013183594, 38.9869499206543, -8.029792785644531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000617.npy"}
{"epoch": 0.9327286470143613, "step": 618, "batch_size": 64, "mean": 34.153472900390625, "std": 48.3674201965332, "min": -69.07225036621094, "p10": -10.243715286254881, "median": 21.560229301452637, "p90": 112.1611183166504, "max": 141.0592041015625, "pos_frac": 0.8125, "sample": [136.27098083496094, 108.09414672851562, 18.575061798095703, 85.42893981933594, 10.200981140136719, -2.3563098907470703, -35.62546920776367, -10.711830139160156, 51.488739013671875, 0.21121597290039062, 5.259288787841797, 53.03521728515625, 80.0260009765625, 28.363006591796875, 133.75274658203125, 2.4965667724609375, 20.690879821777344, 1.7364978790283203, 141.0592041015625, 71.86018371582031, 19.46263885498047, 125.90086364746094, 6.440835952758789, 24.078887939453125, -5.436393737792969, 120.88985443115234, 22.42957878112793, 28.184165954589844, 72.51786804199219, 23.87626075744629, 3.0852508544921875, 7.661445617675781, 106.96416473388672, 11.319202423095703, 50.03916931152344, 5.754425048828125, 51.065956115722656, -6.857765197753906, 55.75740432739258, 25.04088592529297, -44.300994873046875, 1.0282745361328125, -9.151447296142578, -61.669647216796875, 35.73826599121094, -21.256546020507812, 29.00093650817871, -69.07225036621094, 55.85881042480469, 113.90410614013672, 16.85338592529297, 79.50747680664062, 15.352859497070312, 11.203556060791016, 1.8469696044921875, 44.87593078613281, -2.2036895751953125, 50.99620819091797, 54.965145111083984, 97.92195892333984, 13.725357055664062, 124.1156005859375, -18.46759033203125, 17.018775939941406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000618.npy"}
{"epoch": 0.9342403628117913, "step": 619, "batch_size": 64, "mean": 36.216026306152344, "std": 49.890499114990234, "min": -104.73892211914062, "p10": -16.506523513793944, "median": 32.995073318481445, "p90": 113.7026870727539, "max": 131.82510375976562, "pos_frac": 0.78125, "sample": [53.626949310302734, -10.853561401367188, 2.028820037841797, 9.455368041992188, 125.96717834472656, 112.76094055175781, 120.899658203125, -24.531707763671875, -42.36442565917969, 128.02708435058594, 107.6957778930664, 36.3572883605957, 54.34458923339844, 2.6697444915771484, 37.963958740234375, -16.22480010986328, 35.30016326904297, 57.252044677734375, -4.602655410766602, 1.4931640625, -53.813209533691406, 125.29291534423828, 18.118934631347656, -20.939865112304688, -13.69554328918457, 75.7580337524414, 58.54206085205078, 58.19184875488281, 1.3211956024169922, -7.340545654296875, 17.72870635986328, -104.73892211914062, -16.627262115478516, 69.22964477539062, 88.3958511352539, 85.03085327148438, 4.748500823974609, 123.6966552734375, 131.82510375976562, -35.57829284667969, 48.050140380859375, 114.10629272460938, 40.54878234863281, -3.2558460235595703, 63.53004455566406, 84.5484848022461, 30.689983367919922, 73.31549835205078, 52.155677795410156, 21.623279571533203, 24.8995361328125, 38.45035171508789, 40.0560302734375, 7.441680908203125, 75.93487548828125, 7.874946594238281, 18.268569946289062, 79.61904907226562, 0.5620975494384766, 17.58014488220215, -0.6910400390625, 59.963043212890625, 4.5418701171875, 25.600116729736328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000619.npy"}
{"epoch": 0.9357520786092215, "step": 620, "batch_size": 64, "mean": 33.48244857788086, "std": 52.38579177856445, "min": -66.93679809570312, "p10": -27.2784553527832, "median": 23.536295890808105, "p90": 110.08427734375, "max": 134.44740295410156, "pos_frac": 0.75, "sample": [31.07128143310547, -43.981292724609375, -27.43798828125, 122.7529296875, -49.202239990234375, 4.150203704833984, 18.795364379882812, 99.05152893066406, 13.922019958496094, 62.871360778808594, 76.50265502929688, 104.05259704589844, 52.83294677734375, 35.0701904296875, 14.409065246582031, 110.112548828125, 49.539466857910156, 29.355499267578125, -19.409461975097656, 1.989990234375, -49.42034912109375, 5.921871185302734, -66.93679809570312, 130.67105102539062, -21.34527587890625, 113.29658508300781, 0.15982818603515625, -2.67181396484375, 132.2503662109375, -26.906211853027344, 94.01594543457031, 98.85831451416016, -12.043964385986328, -15.035566329956055, -22.941173553466797, 14.900382995605469, 5.2174224853515625, 40.654701232910156, 110.018310546875, 4.8922882080078125, 72.15277099609375, 9.254104614257812, 61.72063446044922, 39.818695068359375, 15.27719497680664, -48.7789306640625, 134.44740295410156, 121.84506225585938, -10.064870834350586, 55.22666931152344, 0.8081188201904297, 103.50436401367188, 80.04409790039062, -5.664701461791992, -45.19236373901367, 17.234777450561523, 82.82124328613281, 8.225982666015625, 34.96946334838867, 11.603668212890625, 55.25531005859375, 56.586280822753906, 43.49993133544922, 28.2772274017334], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000620.npy"}
{"epoch": 0.9372637944066515, "step": 621, "batch_size": 64, "mean": 28.226398468017578, "std": 51.69369125366211, "min": -97.06898498535156, "p10": -21.115314865112303, "median": 16.564269065856934, "p90": 95.27756729125977, "max": 132.41290283203125, "pos_frac": 0.703125, "sample": [5.607219696044922, 10.661758422851562, 8.326515197753906, -4.911098480224609, -39.908966064453125, -72.12142181396484, 68.38980865478516, 78.63587951660156, -16.784442901611328, -3.8294429779052734, 28.056339263916016, 74.16361999511719, 120.65312194824219, 14.045499801635742, 125.87579345703125, 11.193572998046875, 43.28173828125, 24.3521728515625, 57.676631927490234, 89.07525634765625, 33.910240173339844, 12.32208251953125, 117.42298889160156, -97.06898498535156, 126.36093139648438, 95.9136734008789, 57.52278137207031, -74.421142578125, 10.867935180664062, -20.91733169555664, 70.92034912109375, 51.05039978027344, 31.968231201171875, 81.15652465820312, 67.63591766357422, -3.3967132568359375, 42.42144012451172, 2.4412097930908203, 55.99283981323242, -12.091522216796875, -20.388399124145508, 8.752771377563477, 103.76539611816406, -4.0601806640625, 10.076986312866211, -17.323734283447266, 21.63530731201172, -6.504859924316406, -21.200164794921875, 19.083038330078125, 0.8371105194091797, 85.04204559326172, 40.165771484375, 12.258581161499023, 74.5261001586914, -16.23040771484375, -6.094718933105469, 132.41290283203125, -72.60159301757812, 65.16648864746094, -37.420928955078125, 5.461151123046875, 93.79331970214844, 62.886077880859375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000621.npy"}
{"epoch": 0.9387755102040817, "step": 622, "batch_size": 64, "mean": 31.139747619628906, "std": 48.262046813964844, "min": -81.16468811035156, "p10": -23.749233055114733, "median": 26.079692840576172, "p90": 96.17155075073242, "max": 138.67108154296875, "pos_frac": 0.765625, "sample": [77.82343292236328, 9.977100372314453, -0.21947479248046875, -81.16468811035156, 3.9595565795898438, 24.76288604736328, 2.7381439208984375, 29.71044921875, -38.71776580810547, -1.3208770751953125, 138.67108154296875, 73.02183532714844, 0.7540721893310547, -37.65915298461914, 95.55842590332031, 61.93250274658203, 100.86439514160156, 74.31446838378906, 34.39985656738281, 111.04588317871094, 66.98534393310547, 88.7858657836914, -10.455648422241211, -4.057538986206055, 126.64096069335938, 34.351768493652344, 27.396499633789062, 67.86986541748047, 0.4275245666503906, 39.83719253540039, 93.8056640625, -71.4598388671875, 29.357091903686523, 5.754032135009766, -29.15435028076172, 4.942926406860352, 86.00547790527344, 55.948158264160156, 61.44444274902344, 10.314216613769531, 96.43431854248047, 9.0252685546875, 62.4603157043457, -11.137292861938477, 62.948856353759766, 83.33688354492188, 22.4033203125, -59.111000061035156, 1.1221199035644531, 29.56000518798828, 100.88619232177734, 5.271476745605469, 1.8071441650390625, -9.00469970703125, 3.8813648223876953, -7.636222839355469, 101.26419067382812, 36.13794708251953, -41.525352478027344, -6.091423034667969, 43.18729782104492, 11.305997848510742, 72.45938873291016, 18.7659912109375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000622.npy"}
{"epoch": 0.9402872260015117, "step": 623, "batch_size": 64, "mean": 35.991477966308594, "std": 47.09578323364258, "min": -57.7636833190918, "p10": -13.510034179687498, "median": 28.58784008026123, "p90": 102.80043182373046, "max": 130.17193603515625, "pos_frac": 0.75, "sample": [89.9374008178711, 22.219825744628906, -6.416858673095703, -57.7636833190918, 30.232559204101562, 102.94696044921875, 61.00770568847656, 75.54449462890625, 60.820831298828125, 63.425331115722656, -29.96636199951172, 29.168615341186523, 103.12830352783203, 4.835899353027344, -6.3372650146484375, 127.7463150024414, -10.367496490478516, 93.71929168701172, 53.67945098876953, 51.75220489501953, 54.6358642578125, 16.76585578918457, 88.45893096923828, -16.15142059326172, -5.681877136230469, 56.563377380371094, 79.19012451171875, 102.80195617675781, -46.73480224609375, 24.9130859375, 75.49807739257812, 79.48954772949219, 55.02565002441406, 30.787628173828125, -14.23150634765625, -46.671661376953125, 102.796875, 2.81500244140625, 75.5051040649414, 104.2271957397461, 118.7578353881836, 130.17193603515625, 40.19330978393555, 0.5958919525146484, -8.196823120117188, -2.000089645385742, -11.82659912109375, 26.922704696655273, -3.318134307861328, 62.931785583496094, 23.946685791015625, 100.1666259765625, 28.007064819335938, 29.588897705078125, 0.17547607421875, 18.293655395507812, 17.855667114257812, 0.1429290771484375, 15.214752197265625, 6.638303756713867, 86.93599700927734, -9.574684143066406, 5.672454833984375, -53.15745544433594], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000623.npy"}
{"epoch": 0.9417989417989417, "step": 624, "batch_size": 64, "mean": 41.240196228027344, "std": 54.85030746459961, "min": -81.7491683959961, "p10": -15.88107509613037, "median": 33.11533069610596, "p90": 113.06727142333985, "max": 134.00729370117188, "pos_frac": 0.6875, "sample": [-62.13374328613281, -15.963142395019531, 84.07408142089844, 62.06028747558594, 127.99449157714844, 113.50178527832031, -2.7448787689208984, 10.117599487304688, -5.937835693359375, -12.97300910949707, 28.984962463378906, 21.084266662597656, 49.22220993041992, 70.53617095947266, 100.549072265625, 75.63052368164062, 18.2973575592041, 100.53814697265625, 112.05340576171875, 34.51506805419922, -16.569385528564453, -8.92629623413086, -15.689584732055664, 115.76988983154297, -0.18643569946289062, 67.9200439453125, 35.632423400878906, -1.9470596313476562, 24.673599243164062, 41.85065460205078, 108.43051147460938, 111.46575927734375, 87.41841125488281, 134.00729370117188, 11.265445709228516, -34.217777252197266, 127.8326187133789, 101.16285705566406, 133.85250854492188, 92.60396575927734, 31.715593338012695, 90.77216339111328, 70.66736602783203, 20.74053955078125, -21.168655395507812, 10.605316162109375, 133.41075134277344, -3.1472434997558594, -11.015932083129883, 74.90093994140625, 57.02978515625, 2.813161849975586, -13.122161865234375, 1.5686817169189453, -81.7491683959961, 73.72252655029297, -5.0187530517578125, -7.204139709472656, 104.56024169921875, -9.936779022216797, -56.29967498779297, 2.33416748046875, 40.50080108642578, 106.93659210205078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000624.npy"}
{"epoch": 0.9433106575963719, "step": 625, "batch_size": 64, "mean": 29.410263061523438, "std": 54.16386795043945, "min": -91.95753479003906, "p10": -35.300876998901366, "median": 17.14177703857422, "p90": 108.63313293457034, "max": 136.25660705566406, "pos_frac": 0.703125, "sample": [26.488298416137695, 73.49945831298828, 17.377517700195312, 43.85227966308594, -4.604803085327148, 75.22373962402344, 48.674530029296875, 42.717018127441406, 9.258642196655273, 133.25192260742188, 30.817127227783203, -22.12531280517578, 90.21031951904297, -11.465587615966797, 119.57891845703125, 16.906036376953125, 136.25660705566406, 12.329879760742188, -33.95536422729492, -4.666290283203125, 72.21098327636719, 95.30238342285156, 8.131706237792969, -53.1441650390625, -6.3622589111328125, 33.394287109375, 16.722618103027344, 62.76702117919922, -1.9615287780761719, 16.0167236328125, -5.972309112548828, 79.42616271972656, 116.33200073242188, -4.311180114746094, 58.2305793762207, -3.5730018615722656, -13.232107162475586, 133.64837646484375, 5.823463439941406, 4.9979705810546875, 66.16732788085938, 102.8031997680664, 111.13167572021484, 53.175537109375, 45.26765441894531, 5.423088073730469, -89.88704681396484, -60.13185119628906, -39.91779708862305, 79.90541076660156, 4.392608642578125, -81.08953857421875, -91.95753479003906, 123.07411193847656, 75.75237274169922, 4.0320281982421875, 53.77374267578125, 10.774063110351562, 44.79877471923828, -35.877525329589844, 20.7825927734375, 62.47346878051758, 11.543699264526367, -8.225959777832031], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000625.npy"}
{"epoch": 0.9448223733938019, "step": 626, "batch_size": 64, "mean": 40.53667068481445, "std": 63.10452651977539, "min": -86.75921630859375, "p10": -45.61121826171875, "median": 28.78613567352295, "p90": 123.16969451904298, "max": 155.65859985351562, "pos_frac": 0.75, "sample": [113.06929779052734, 11.779098510742188, 101.93096160888672, -26.822158813476562, 140.6253662109375, 118.10297393798828, 130.86712646484375, -34.92060852050781, 54.24262237548828, 118.57086944580078, -14.348814010620117, 1.0080757141113281, 18.882003784179688, 1.1541824340820312, -45.484230041503906, 123.62092590332031, 64.3990707397461, -86.75921630859375, 131.10479736328125, 19.582727432250977, -76.47257232666016, 0.7647705078125, 44.48126220703125, 125.97319793701172, 16.204898834228516, 23.45842170715332, 68.72026824951172, 26.6751708984375, -16.553146362304688, 155.65859985351562, 54.241973876953125, -27.864255905151367, 112.85404205322266, 97.99822998046875, 21.973648071289062, -79.34744262695312, -11.271163940429688, 78.12644958496094, -0.18966102600097656, 69.0672836303711, 20.233718872070312, 13.819955825805664, 122.1168212890625, 9.822921752929688, -21.01068115234375, 11.52232551574707, 30.8971004486084, 115.80406951904297, 68.80462646484375, 78.86697387695312, 93.29383850097656, 1.8321189880371094, -55.75608444213867, 89.15045166015625, -47.75239562988281, 92.17790222167969, 12.116878509521484, 117.01768493652344, 104.04275512695312, 43.240333557128906, 130.5566864013672, -50.607093811035156, 34.716514587402344, -45.66564178466797], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000626.npy"}
{"epoch": 0.9463340891912321, "step": 627, "batch_size": 64, "mean": 43.53168869018555, "std": 56.94757080078125, "min": -110.88862609863281, "p10": -15.678403854370117, "median": 39.41187286376953, "p90": 123.33006439208985, "max": 134.76934814453125, "pos_frac": 0.796875, "sample": [54.05649185180664, 20.86336898803711, 69.1875, 35.469573974609375, 52.931854248046875, 70.634521484375, 125.0231704711914, 134.76934814453125, 123.7552490234375, 17.26214599609375, -35.804534912109375, 117.61073303222656, 130.14776611328125, 79.34880065917969, 49.24083709716797, 7.066070556640625, 11.550521850585938, 14.801673889160156, 131.2591552734375, 35.295013427734375, 115.52876281738281, 51.85240936279297, 13.009323120117188, 11.79949951171875, 0.5593719482421875, -57.27729797363281, 122.33796691894531, -16.00359344482422, 84.05938720703125, -11.508060455322266, 1.5663013458251953, 124.79777526855469, 72.12124633789062, 2.6677322387695312, 60.387237548828125, 8.25836181640625, -8.867835998535156, 43.35417175292969, -5.878129959106445, 114.23883056640625, 111.200439453125, 53.00902557373047, -45.44865036010742, -14.919628143310547, -23.376365661621094, -12.924514770507812, -3.616607666015625, 104.7576675415039, 21.077178955078125, 81.49162292480469, 44.21089172363281, -110.88862609863281, 31.162639617919922, 71.84857940673828, 119.46905517578125, 32.85540771484375, 13.759780883789062, -90.82696533203125, 30.227371215820312, 17.195716857910156, 63.988037109375, 99.3390884399414, 88.30488586425781, 132.65948486328125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000627.npy"}
{"epoch": 0.9478458049886621, "step": 628, "batch_size": 64, "mean": 47.73688507080078, "std": 61.86348342895508, "min": -128.64739990234375, "p10": -23.2353889465332, "median": 53.869754791259766, "p90": 118.4222480773926, "max": 136.04144287109375, "pos_frac": 0.796875, "sample": [134.33212280273438, 48.61370849609375, 58.364837646484375, 55.81078338623047, -25.221458435058594, -13.529022216796875, 101.95458984375, 7.30902099609375, 104.06935119628906, 91.53582763671875, 136.04144287109375, 126.51390838623047, 84.30522155761719, 33.43254470825195, 39.08155059814453, 48.97545623779297, -128.64739990234375, -95.94007110595703, 51.92872619628906, 11.945743560791016, 119.65847778320312, 50.44554138183594, 11.711877822875977, -66.26515197753906, 106.09817504882812, 109.68032836914062, 74.19259643554688, 99.83197021484375, 33.16246032714844, 121.61609649658203, 87.39912414550781, 109.7138671875, 28.8553466796875, 130.08485412597656, 29.984981536865234, 92.16851806640625, -13.333053588867188, 0.44623374938964844, 101.77753448486328, -57.83604431152344, 79.12054443359375, 57.49279022216797, 35.68915557861328, 91.16105651855469, 35.92399597167969, 70.32545471191406, 115.53771209716797, 66.62694549560547, 108.73612213134766, 107.43998718261719, 20.875829696655273, -15.634872436523438, 4.485633850097656, -101.42170715332031, 38.911319732666016, 128.2555694580078, 1.2505950927734375, 109.98194122314453, -6.22540283203125, 66.1788558959961, -18.601226806640625, 77.66019439697266, -4.4344482421875, -54.44599151611328], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000628.npy"}
{"epoch": 0.9493575207860923, "step": 629, "batch_size": 64, "mean": 35.36682891845703, "std": 46.58755874633789, "min": -60.85564422607422, "p10": -10.466263580322265, "median": 24.15665054321289, "p90": 117.36232299804688, "max": 135.10299682617188, "pos_frac": 0.78125, "sample": [-10.551963806152344, 51.33661651611328, -60.85564422607422, 57.55250549316406, 36.02937316894531, 20.45794677734375, -0.273773193359375, 58.445823669433594, -39.97203826904297, 38.87452697753906, 12.612686157226562, 8.863960266113281, 3.8604373931884766, 132.5395050048828, 1.4576053619384766, -27.79245376586914, 15.812582015991211, 44.56816101074219, 9.89410400390625, 28.574127197265625, -10.26629638671875, -9.552513122558594, -4.267492294311523, 29.079952239990234, -18.942203521728516, 25.80450439453125, 110.63880920410156, 57.43377685546875, -5.019430160522461, 118.01082611083984, 85.03164672851562, -14.033760070800781, 53.119873046875, 65.58686065673828, 15.963577270507812, 23.126220703125, 70.64219665527344, 6.5133056640625, 11.619434356689453, 0.8321056365966797, 4.864044189453125, 30.683528900146484, 125.41803741455078, 5.734317779541016, -4.211112976074219, 30.121627807617188, 13.227432250976562, 84.10420227050781, 72.45063018798828, 65.91434478759766, 32.43214416503906, 22.504337310791016, 127.26239013671875, 15.546770095825195, 26.415008544921875, 124.32949829101562, -5.834388732910156, 135.10299682617188, 117.39814758300781, 117.27873229980469, 6.9160003662109375, 108.86209106445312, -10.986282348632812, 25.18708038330078], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000629.npy"}
{"epoch": 0.9508692365835223, "step": 630, "batch_size": 64, "mean": 40.13007354736328, "std": 52.31532669067383, "min": -72.54608917236328, "p10": -21.228526306152343, "median": 27.109594345092773, "p90": 111.87816162109375, "max": 144.26840209960938, "pos_frac": 0.734375, "sample": [-34.83407974243164, 56.777252197265625, 4.058513641357422, -0.2071075439453125, 73.6317138671875, 129.80027770996094, 71.49288940429688, -2.171680450439453, 140.4650421142578, 96.90817260742188, -2.732616424560547, 93.17782592773438, -6.6649169921875, 108.89725494384766, 6.079551696777344, 89.20569610595703, -41.392005920410156, 59.01777648925781, 84.00067901611328, -3.262969970703125, 15.339935302734375, -2.2835464477539062, 37.46764373779297, 111.45085144042969, 108.55712890625, 17.97301483154297, -41.221458435058594, 27.267093658447266, -72.54608917236328, 55.3529052734375, -0.6923294067382812, 19.734817504882812, 2.9116134643554688, 131.1956787109375, 21.294876098632812, 16.741104125976562, 112.06129455566406, 5.7388458251953125, 27.91278076171875, 46.692832946777344, 144.26840209960938, 26.95209503173828, 104.82445526123047, -25.3123779296875, 117.43334197998047, 80.1646957397461, 69.55523681640625, 15.199325561523438, 42.057594299316406, -18.9324951171875, 14.354179382324219, -26.5914306640625, -8.362041473388672, 103.97100830078125, 74.59466552734375, -16.44658088684082, 28.010520935058594, 3.944141387939453, 77.79634094238281, 114.90546417236328, -22.212539672851562, 79.82261657714844, 20.729721069335938, 4.402187347412109], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000630.npy"}
{"epoch": 0.9523809523809523, "step": 631, "batch_size": 64, "mean": 46.13783264160156, "std": 60.93382263183594, "min": -93.83490753173828, "p10": -23.438788223266602, "median": 34.32177543640137, "p90": 126.38572387695312, "max": 154.30950927734375, "pos_frac": 0.75, "sample": [99.14936065673828, 28.432891845703125, 6.421783447265625, 114.45738220214844, 66.6087875366211, 126.14669036865234, 118.22280883789062, 71.24266815185547, -12.297718048095703, 6.2827606201171875, -66.33006286621094, -22.485973358154297, 71.02264404296875, -14.942398071289062, -14.546466827392578, 39.283782958984375, -17.754535675048828, 9.841808319091797, 47.41002655029297, 135.9718017578125, 128.22271728515625, -23.847137451171875, 128.5755157470703, 121.68284606933594, 93.93325805664062, 62.482017517089844, 15.016311645507812, 137.29537963867188, -6.685033798217773, 52.25926971435547, 24.17068099975586, 18.686485290527344, -66.35185241699219, 63.933624267578125, 24.867900848388672, 125.49491882324219, 107.44841003417969, 12.636627197265625, 154.30950927734375, 110.88825988769531, 29.61988067626953, -47.17786407470703, -6.8089141845703125, 71.03673553466797, -9.753570556640625, -93.83490753173828, 12.807306289672852, 20.824373245239258, -37.536285400390625, 30.4271240234375, -24.54810333251953, 91.13540649414062, 9.689605712890625, 127.143798828125, -20.033321380615234, 117.32125091552734, 15.68124008178711, 126.48816680908203, 38.216426849365234, 122.32391357421875, 58.299774169921875, 122.50859069824219, 22.019573211669922, 99.81309509277344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000631.npy"}
{"epoch": 0.9538926681783825, "step": 632, "batch_size": 64, "mean": 47.93510437011719, "std": 51.508052825927734, "min": -75.52972412109375, "p10": -5.785988616943359, "median": 42.09863090515137, "p90": 115.35327606201172, "max": 131.52346801757812, "pos_frac": 0.796875, "sample": [80.9946517944336, 59.3515625, 46.519317626953125, 4.291316986083984, 57.14716720581055, 115.43122863769531, -0.913818359375, 85.01899719238281, -3.2447471618652344, 116.66092681884766, 41.020904541015625, -3.343902587890625, 31.6070556640625, -1.736114501953125, -42.466087341308594, 106.57295227050781, 101.16943359375, -0.337432861328125, 115.17138671875, -43.59832000732422, 17.534942626953125, 34.65238571166992, -54.68485641479492, 22.889678955078125, 17.442672729492188, 72.46168518066406, 14.997222900390625, 111.5901870727539, 59.747615814208984, 41.12559127807617, 107.70864868164062, 129.6158447265625, 38.64677429199219, -8.255149841308594, 67.8057861328125, 37.39441680908203, 84.4760971069336, 55.5953369140625, 84.3752670288086, 55.290855407714844, 100.75528717041016, 107.60193634033203, 11.075180053710938, -5.812431335449219, 8.96677017211914, 125.10118103027344, 72.9913330078125, 27.960927963256836, 117.0471420288086, 85.7302017211914, 130.72613525390625, -75.52972412109375, 43.07167053222656, 96.23887634277344, -5.7242889404296875, 21.30571746826172, 131.52346801757812, 106.05452728271484, 91.48594665527344, 29.354433059692383, 16.510765075683594, 18.66030502319336, -56.97117233276367, 13.994915008544922], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000632.npy"}
{"epoch": 0.9554043839758125, "step": 633, "batch_size": 64, "mean": 35.42958450317383, "std": 54.11516189575195, "min": -77.19869995117188, "p10": -35.452907180786134, "median": 23.189002990722656, "p90": 115.47148208618167, "max": 138.21270751953125, "pos_frac": 0.734375, "sample": [-0.29955101013183594, 34.91255187988281, 24.31689453125, 55.94683837890625, 68.76283264160156, 19.250747680664062, 48.07487487792969, 68.12516021728516, 10.216367721557617, 23.166831970214844, 1.4041519165039062, 1.0234546661376953, 132.94168090820312, 23.21117401123047, -37.0164680480957, 96.27271270751953, -1.04046630859375, 38.72853088378906, 18.588172912597656, -8.73583984375, 5.774662017822266, -1.4950027465820312, 94.12493133544922, 43.601158142089844, 79.83232116699219, 49.0875244140625, -35.58621597290039, 11.5054931640625, 56.206268310546875, 83.46368408203125, 78.25823974609375, -4.641872406005859, 109.54255676269531, -4.385406494140625, -3.813304901123047, 17.601261138916016, -42.68353271484375, -14.367195129394531, -35.14185333251953, -6.705787658691406, -42.10261535644531, 20.147785186767578, 1.5343055725097656, 138.21270751953125, 34.55359649658203, 15.583467483520508, 26.406211853027344, -58.1351318359375, 29.748992919921875, 123.18610382080078, 117.77571105957031, 88.79472351074219, 104.23821258544922, 110.0949478149414, 11.812885284423828, -60.545928955078125, 6.437526702880859, 2.1831016540527344, 91.01763916015625, 129.5855712890625, 137.21798706054688, -77.19869995117188, 133.64027404785156, 85.27546691894531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000633.npy"}
{"epoch": 0.9569160997732427, "step": 634, "batch_size": 64, "mean": 19.047256469726562, "std": 52.9875602722168, "min": -85.69461059570312, "p10": -41.62980804443359, "median": 5.546304702758789, "p90": 100.6961563110352, "max": 119.73419189453125, "pos_frac": 0.609375, "sample": [15.934467315673828, -54.257171630859375, 2.8010635375976562, 115.76264190673828, 40.439239501953125, -23.389142990112305, -11.670345306396484, -8.22327995300293, 2.7568740844726562, -3.8120269775390625, 16.14012336730957, 14.940780639648438, -4.841808319091797, -7.276985168457031, -7.065391540527344, 20.01972770690918, 61.1164436340332, 47.60127639770508, -26.46576690673828, -28.032150268554688, -36.20716857910156, 0.8938407897949219, 91.4326171875, -84.07427215576172, 117.09242248535156, -2.725421905517578, 105.06292724609375, 86.56067657470703, 104.66624450683594, -60.19453048706055, 17.764938354492188, 31.23455238342285, 1.1389198303222656, -22.58224105834961, 111.76069641113281, 14.31202507019043, -34.989540100097656, -51.36883544921875, 119.73419189453125, 89.80961608886719, 60.03077697753906, -58.34716796875, -43.95379638671875, 22.113666534423828, -2.87371826171875, -85.69461059570312, -0.3677940368652344, 3.6156063079833984, 119.55328369140625, 32.51872634887695, -28.40478515625, -25.7177734375, 40.558258056640625, 6.374652862548828, 15.946416854858398, 3.9072227478027344, 91.2386703491211, 84.2088623046875, 89.77603912353516, 73.2922134399414, -28.028404235839844, 6.597055435180664, 4.71795654296875, 76.16281127929688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000634.npy"}
{"epoch": 0.9584278155706727, "step": 635, "batch_size": 64, "mean": 26.969741821289062, "std": 60.382774353027344, "min": -85.85005187988281, "p10": -57.63698730468748, "median": 11.407913208007812, "p90": 113.1339912414551, "max": 137.89309692382812, "pos_frac": 0.640625, "sample": [-18.743606567382812, 76.95880126953125, -75.77079010009766, -1.5553398132324219, 24.870819091796875, -1.9514541625976562, -68.896484375, -65.35377502441406, -79.96000671386719, 120.47409057617188, 3.6554489135742188, 92.51370239257812, -14.3211669921875, -19.18793487548828, 86.32792663574219, 115.87616729736328, 90.26274108886719, 9.924812316894531, 99.47242736816406, 106.73558044433594, -37.20704650878906, 11.899810791015625, -7.303733825683594, 128.83731079101562, 3.267669677734375, -39.63114929199219, 84.35313415527344, -1.8632259368896484, -17.176189422607422, 47.2772331237793, -13.37335205078125, -30.92169952392578, 116.63937377929688, 20.889244079589844, 6.774681091308594, 17.820438385009766, 104.27592468261719, 8.04263687133789, 75.62287902832031, -85.85005187988281, 78.06423950195312, 119.72654724121094, -9.804683685302734, 131.71539306640625, 7.2437286376953125, -26.84576416015625, 34.23969268798828, 46.99315643310547, 71.60928344726562, 9.330947875976562, 59.157623291015625, 137.89309692382812, -0.1438446044921875, 25.84648895263672, -84.7296371459961, 46.735740661621094, 103.34956359863281, 5.718849182128906, 10.916015625, 82.02078247070312, -81.08854675292969, 66.42596435546875, 21.310951232910156, -3.3277511596679688], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000635.npy"}
{"epoch": 0.9599395313681028, "step": 636, "batch_size": 64, "mean": 40.624237060546875, "std": 51.73701858520508, "min": -89.83849334716797, "p10": -4.598444938659667, "median": 33.162513732910156, "p90": 116.36107940673833, "max": 144.69107055664062, "pos_frac": 0.828125, "sample": [99.75950622558594, 1.34796142578125, 43.338111877441406, 33.4110107421875, 73.1988754272461, 13.884468078613281, 43.395748138427734, 1.4697647094726562, -0.5140762329101562, 102.962158203125, 52.212921142578125, -4.859819412231445, 88.31536865234375, 73.54244995117188, 13.646881103515625, 12.21807861328125, -3.9885711669921875, 54.66233825683594, 29.833885192871094, 86.52149963378906, 71.32938385009766, 18.645496368408203, 1.251321792602539, 88.99121856689453, 12.643142700195312, 65.06082153320312, 139.36026000976562, 1.5209732055664062, -3.1878414154052734, 44.96930694580078, 22.882503509521484, 31.639793395996094, 134.92970275878906, 22.3106689453125, -1.2725372314453125, 88.74778747558594, 46.01316833496094, 85.16960144042969, -13.869918823242188, 121.42001342773438, 57.47559356689453, 124.82106018066406, -7.9503173828125, 144.69107055664062, 1.1830329895019531, -89.83849334716797, 134.75587463378906, 11.791107177734375, 22.445247650146484, 104.55690002441406, 38.76643371582031, 41.875587463378906, 67.1407470703125, 129.33770751953125, 19.405044555664062, -82.80348205566406, -82.3180160522461, 13.029085159301758, 44.06568908691406, 73.90632629394531, 18.604270935058594, -31.881568908691406, 32.91401672363281, 21.06475830078125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000636.npy"}
{"epoch": 0.9614512471655329, "step": 637, "batch_size": 64, "mean": 29.64734649658203, "std": 58.32988357543945, "min": -122.08367919921875, "p10": -37.42683639526367, "median": 18.692031860351562, "p90": 125.74754638671878, "max": 141.3760223388672, "pos_frac": 0.703125, "sample": [11.020111083984375, 94.56149291992188, 24.66204071044922, 140.64291381835938, -4.733116149902344, 6.832855224609375, -35.08435821533203, 76.80255889892578, 100.99215698242188, 16.767047882080078, -20.45262908935547, 6.374382019042969, 51.29271697998047, -69.36495208740234, 65.53836059570312, 29.814563751220703, -1.4634628295898438, 67.44346618652344, 108.18344116210938, 1.4128856658935547, 30.150146484375, 38.78282165527344, 0.3319091796875, -24.12335205078125, -3.417043685913086, 35.912078857421875, -30.9776611328125, 129.4108123779297, 128.8756561279297, 130.2344970703125, 130.6093292236328, 16.75849151611328, -44.359130859375, 79.66946411132812, -26.040390014648438, 89.052734375, -47.05908203125, 8.804481506347656, 18.699508666992188, 43.700767517089844, 22.350967407226562, 113.56480407714844, -3.7086963653564453, 17.17398452758789, 102.22479248046875, 18.733726501464844, -38.430755615234375, 18.71826171875, -13.497600555419922, -5.734016418457031, 12.984954833984375, 141.3760223388672, 41.08363342285156, 10.886924743652344, 18.684555053710938, -54.344154357910156, 119.69757080078125, 25.673734664916992, -12.861778259277344, -122.08367919921875, 34.72057342529297, 128.34039306640625, -59.74720001220703, 5.364799499511719], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000637.npy"}
{"epoch": 0.9629629629629629, "step": 638, "batch_size": 64, "mean": 35.60945129394531, "std": 52.61056900024414, "min": -97.880859375, "p10": -20.91507568359374, "median": 20.51979637145996, "p90": 107.48005828857423, "max": 130.2198944091797, "pos_frac": 0.71875, "sample": [-24.688201904296875, 92.58004760742188, 67.70050048828125, 0.6416206359863281, -97.880859375, -4.8848419189453125, 125.54399108886719, 47.303680419921875, 8.68576431274414, -9.027423858642578, 6.3277130126953125, -39.36672592163086, -12.111114501953125, 7.6349945068359375, -10.647293090820312, 93.8359375, -49.86399841308594, -25.28014373779297, 106.22429656982422, 0.8139419555664062, 20.938369750976562, 37.205726623535156, 108.01824188232422, 74.17172241210938, -43.80127716064453, 91.66297912597656, 61.358543395996094, 121.98179626464844, 3.4282150268554688, 34.957427978515625, 36.964141845703125, 124.0921630859375, 61.12754821777344, 104.36514282226562, 8.892578125, -5.908525466918945, 0.07622528076171875, 85.74441528320312, 1.2450408935546875, 102.49736022949219, 54.40483856201172, 103.81761169433594, 103.31927490234375, -9.035289764404297, 94.13185119628906, 13.867347717285156, -3.0004615783691406, 119.5685806274414, -2.9189910888671875, 35.236183166503906, -0.6609096527099609, 20.10122299194336, 53.35719299316406, 56.13262939453125, -27.088153839111328, 56.329288482666016, -0.3394927978515625, 130.2198944091797, 118.37420654296875, 2.747234344482422, 1.0419769287109375, -8.457626342773438, 4.189155578613281, 51.10749816894531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000638.npy"}
{"epoch": 0.9644746787603931, "step": 639, "batch_size": 64, "mean": 39.93478775024414, "std": 53.4281005859375, "min": -75.10704803466797, "p10": -20.27189102172851, "median": 35.70470428466797, "p90": 118.99536132812503, "max": 138.7562713623047, "pos_frac": 0.71875, "sample": [67.71102905273438, 59.9951171875, 55.61534118652344, 15.952316284179688, 53.45960998535156, 73.38636016845703, 12.216880798339844, -40.20366287231445, -22.49019432067871, 79.68850708007812, 38.63807678222656, 34.925018310546875, 138.7562713623047, 21.489334106445312, 79.0892333984375, -4.7286376953125, 39.910064697265625, -1.8138656616210938, -75.10704803466797, 37.646217346191406, 60.02561950683594, 101.00283813476562, 1.9244976043701172, 104.00015258789062, -5.887016296386719, 101.60540771484375, 30.455760955810547, 109.87312316894531, -3.9237213134765625, 125.93562316894531, 121.38444519042969, -15.095849990844727, 1.1877593994140625, 56.616783142089844, -3.5683727264404297, 5.660041809082031, 79.60282897949219, -2.9988250732421875, 136.2877197265625, 91.46915435791016, 80.9533920288086, 6.272552490234375, 121.3555679321289, 42.67479705810547, -6.679931640625, 10.805721282958984, -5.685493469238281, -9.438697814941406, 36.48439025878906, 3.608348846435547, 113.48821258544922, 95.88818359375, 5.046684265136719, 19.344253540039062, -2.65826416015625, -50.82609939575195, 11.722713470458984, -31.092666625976562, -45.94548797607422, 127.21035766601562, -37.11997985839844, 90.79409790039062, 92.52686309814453, 127.40306854248047], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000639.npy"}
{"epoch": 0.9659863945578231, "step": 640, "batch_size": 64, "mean": 34.14212417602539, "std": 52.78111267089844, "min": -123.37281799316406, "p10": -20.201838684082027, "median": 34.89353561401367, "p90": 104.9544532775879, "max": 126.0052261352539, "pos_frac": 0.78125, "sample": [91.65925598144531, 34.60872268676758, 93.10917663574219, -47.34325408935547, 3.4211578369140625, 20.213775634765625, 81.38261413574219, 1.948028564453125, 19.137054443359375, 111.82347106933594, 45.662109375, -3.4714126586914062, 68.87911987304688, 55.502593994140625, -65.03254699707031, 75.03965759277344, 37.38695526123047, 35.178348541259766, -6.182033538818359, 126.0052261352539, -123.37281799316406, 36.63182067871094, 115.73483276367188, 3.5454864501953125, -23.947715759277344, -15.248016357421875, 69.87442779541016, 3.4091262817382812, -24.376596450805664, 1.0630531311035156, 105.78140258789062, 3.4122467041015625, 11.37689208984375, 111.92251586914062, 58.50782775878906, -0.8715438842773438, 18.380523681640625, 17.76629638671875, -7.1964263916015625, 55.83319854736328, 13.996963500976562, 105.70537567138672, 2.346393585205078, 39.776885986328125, 103.20230102539062, 78.14712524414062, 64.27261352539062, -122.8843765258789, 24.449209213256836, 93.1061782836914, 54.4603385925293, 97.4119873046875, 29.112548828125, 99.51971435546875, -3.79803466796875, -9.145818710327148, 117.29852294921875, -22.324905395507812, 9.73492431640625, 71.38404083251953, 13.082202911376953, 40.25159454345703, 38.713768005371094, 50.121971130371094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000640.npy"}
{"epoch": 0.9674981103552532, "step": 641, "batch_size": 64, "mean": 33.59276580810547, "std": 46.28376007080078, "min": -122.42030334472656, "p10": -7.024786376953122, "median": 23.197982788085938, "p90": 93.33007354736328, "max": 140.78274536132812, "pos_frac": 0.796875, "sample": [-16.105152130126953, -47.13465118408203, 79.74874877929688, 50.003658294677734, 64.27561950683594, 0.05233001708984375, 76.5072021484375, 4.979286193847656, -2.7105255126953125, 10.569351196289062, 36.57807922363281, 13.329275131225586, 63.29633331298828, 124.81932067871094, 7.7136383056640625, 12.00898551940918, -8.52593994140625, 93.7043685913086, 129.802978515625, 72.47247314453125, 70.27345275878906, -1.2713470458984375, 15.857681274414062, 11.507247924804688, 40.237525939941406, 97.86961364746094, 9.572402954101562, 26.458602905273438, 32.71257781982422, 77.55337524414062, -3.5220947265625, 66.98617553710938, -0.49445343017578125, 55.486717224121094, -0.9389801025390625, -47.378028869628906, 66.72369384765625, 13.58966064453125, -21.10169219970703, 6.273918151855469, -122.42030334472656, 98.66458892822266, 31.98565673828125, 20.269668579101562, 26.148284912109375, 117.89988708496094, 1.14617919921875, -0.6920089721679688, 85.63688659667969, 26.126296997070312, 140.78274536132812, 67.99336242675781, 92.45671844482422, 83.36943054199219, -16.806884765625, 19.194915771484375, 47.207550048828125, 9.161453247070312, 19.965579986572266, 16.981201171875, 6.798439025878906, 11.9573974609375, 33.36212921142578, 50.966514587402344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000641.npy"}
{"epoch": 0.9690098261526833, "step": 642, "batch_size": 64, "mean": 26.34848403930664, "std": 56.30079650878906, "min": -121.30795288085938, "p10": -38.74680099487304, "median": 16.95722770690918, "p90": 113.78484497070315, "max": 126.01041412353516, "pos_frac": 0.640625, "sample": [69.53883361816406, 28.982276916503906, 10.154472351074219, -51.03861618041992, 16.575355529785156, -5.601690292358398, 63.08363342285156, 121.8313217163086, -15.883697509765625, -121.30795288085938, 118.14938354492188, 2.392547607421875, 36.006866455078125, -45.27934265136719, 53.05419921875, 7.96942138671875, 79.62852478027344, 126.01041412353516, -12.922765731811523, -19.78152847290039, 49.41893005371094, 64.86650085449219, 66.92313385009766, -13.66848373413086, 74.6612548828125, 76.06137084960938, 121.01597595214844, -5.736572265625, 109.936767578125, 21.68665313720703, 45.067832946777344, 60.33934020996094, 45.48756408691406, 1.1033363342285156, -8.448738098144531, 115.43402099609375, -29.307540893554688, 14.860282897949219, 8.264244079589844, -15.774856567382812, -12.915302276611328, 4.2848968505859375, -3.988790512084961, -27.755828857421875, 75.50032806396484, 21.365938186645508, -28.919204711914062, 72.67435455322266, 107.05633544921875, 123.70752716064453, 20.295669555664062, -0.36080169677734375, 105.40241241455078, 66.69303894042969, 123.13621520996094, 17.339099884033203, -42.792198181152344, 19.832557678222656, -64.10725402832031, -84.0301284790039, -1.2794342041015625, -20.695816040039062, -62.6063232421875, 14.713096618652344], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000642.npy"}
{"epoch": 0.9705215419501134, "step": 643, "batch_size": 64, "mean": 32.88540267944336, "std": 54.76426696777344, "min": -95.9997787475586, "p10": -36.93949890136719, "median": 27.091830253601074, "p90": 113.34584732055664, "max": 130.21807861328125, "pos_frac": 0.71875, "sample": [18.999053955078125, 34.19663619995117, 27.086994171142578, -16.9608154296875, 4.977129936218262, 112.60763549804688, 5.921640396118164, 91.08222961425781, 65.6591567993164, 5.944007873535156, -1.372894287109375, 25.77118492126465, 56.05594253540039, -0.9853668212890625, 97.33475494384766, 58.3218994140625, -20.560699462890625, 1.1513385772705078, 121.10113525390625, 130.21807861328125, 71.87596130371094, 24.547164916992188, 100.2012710571289, 18.09868621826172, -33.58528137207031, 123.5899658203125, 16.793838500976562, -11.97064208984375, 128.6834716796875, 106.34228515625, 28.97570037841797, 60.42432403564453, -23.00347900390625, 73.34400939941406, 12.673582077026367, -49.26706314086914, 27.09666633605957, 2.0850982666015625, 120.3454818725586, 117.84859466552734, 96.95541381835938, -52.14873504638672, 50.42890930175781, -9.787830352783203, 76.9574966430664, -37.17945861816406, -59.102699279785156, -95.9997787475586, 4.23309326171875, 24.012584686279297, -51.65863800048828, 53.95689392089844, 89.98153686523438, 41.56391906738281, 27.690216064453125, -52.692230224609375, -5.9547119140625, 113.66222381591797, 33.020111083984375, 32.80530548095703, -16.383930206298828, 69.65411376953125, 75.38285064697266, -36.37959289550781], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000643.npy"}
{"epoch": 0.9720332577475435, "step": 644, "batch_size": 64, "mean": 30.634416580200195, "std": 60.60884094238281, "min": -127.44242858886719, "p10": -31.410435676574707, "median": 21.221267700195312, "p90": 116.49451828002931, "max": 127.47944641113281, "pos_frac": 0.734375, "sample": [79.30175018310547, -29.627370834350586, -98.01487731933594, 104.41322326660156, -21.64547348022461, -32.48283004760742, 6.736717224121094, 19.348817825317383, 103.83277893066406, 122.89517211914062, 64.89764404296875, 4.548671722412109, 105.71977233886719, 95.98318481445312, 27.005783081054688, 9.43341064453125, 28.97353744506836, 29.052162170410156, 0.5505447387695312, 121.59254455566406, 2.3096961975097656, 103.25958251953125, -87.40753173828125, -3.90289306640625, -29.2742919921875, 69.93052673339844, 0.37139892578125, 39.952091217041016, 1.7315139770507812, -28.423683166503906, 117.49189758300781, -12.03753662109375, 21.038177490234375, 64.41386413574219, 95.11258697509766, -9.751045227050781, -2.8242950439453125, 127.47944641113281, 110.27005004882812, 9.058441162109375, 40.8909912109375, 117.39632415771484, 48.23512268066406, 35.467750549316406, 5.690460205078125, -127.44242858886719, -13.051040649414062, 101.91482543945312, 9.438484191894531, 114.39030456542969, -32.17460632324219, 10.599273681640625, 0.5300521850585938, -50.214569091796875, 21.40435791015625, 24.5443115234375, -96.91365051269531, 46.243404388427734, 87.12266540527344, 3.4574623107910156, -0.3837738037109375, 124.11480712890625, 32.69562530517578, 125.33344268798828], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000644.npy"}
{"epoch": 0.9735449735449735, "step": 645, "batch_size": 64, "mean": 32.27980041503906, "std": 49.51926040649414, "min": -105.8138427734375, "p10": -16.93548126220703, "median": 24.602919578552246, "p90": 101.62403259277345, "max": 137.54888916015625, "pos_frac": 0.671875, "sample": [20.029483795166016, 117.27566528320312, 67.62299346923828, -34.02368927001953, 39.66438293457031, 137.54888916015625, 79.27094268798828, 15.456064224243164, -1.2514266967773438, -0.9869918823242188, -1.0825233459472656, -8.29998779296875, 22.488027572631836, -1.3903846740722656, 71.00282287597656, 9.4849853515625, 69.01297760009766, 1.5598468780517578, -0.7643585205078125, -2.264169692993164, 125.18226623535156, -22.884437561035156, 16.002647399902344, 61.948265075683594, 34.34642791748047, -13.964393615722656, -1.9315109252929688, 29.537322998046875, 117.75176239013672, 64.07836151123047, 26.717811584472656, 0.47046470642089844, 41.688270568847656, 0.7885665893554688, 8.254653930664062, 52.46632385253906, -52.99378967285156, -9.60107421875, -39.31968688964844, 102.58230590820312, 52.1829833984375, 62.588539123535156, 135.4596405029297, -8.521446228027344, 9.454143524169922, 44.389183044433594, 70.56224822998047, 45.81122589111328, 26.7437744140625, 65.62322998046875, -16.617599487304688, 94.2828369140625, 31.500774383544922, 98.35466766357422, 78.03533935546875, -1.0247802734375, 99.3880615234375, -14.783672332763672, -17.07171630859375, -105.8138427734375, 3.8097152709960938, 112.38949584960938, 76.0162124633789, -18.325965881347656], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000645.npy"}
{"epoch": 0.9750566893424036, "step": 646, "batch_size": 64, "mean": 31.456527709960938, "std": 56.313411712646484, "min": -117.04193878173828, "p10": -26.75483283996581, "median": 15.489526748657227, "p90": 112.49228744506836, "max": 142.8944091796875, "pos_frac": 0.78125, "sample": [-6.042144775390625, 1.025186538696289, 5.128959655761719, 67.39999389648438, 115.76451110839844, 99.96519470214844, -73.61839294433594, 76.12039184570312, 53.270057678222656, 111.5157470703125, 0.405517578125, 9.128303527832031, 93.73027801513672, 115.03254699707031, 75.34420776367188, 1.673187255859375, 31.151412963867188, 32.468841552734375, 120.49602508544922, 78.0154037475586, 111.23924255371094, 16.362075805664062, -92.70521545410156, -30.199111938476562, -38.78964614868164, 47.060760498046875, 5.822837829589844, -12.152145385742188, 12.795330047607422, 6.32452392578125, 12.443450927734375, 9.699447631835938, 53.21825408935547, 102.1461181640625, 51.565643310546875, 5.278907775878906, 16.81336212158203, -7.6327056884765625, 35.56006622314453, 67.82220458984375, -0.9941501617431641, -14.70449447631836, 64.89060974121094, 1.9592742919921875, 94.4365234375, 142.8944091796875, 112.91080474853516, 77.94527435302734, 16.53534507751465, 136.8594970703125, 50.804656982421875, 10.51531982421875, 13.132598876953125, 5.1739044189453125, -117.04193878173828, 5.663154602050781, 37.58869934082031, -80.81053161621094, -33.43408203125, 121.4734115600586, -18.718181610107422, -11.061813354492188, 1.9339637756347656, 14.61697769165039], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000646.npy"}
{"epoch": 0.9765684051398337, "step": 647, "batch_size": 64, "mean": 46.494606018066406, "std": 52.41477966308594, "min": -78.90376281738281, "p10": -7.345520782470702, "median": 35.06735801696777, "p90": 125.86884155273438, "max": 142.0814208984375, "pos_frac": 0.828125, "sample": [2.886280059814453, 51.695167541503906, 111.18223571777344, 5.2665252685546875, 18.7630615234375, 87.51777648925781, -7.702079772949219, 126.79934692382812, 45.768768310546875, 126.826171875, 25.975990295410156, 36.774261474609375, 60.4388427734375, -5.028919219970703, -33.2169189453125, 27.485595703125, 142.0814208984375, -2.541933059692383, -60.21585464477539, -6.5135498046875, 102.76789855957031, 36.28242492675781, 65.2986068725586, 47.472652435302734, 137.6090087890625, 95.64795684814453, 102.822998046875, -2.234090805053711, 113.14697265625, 7.8912506103515625, 77.63850402832031, 7.3271484375, 92.95663452148438, 6.2598724365234375, 81.77571105957031, -30.687835693359375, 20.332504272460938, 123.69766235351562, 23.8685302734375, -25.09575653076172, 79.60520935058594, 31.04778289794922, 33.852291107177734, 100.03064727783203, -78.90376281738281, 15.752439498901367, 36.623619079589844, 24.56601333618164, 25.308128356933594, 25.406892776489258, 5.60321044921875, 133.183349609375, 60.761783599853516, 13.03778076171875, -10.02984619140625, 94.93733978271484, 6.361318588256836, 137.85504150390625, 9.000541687011719, 93.25733947753906, 131.24325561523438, 15.561439514160156, 64.09159088134766, 88.48063659667969], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000647.npy"}
{"epoch": 0.9780801209372638, "step": 648, "batch_size": 64, "mean": 35.88824462890625, "std": 54.4984016418457, "min": -119.84996032714844, "p10": -27.778112030029295, "median": 40.42451095581055, "p90": 101.18835906982424, "max": 137.6435089111328, "pos_frac": 0.75, "sample": [78.85736846923828, 22.150718688964844, 17.83802032470703, -119.84996032714844, 105.37493896484375, -6.8846588134765625, -109.69189453125, 11.059257507324219, 89.40934753417969, 32.24065399169922, -14.162120819091797, 36.94209289550781, 10.428085327148438, 4.923271179199219, -29.02984619140625, -57.320526123046875, 81.59745788574219, -4.066108703613281, 137.6435089111328, 67.60836791992188, 128.43106079101562, 50.720375061035156, 71.58282470703125, 93.15017700195312, 96.26734924316406, 4.695518493652344, 43.6163330078125, 120.9095687866211, 48.624122619628906, 76.61597442626953, 1.9572372436523438, -0.8032703399658203, 103.29736328125, 118.85076904296875, 2.0311641693115234, -52.45769500732422, 82.7457275390625, 53.955718994140625, 56.33258056640625, 37.232688903808594, 79.64944458007812, 88.48394775390625, 110.304443359375, -5.503734588623047, -37.661460876464844, 5.854972839355469, 30.305130004882812, 58.50273895263672, 51.5513916015625, 90.69775390625, 52.119449615478516, 75.5030517578125, 66.04661560058594, -47.02970886230469, -6.839630126953125, 16.102066040039062, 85.12677001953125, -24.857398986816406, 26.768199920654297, -15.91806411743164, 1.0703659057617188, -9.888032913208008, 45.7825927734375, 67.85305786132812], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000648.npy"}
{"epoch": 0.9795918367346939, "step": 649, "batch_size": 64, "mean": 35.89997100830078, "std": 57.00625228881836, "min": -104.05488586425781, "p10": -35.13901443481445, "median": 36.19842529296875, "p90": 114.37792129516602, "max": 132.7054901123047, "pos_frac": 0.71875, "sample": [102.55577087402344, 86.88297271728516, -15.941097259521484, 73.60868835449219, 102.91690063476562, 96.4815902709961, 36.73674011230469, 35.66011047363281, 2.2582473754882812, 113.08049774169922, 7.966894149780273, 128.49327087402344, 63.66352844238281, -27.356800079345703, 57.462528228759766, 56.27906799316406, 46.362178802490234, 72.48625946044922, 52.788536071777344, 93.34893798828125, 124.35733032226562, -2.946474075317383, -68.6277084350586, -21.366539001464844, -104.05488586425781, 75.98683166503906, 3.9985122680664062, 14.090095520019531, -35.86174774169922, -33.45263671875, 30.789173126220703, 18.51775360107422, 109.2252197265625, 1.5698051452636719, 53.65459060668945, 78.52053833007812, 28.99811553955078, 63.5863151550293, 91.55940246582031, 66.7363510131836, -11.887924194335938, 120.76470947265625, 64.42023468017578, -8.89725112915039, 8.357284545898438, 120.66918182373047, 1.2610435485839844, 3.1305313110351562, -5.8722686767578125, 10.794418334960938, -6.090904235839844, 123.81352233886719, -51.49864196777344, -40.987152099609375, 132.7054901123047, -27.545650482177734, -42.49491882324219, -25.379058837890625, 61.14698028564453, 114.9339599609375, 77.26924896240234, 58.368980407714844, 4.423137664794922, -64.8216552734375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000649.npy"}
{"epoch": 0.981103552532124, "step": 650, "batch_size": 64, "mean": 35.07628631591797, "std": 60.52153015136719, "min": -108.27131652832031, "p10": -32.430043029785146, "median": 30.307761192321777, "p90": 117.47580108642579, "max": 129.80030822753906, "pos_frac": 0.734375, "sample": [-20.26395034790039, 113.10679626464844, -0.057342529296875, 103.27568054199219, 2.7469329833984375, 120.1559829711914, 68.19482421875, 29.995529174804688, 3.5139617919921875, 14.189403533935547, 110.83817291259766, 97.2220458984375, -67.23490905761719, 10.952407836914062, 2.4062671661376953, 62.12059783935547, 101.67947387695312, 30.619993209838867, 117.87347412109375, 2.3927555084228516, 80.83172607421875, 50.09281921386719, -19.41473960876465, -22.162425994873047, 126.59112548828125, 112.19408416748047, 129.80030822753906, 90.82514953613281, 42.592872619628906, -61.95209503173828, 3.954914093017578, -108.27131652832031, 16.460514068603516, 97.68860626220703, -93.34149169921875, 11.408889770507812, 116.54789733886719, 49.934757232666016, 6.8712310791015625, -14.436332702636719, 5.387969970703125, 54.08583068847656, -58.260345458984375, 104.38468170166016, -36.23414611816406, -23.553802490234375, 127.81809997558594, 120.04620361328125, 36.18359375, -11.373525619506836, 61.39805603027344, 2.0455322265625, 94.80905151367188, -4.732791900634766, 12.827104568481445, 42.37066650390625, -12.182388305664062, 33.479827880859375, 85.17239379882812, 125.65615844726562, -75.66390991210938, -0.6835746765136719, 39.72919845581055, 2.2277374267578125], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000650.npy"}
{"epoch": 0.982615268329554, "step": 651, "batch_size": 64, "mean": 30.842975616455078, "std": 51.04434585571289, "min": -86.54427337646484, "p10": -28.186786651611325, "median": 20.5859317779541, "p90": 105.72068786621094, "max": 129.60855102539062, "pos_frac": 0.71875, "sample": [1.048065185546875, 20.942466735839844, 41.69886779785156, -29.449172973632812, 24.76343536376953, -51.63959503173828, -25.24121856689453, 39.63340759277344, -0.6106643676757812, 104.53761291503906, 9.06793212890625, -5.3901214599609375, -2.0398712158203125, -4.775016784667969, 80.61355590820312, -86.54427337646484, -7.042198181152344, 106.74441528320312, 42.20067596435547, 104.57383728027344, 80.8775405883789, -3.2114906311035156, -49.163726806640625, 46.505279541015625, 10.888664245605469, 3.353618621826172, 7.3589630126953125, 62.68382263183594, -15.569915771484375, 96.50601196289062, 20.22939682006836, 116.3933334350586, 14.978252410888672, 1.6447372436523438, 24.442428588867188, 0.1997203826904297, 64.8204345703125, 29.50201416015625, -37.825172424316406, 32.734378814697266, 121.57849884033203, 21.8060302734375, 69.08584594726562, 7.876850128173828, -3.9066505432128906, 129.60855102539062, 127.93115234375, 105.04825592041016, -8.580726623535156, 106.00887298583984, 14.675691604614258, 3.63238525390625, 4.204231262207031, 79.69048309326172, 61.257080078125, 52.10057830810547, 99.13252258300781, 13.16494369506836, 44.277732849121094, -59.82135009765625, 26.256378173828125, -34.35938262939453, 128.85302734375, -6.0110015869140625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000651.npy"}
{"epoch": 0.9841269841269841, "step": 652, "batch_size": 64, "mean": 30.856266021728516, "std": 54.983829498291016, "min": -91.81407165527344, "p10": -33.12961959838867, "median": 28.543835639953613, "p90": 117.31391448974614, "max": 134.16162109375, "pos_frac": 0.640625, "sample": [-2.637592315673828, 106.88338470458984, 121.78414154052734, 41.169166564941406, -32.918052673339844, 123.0589599609375, 31.28952980041504, 41.00973129272461, 124.58985900878906, 2.6373214721679688, 104.46964263916016, -69.3214111328125, -6.234737396240234, 30.364168167114258, 43.31364059448242, 76.40269470214844, 15.195892333984375, 93.10123443603516, 26.72350311279297, -0.533294677734375, -33.22029113769531, 74.87869262695312, 36.80714416503906, 1.8301925659179688, 4.356693267822266, -8.284111022949219, -33.832122802734375, -2.558380126953125, 132.94651794433594, 54.281333923339844, 19.583404541015625, -9.580036163330078, 38.59309005737305, 3.8708419799804688, 99.58545684814453, -7.162147521972656, 63.61547088623047, 63.10892105102539, -15.204639434814453, -79.10737609863281, -25.03777313232422, 125.47846984863281, 9.634284973144531, 68.14004516601562, -39.616485595703125, -9.28731918334961, 105.03333282470703, 62.12959289550781, -1.4313983917236328, 86.8020248413086, 134.16162109375, 34.70848846435547, 60.23472595214844, -5.130165100097656, -12.020057678222656, 44.26216125488281, -36.88810729980469, 11.082815170288086, -29.9385986328125, 126.33385467529297, 31.136367797851562, -8.653900146484375, -91.81407165527344, 60.624603271484375], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000652.npy"}
{"epoch": 0.9856386999244142, "step": 653, "batch_size": 64, "mean": 39.352874755859375, "std": 56.939170837402344, "min": -96.59013366699219, "p10": -15.394922256469727, "median": 29.97746181488037, "p90": 118.18116912841799, "max": 140.56283569335938, "pos_frac": 0.765625, "sample": [15.722429275512695, 112.6832275390625, 0.08339691162109375, -5.327362060546875, 43.16225051879883, -3.1005401611328125, 50.206016540527344, -6.122642517089844, 133.0955352783203, 0.8787078857421875, -15.417919158935547, 8.268539428710938, 85.26742553710938, -15.341262817382812, 140.56283569335938, 73.97586822509766, 58.03656768798828, 123.36128234863281, 13.236919403076172, 125.24352264404297, 64.47913360595703, 132.70225524902344, 59.54161834716797, -1.715118408203125, 26.907888412475586, -2.1287784576416016, 108.49030303955078, 10.763931274414062, 3.9201736450195312, -1.0024185180664062, 136.71597290039062, -17.855056762695312, -40.82318878173828, 18.066593170166016, 24.55910301208496, -42.13426208496094, 57.13418960571289, 36.126869201660156, 74.275634765625, 50.38779830932617, 4.534263610839844, -96.59013366699219, 112.91500854492188, 52.12847900390625, 95.77214813232422, -92.36805725097656, 110.93644714355469, -1.4964599609375, 81.35896301269531, 26.721466064453125, 7.729698181152344, 120.43809509277344, -93.9691162109375, 12.090621948242188, 0.9123363494873047, 33.047035217285156, 101.06623840332031, 93.78076171875, 9.363262176513672, 93.78680419921875, 45.98388671875, 69.36566162109375, 2.4247665405273438, 91.7645034790039], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000653.npy"}
{"epoch": 0.9871504157218443, "step": 654, "batch_size": 64, "mean": 28.009090423583984, "std": 53.327640533447266, "min": -70.6779556274414, "p10": -37.60759086608886, "median": 12.455150604248047, "p90": 116.34748611450198, "max": 150.66262817382812, "pos_frac": 0.71875, "sample": [5.477874755859375, 124.48472595214844, -70.6779556274414, 61.40269470214844, 23.182327270507812, -39.74774932861328, 104.53619384765625, 23.047163009643555, 7.357196807861328, -20.45623779296875, -2.30267333984375, -40.121238708496094, 72.30770111083984, 1.8923263549804688, -2.1129989624023438, 24.487157821655273, 106.74286651611328, -39.325199127197266, 111.58026885986328, -4.98748779296875, 3.5530548095703125, 7.427967071533203, 92.64456176757812, 130.33389282226562, -3.9857616424560547, 0.77783203125, 1.7475624084472656, -7.921516418457031, -11.650550842285156, 24.325191497802734, 2.2577056884765625, 46.74052429199219, 150.66262817382812, 68.19004821777344, 40.839393615722656, 102.42818450927734, 66.98516845703125, -16.827362060546875, 18.315536499023438, 24.64788818359375, 69.74687957763672, -33.59983825683594, 21.869140625, -46.499114990234375, 6.353656768798828, 0.6282730102539062, 126.92955017089844, 118.39057922363281, 26.73331642150879, 43.065345764160156, 8.354202270507812, 123.60160064697266, 6.580333709716797, 14.006492614746094, 123.47111511230469, -58.49501419067383, 10.90380859375, 25.077014923095703, -11.816848754882812, 72.05892944335938, 26.648609161376953, 8.686752319335938, -11.024215698242188, -67.34765625], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000654.npy"}
{"epoch": 0.9886621315192744, "step": 655, "batch_size": 64, "mean": 43.98023986816406, "std": 45.70294952392578, "min": -39.42347717285156, "p10": -7.081919097900387, "median": 35.22175979614258, "p90": 101.45399856567383, "max": 146.0089111328125, "pos_frac": 0.78125, "sample": [86.41683959960938, 40.9581298828125, -1.9494857788085938, 9.692863464355469, 26.819366455078125, 34.300445556640625, 111.66227722167969, 84.61515808105469, 68.10796356201172, 4.094085693359375, 7.4190216064453125, 60.113037109375, 4.227508544921875, -2.9409561157226562, -13.306564331054688, -2.283111572265625, 34.876922607421875, 111.01773071289062, 95.8394546508789, 10.980403900146484, 122.03961181640625, -1.5130157470703125, 89.95187377929688, 96.18528747558594, 23.265283584594727, -0.9555130004882812, 52.98351287841797, -32.44264602661133, 113.52896118164062, -25.989334106445312, 100.05364990234375, 58.27488708496094, 101.92628479003906, 13.169441223144531, 17.669647216796875, 3.102123260498047, 21.652971267700195, 100.35199737548828, 54.16511535644531, 146.0089111328125, 8.104530334472656, -4.254302978515625, -8.293754577636719, 99.72901916503906, 59.75583267211914, -38.66266632080078, 30.257314682006836, 87.02821350097656, 113.7513427734375, 76.07615661621094, -0.5264015197753906, 21.604598999023438, 78.65585327148438, 26.465246200561523, 35.56659698486328, 85.81275939941406, 88.62016296386719, 16.674537658691406, 80.5963134765625, -11.064224243164062, -39.42347717285156, 55.85630798339844, 67.207275390625, 61.10789489746094], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000655.npy"}
{"epoch": 0.9901738473167044, "step": 656, "batch_size": 64, "mean": 40.203338623046875, "std": 51.957298278808594, "min": -80.8673095703125, "p10": -12.358838272094726, "median": 26.740936279296875, "p90": 118.33206787109376, "max": 135.52088928222656, "pos_frac": 0.84375, "sample": [112.85122680664062, 37.642295837402344, 116.00797271728516, 80.4179458618164, 4.989032745361328, 13.563461303710938, 9.116748809814453, 0.39354705810546875, 53.77107238769531, -3.6953773498535156, 16.170413970947266, 109.61438751220703, 1.0810317993164062, 3.6189422607421875, 2.2403030395507812, 83.06776428222656, -80.8673095703125, -13.104515075683594, 65.94194793701172, 83.97462463378906, 122.57689666748047, 11.295326232910156, -77.70087432861328, 17.543811798095703, 122.48165893554688, -12.922527313232422, 48.21985626220703, 71.94902038574219, -11.043563842773438, 49.792545318603516, 118.11585998535156, 18.308692932128906, 12.115365982055664, 47.52046203613281, 45.57776641845703, 52.268310546875, 25.45025634765625, 32.36162567138672, 127.06889343261719, 28.0316162109375, 15.065780639648438, -38.59075164794922, 32.43865203857422, 127.50294494628906, 23.523956298828125, 109.5590591430664, 43.3104248046875, 74.49479675292969, 24.150470733642578, 83.89228820800781, 122.14492797851562, -14.609386444091797, 7.97613525390625, 135.52088928222656, 71.15149688720703, 1.1746673583984375, -6.0985107421875, -65.71177673339844, 10.868782043457031, 13.025712966918945, 24.086273193359375, 13.673362731933594, 100.23218536376953, 118.42472839355469], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000656.npy"}
{"epoch": 0.9916855631141346, "step": 657, "batch_size": 64, "mean": 43.78763961791992, "std": 58.145538330078125, "min": -113.59988403320312, "p10": -14.489235687255853, "median": 43.453521728515625, "p90": 124.14984741210938, "max": 136.32058715820312, "pos_frac": 0.78125, "sample": [131.9861602783203, 55.40380859375, -5.545093536376953, 124.0228271484375, 55.53485107421875, 10.002408981323242, 124.20428466796875, 128.46051025390625, 43.21538543701172, -113.59988403320312, -3.6138572692871094, -17.51219367980957, 80.63702392578125, 21.40485382080078, 89.703125, 41.36067581176758, 71.2778091430664, 125.73660278320312, -65.02687072753906, 111.784912109375, -8.384674072265625, 63.615745544433594, -85.0162353515625, 15.780366897583008, 64.65701293945312, 109.76419830322266, 16.668243408203125, 15.514602661132812, -0.6135349273681641, 43.69165802001953, 25.904983520507812, 101.80799865722656, 14.667648315429688, 93.0247802734375, 66.25988006591797, 35.09022521972656, 73.33212280273438, 106.49732208251953, 125.10974884033203, 5.070535659790039, 116.67364501953125, 1.5247039794921875, -1.3724212646484375, 115.99665832519531, 0.25370025634765625, 9.231719970703125, 136.32058715820312, 4.474527359008789, -22.283767700195312, 54.95743942260742, 16.19501495361328, -88.71660614013672, 93.6707534790039, -1.250162124633789, 5.548887252807617, 51.869544982910156, -17.10547637939453, 62.013587951660156, 60.46485137939453, 130.99932861328125, -2.402667999267578, 82.97997283935547, 1.9130744934082031, 98.57213592529297], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000657.npy"}
{"epoch": 0.9931972789115646, "step": 658, "batch_size": 64, "mean": 31.531997680664062, "std": 62.38043975830078, "min": -120.94393157958984, "p10": -41.28969268798828, "median": 29.629499435424805, "p90": 114.95982055664064, "max": 146.49490356445312, "pos_frac": 0.734375, "sample": [49.86098861694336, 32.47111892700195, 112.47673034667969, -42.405120849609375, 80.08706665039062, 6.451436996459961, -103.29035949707031, 17.54137420654297, 97.10051727294922, 89.10696411132812, 3.4282684326171875, 57.79936218261719, -59.13189697265625, 77.151611328125, 84.70938110351562, 28.12881088256836, -29.054763793945312, 89.90238952636719, 146.49490356445312, 1.59033203125, -11.044748306274414, 8.791412353515625, 31.50562286376953, 71.14530944824219, 59.0694580078125, 124.89140319824219, -86.05604553222656, 66.12311553955078, 70.97662353515625, 13.460678100585938, 3.0313167572021484, -1.3038177490234375, 33.860557556152344, 79.63949584960938, -0.45473289489746094, 42.933509826660156, 125.14712524414062, -84.76567077636719, 12.180648803710938, 31.13018798828125, 127.9714126586914, 2.1019649505615234, 14.077957153320312, 113.45074462890625, -1.8860759735107422, 59.24346160888672, -36.06336975097656, 115.6065673828125, 32.69806671142578, 36.298213958740234, 4.677696228027344, 0.09840965270996094, -38.68702697753906, -89.04383087158203, 18.77497100830078, 90.10321807861328, -22.26987075805664, -120.94393157958984, 141.20687866210938, -16.138221740722656, 131.72659301757812, 13.591720581054688, 110.85755157470703, -0.08583259582519531], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000658.npy"}
{"epoch": 0.9947089947089947, "step": 659, "batch_size": 64, "mean": 29.910579681396484, "std": 56.28750991821289, "min": -104.43505859375, "p10": -36.861821746826166, "median": 20.917399406433105, "p90": 108.23011779785158, "max": 133.99618530273438, "pos_frac": 0.75, "sample": [23.744552612304688, 79.25706481933594, 116.50679016113281, 89.74593353271484, 1.5002899169921875, 126.09359741210938, 43.71479034423828, 58.92547607421875, 20.47684669494629, 118.73577880859375, 72.13717651367188, -3.028778076171875, 10.611686706542969, 8.75497055053711, -0.067626953125, 10.883007049560547, 82.89549255371094, -77.11322784423828, 8.408742904663086, -34.78235626220703, 109.16560363769531, -18.024484634399414, 33.98065185546875, 13.170654296875, 5.328794479370117, -0.2611236572265625, 13.347738265991211, 22.1062068939209, -6.518032073974609, 17.903656005859375, 11.354232788085938, 7.255027770996094, 104.3380126953125, 67.53346252441406, 33.71522521972656, 4.702125549316406, 122.19542694091797, 72.85546875, 124.55801391601562, 21.357952117919922, 57.00885009765625, -51.09967041015625, -45.27531433105469, 104.46658325195312, -20.991525650024414, 11.402313232421875, 10.623703002929688, 60.005760192871094, 92.59156036376953, -30.525245666503906, 26.166000366210938, 92.5123062133789, -102.2470932006836, -37.753021240234375, 106.04731750488281, 22.50971221923828, 2.6279220581054688, -104.43505859375, 133.99618530273438, 27.736129760742188, -10.095203399658203, 88.4335708618164, 30.416717529296875, -67.31021118164062], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000659.npy"}
{"epoch": 0.9962207105064248, "step": 660, "batch_size": 64, "mean": 37.475677490234375, "std": 47.12800979614258, "min": -70.09993743896484, "p10": -25.92962493896484, "median": 30.03331184387207, "p90": 96.28330688476562, "max": 141.45855712890625, "pos_frac": 0.8125, "sample": [125.90803527832031, 30.22905731201172, 23.571697235107422, -28.32879638671875, 42.86077880859375, -50.85822296142578, 12.683563232421875, 96.54612731933594, -70.09993743896484, -31.667724609375, 29.837566375732422, 125.64644622802734, 17.284610748291016, 66.31475830078125, 15.31597900390625, 141.45855712890625, 88.9298095703125, 56.47346496582031, 25.369108200073242, -29.17041015625, 62.75079345703125, -19.781005859375, 64.40384674072266, 9.681436538696289, 3.4711151123046875, 124.22541046142578, -37.44805908203125, 101.94564819335938, 32.36587142944336, -20.331558227539062, 33.835227966308594, 48.158050537109375, 84.985107421875, 44.34620666503906, -0.029842376708984375, -41.99664306640625, 8.142173767089844, 7.235084533691406, 64.79867553710938, 35.36524963378906, 11.773414611816406, 22.927988052368164, 76.34432983398438, 94.04209899902344, 92.58948516845703, 93.80741119384766, 10.17510986328125, 23.44310760498047, 73.86894989013672, 93.2748031616211, 18.78109359741211, 99.4237289428711, 10.54803466796875, -11.028448104858398, 95.67005920410156, 46.09028244018555, 33.543758392333984, 81.96257019042969, 18.88471221923828, -4.020904541015625, 23.256757736206055, 26.347946166992188, 10.816543579101562, 61.493263244628906], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000660.npy"}
{"epoch": 0.9977324263038548, "step": 661, "batch_size": 64, "mean": 28.993637084960938, "std": 56.31117248535156, "min": -115.33921813964844, "p10": -36.103081512451155, "median": 19.177291870117188, "p90": 105.61378402709963, "max": 134.83901977539062, "pos_frac": 0.6875, "sample": [-20.785743713378906, -15.468757629394531, -45.987823486328125, -60.527671813964844, 85.44804382324219, 48.74045944213867, 91.35079193115234, -5.729248046875, -114.90339660644531, 72.55258178710938, -53.700233459472656, 1.534271240234375, 4.2126922607421875, 123.3832778930664, 115.03998565673828, 88.1575927734375, 125.15958404541016, 63.167076110839844, 134.83901977539062, 30.948074340820312, -115.33921813964844, 17.5372314453125, 89.26542663574219, -21.684844970703125, -2.6994361877441406, 7.0824127197265625, 17.912246704101562, -18.234973907470703, -1.5000495910644531, 18.332130432128906, 14.95504379272461, 47.77324676513672, -65.56034088134766, 35.50304412841797, 58.341064453125, 8.417724609375, 23.616104125976562, 114.93721008300781, -0.19593429565429688, 102.21109008789062, 52.4864501953125, 16.345558166503906, -8.337234497070312, 106.91824340820312, 97.79960632324219, 43.011207580566406, 102.5700454711914, 33.536048889160156, 3.847280502319336, 35.7586669921875, -42.282325744628906, 10.057945251464844, -0.697265625, -14.285398483276367, 40.422142028808594, 87.3948745727539, 20.02245330810547, 27.272674560546875, 22.617355346679688, 121.82217407226562, 16.413101196289062, -5.3089599609375, -9.228736877441406, 99.33717346191406], "npy": "outputs/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6/margin_logs/step_0000661.npy"}
3
model-00001-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cc01f8283809011a1cd3034cbc138ad75ee76a9303ee896c9824f20c4d86bffa
size 4886466168
3
model-00002-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:20be3da12ab3c6589d6b3af13a38c510228a23ecdba94d7b50098e0f1a5a8994
size 4832007448
3
model-00003-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:989bb0f82e14a1c058235959196ae44eb540d74bde834f03ac2f7b4fb55c35e1
size 4999813112
3
model-00004-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84038098f48e010d20fab863b79b51e0baee45841e59e43f647398922d34fdea
size 4999813128
3
model-00005-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2ecf5f67e9892023f6cc1f238003f0aebc9f028679aaa7c300a5e4954f4d3696
size 4832007496
3
model-00006-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c731e7a4ab8ee509aa5d8ed745786e80e1f51714c686e2b497d80c4e3ba1e869
size 4999813120
3
model-00007-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0521fc5bec358b02f2d960bed501882f5369061f1f24abcd959b486131a9ab49
size 2571158184
298
model.safetensors.index.json
Normal file
@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 32121044992
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}
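The index above maps each tensor name to one of seven shard files, and a layer can straddle two shards (layer 14, for example, is split between shards 3 and 4). A minimal sketch of grouping tensor names by shard follows. It assumes the entries sit under a top-level `weight_map` key, which is the usual layout for sharded safetensors indexes but is not visible in this excerpt; `tensors_by_shard` and `example_index` are hypothetical names used only for illustration.

```python
from collections import defaultdict


def tensors_by_shard(index):
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    # Assumption: entries live under a top-level "weight_map" key.
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# Three entries copied from the index above; a full index would normally be
# read from disk with json.load instead of being written inline.
example_index = {
    "weight_map": {
        "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
        "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
        "model.norm.weight": "model-00007-of-00007.safetensors",
    }
}

grouped = tensors_by_shard(example_index)
for shard_file in sorted(grouped):
    print(shard_file, grouped[shard_file])
```

Grouping this way makes it easy to see which shard files must be present before a given layer can be loaded.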
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c5cf44023714fb39b05e71e425f8d7b92805ff73f7988b083b8c87f0bf87393
size 17209961
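The three lines committed for `tokenizer.json` are a Git LFS pointer, not the tokenizer itself: they record the spec version, the SHA-256 object ID, and the size in bytes of the real file. A small sketch of reading those fields is shown here; `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>", separated by one space.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:3c5cf44023714fb39b05e71e425f8d7b92805ff73f7988b083b8c87f0bf87393
size 17209961
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

The size field indicates that the actual tokenizer file is about 17 MB, which is why it is stored through LFS instead of inline.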
2064
tokenizer_config.json
Normal file
File diff suppressed because it is too large
9
train_results.json
Normal file
@@ -0,0 +1,9 @@
{
  "epoch": 0.999244142101285,
  "total_flos": 0.0,
  "train_loss": 1.0886367675756723,
  "train_runtime": 1755.0349,
  "train_samples": 42336,
  "train_samples_per_second": 24.123,
  "train_steps_per_second": 0.377
}
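The numbers in `train_results.json` can be cross-checked against each other. The sketch below assumes an effective batch size of 64, read off the `batch-64` fragment of the model name in the commit header, and assumes the throughput fields are simple ratios over `train_runtime`; neither assumption is stated in the file itself. Under those assumptions the fractional epoch corresponds to 661 optimizer steps, with the last partial batch of 32 samples left unused.

```python
# Values copied from train_results.json above.
train_results = {
    "epoch": 0.999244142101285,
    "train_runtime": 1755.0349,
    "train_samples": 42336,
    "train_samples_per_second": 24.123,
    "train_steps_per_second": 0.377,
}

# Assumption: effective batch size taken from "batch-64" in the model name.
EFFECTIVE_BATCH_SIZE = 64

# Steps implied by the fractional epoch: epoch * samples / batch size.
steps = round(
    train_results["epoch"] * train_results["train_samples"] / EFFECTIVE_BATCH_SIZE
)

# Throughput recomputed as plain ratios over the wall-clock runtime (seconds).
samples_per_second = train_results["train_samples"] / train_results["train_runtime"]
steps_per_second = steps / train_results["train_runtime"]

print("steps:", steps)
print("samples/s:", round(samples_per_second, 3))
print("steps/s:", round(steps_per_second, 3))
```

Both recomputed rates round to the reported values, so the file is at least internally consistent with a single pass of roughly 29 minutes over the 42,336 training samples.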
12653
trainer_state.json
Normal file
File diff suppressed because it is too large