4.1 KiB
| 1 | step | loss | grad_norm | learning_rate | num_tokens | completions/mean_length | completions/min_length | completions/max_length | completions/clipped_ratio | completions/mean_terminated_length | completions/min_terminated_length | completions/max_terminated_length | rewards/reward_total/mean | rewards/reward_total/std | rewards/reward_market/mean | rewards/reward_market/std | rewards/reward_warehouse/mean | rewards/reward_warehouse/std | rewards/reward_showroom/mean | rewards/reward_showroom/std | reward | reward_std | frac_reward_zero_std | sampling/sampling_logp_difference/mean | sampling/sampling_logp_difference/max | sampling/importance_sampling_ratio/min | sampling/importance_sampling_ratio/mean | sampling/importance_sampling_ratio/max | entropy | clip_ratio/low_mean | clip_ratio/low_min | clip_ratio/high_mean | clip_ratio/high_max | clip_ratio/region_mean | step_time | epoch | train_runtime | train_samples_per_second | train_steps_per_second | total_flos | train_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | -0.0 | 0.0 | 0.0 | 6155.0 | 18.5 | 13.0 | 24.0 | 0.0 | 18.5 | 13.0 | 24.0 | 0.8737499713897705 | 0.014071441255509853 | 0.6000000238418579 | 0.0 | 0.20000000298023224 | 0.0 | 0.07375000417232513 | 0.01407142635434866 | 0.8737499713897705 | 0.014071442186832428 | 0.0 | 4.520341396331787 | 27.174238204956055 | 4.035991810979052e-40 | 3.371377950408304e-34 | 6.742751768219281e-34 | 0.1811772882938385 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.618328085001849 | 0.0033333333333333335 | |||||
| 3 | 2 | 0.0 | 0.0 | 5.000000000000001e-07 | 13990.0 | 20.5 | 13.0 | 28.0 | 0.0 | 20.5 | 13.0 | 28.0 | 0.9020000100135803 | 0.04058792069554329 | 0.20000000298023224 | 0.0 | 0.6000000238418579 | 0.0 | 0.10199999809265137 | 0.04058793559670448 | 0.9020000100135803 | 0.04058792069554329 | 0.0 | 4.2938385009765625 | 26.646509170532227 | 2.0038568039844884e-43 | 9.08055335179305e-34 | 1.81611067035861e-33 | 0.11802829056978226 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.2618235270019795 | 0.006666666666666667 | |||||
| 4 | 3 | 0.0 | 0.0 | 1.0000000000000002e-06 | 27189.0 | 47.5 | 40.0 | 55.0 | 0.0 | 47.5 | 40.0 | 55.0 | 0.800000011920929 | 0.0 | 0.20000000298023224 | 0.0 | 0.6000000238418579 | 0.0 | 0.0 | 0.0 | 0.800000011920929 | 0.0 | 1.0 | 2.252011299133301 | 28.722728729248047 | 0.0 | 5.605193857299268e-45 | 1.2611686178923354e-44 | 0.12017613649368286 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.559301583001798 | 0.01 | |||||
| 5 | 4 | 0.0 | 0.0 | 1.5e-06 | 32537.0 | 16.0 | 16.0 | 16.0 | 0.0 | 16.0 | 16.0 | 16.0 | 0.8726999759674072 | 0.008343853987753391 | 0.20000000298023224 | 0.0 | 0.6000000238418579 | 0.0 | 0.07269999384880066 | 0.008343859575688839 | 0.8726999759674072 | 0.008343853987753391 | 0.0 | 6.424993515014648 | 30.274734497070312 | 1.401298464324817e-45 | 2.802596928649634e-45 | 2.802596928649634e-45 | 0.17401638627052307 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.9552808339976764 | 0.013333333333333334 | |||||
| 6 | 5 | -0.0 | 0.0 | 2.0000000000000003e-06 | 41360.0 | 32.0 | 12.0 | 52.0 | 0.0 | 32.0 | 12.0 | 52.0 | 0.8379999995231628 | 0.05374009534716606 | 0.6000000238418579 | 0.0 | 0.20000000298023224 | 0.0 | 0.03799999877810478 | 0.053740113973617554 | 0.8379999995231628 | 0.05374009534716606 | 0.0 | 3.4955575466156006 | 31.43513298034668 | 0.0 | 2.3879863577791495e-32 | 4.775972715558299e-32 | 0.10594719648361206 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.467385248000937 | 0.016666666666666666 | |||||
| 7 | 6 | 0.0 | 0.0 | 2.5e-06 | 53669.0 | 47.0 | 34.0 | 60.0 | 0.0 | 47.0 | 34.0 | 60.0 | 0.800000011920929 | 0.0 | 0.20000000298023224 | 0.0 | 0.6000000238418579 | 0.0 | 0.0 | 0.0 | 0.800000011920929 | 0.0 | 1.0 | 2.4156625270843506 | 28.952373504638672 | 0.0 | 1.090479293292958e-33 | 2.180958586585916e-33 | 0.1315789371728897 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.83436526999867 | 0.02 | |||||
| 8 | 7 | -0.0 | 0.0 | 3e-06 | 62446.0 | 26.0 | 13.0 | 39.0 | 0.0 | 26.0 | 13.0 | 39.0 | 0.8481500148773193 | 0.024678032845258713 | 0.20000000298023224 | 0.0 | 0.6000000238418579 | 0.0 | 0.04814999923110008 | 0.024678027257323265 | 0.8481500148773193 | 0.024678032845258713 | 0.0 | 3.9407880306243896 | 28.3719482421875 | 0.0 | 6.8739474988531065e-34 | 1.3747894997706213e-33 | 0.10619711875915527 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.326133659000334 | 0.023333333333333334 | |||||
| 9 | 8 | 0.0 | 0.0 | 3.5e-06 | 72064.0 | 29.5 | 19.0 | 40.0 | 0.0 | 29.5 | 19.0 | 40.0 | 0.8438500165939331 | 0.06201327219605446 | 0.20000000298023224 | 0.0 | 0.6000000238418579 | 0.0 | 0.04385000094771385 | 0.06201326474547386 | 0.8438500165939331 | 0.06201327219605446 | 0.0 | 3.4995782375335693 | 30.46342658996582 | 7.006492321624085e-45 | 8.407790785948902e-45 | 9.80908925027372e-45 | 0.11579056829214096 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.365506213998742 | 0.02666666666666667 | |||||
| 10 | 8 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 0.02666666666666667 | 91.2182 | 0.175 | 0.088 | 0.0 | -7.903189287664419e-34 |