Initialize project; model provided by the ModelHub XC community

Model: rbelanec/train_mrpc_42_1776331557
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-03 10:20:24 +08:00
commit 7ceef409c6
17 changed files with 6309 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
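Only files whose names match one of these patterns are stored through Git LFS. That is why `model.safetensors` and `tokenizer.json` appear further down as three-line pointer files, while the JSON and YAML configs are shown inline. The sketch below approximates that check in Python with `fnmatch.fnmatchcase`. It is a simplification: it compares basenames only and skips the `saved_model/**/*` directory rule, because `fnmatch` does not implement the `**` semantics of gitattributes. `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Basename patterns copied from the .gitattributes above. The directory rule
# "saved_model/**/*" is omitted because fnmatch does not implement "**".
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "*.tar.*", "*.tar",
    "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst", "*tfevents*",
    "tokenizer.json",
]


def is_lfs_tracked(path: str) -> bool:
    """Return True if the file's basename matches one of the LFS patterns.

    A gitattributes pattern without a slash matches the basename at any
    directory depth, so only the last path component is compared here.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


for f in ["model.safetensors", "tokenizer.json", "config.json", "train.yaml"]:
    print(f, is_lfs_tracked(f))
```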

81
README.md Normal file

@@ -0,0 +1,81 @@
---
library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- peft-factory
- full
- llama-factory
- generated_from_trainer
model-index:
- name: train_mrpc_42_1776331557
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# train_mrpc_42_1776331557
This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the mrpc dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1084
- Num Input Tokens Seen: 1780000
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
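The scheduler settings above (`cosine`, warmup ratio 0.1) can be sketched as a plain function. The sketch assumes the `total_steps` value of 2065 from `trainer_log.jsonl` and the linear-warmup-then-half-cosine shape of transformers' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. `lr_at` is a hypothetical helper name. Checked against the log, the value logged at `current_steps` s appears to equal `lr_at(s - 1)`. For example, step 5 logs 9.66e-08, which is 5e-6 × 4/207.

```python
import math

BASE_LR = 5e-6        # learning_rate
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 2065    # "total_steps" in trainer_log.jsonl (assumed input)
# A warmup ratio is converted to a step count by rounding up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 207


def lr_at(step: int) -> float:
    """Linear warmup from 0 to BASE_LR, then half-cosine decay back to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * (step / max(1, WARMUP_STEPS))
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Compare with the lr logged at current_steps 5 and 210.
print(lr_at(4), lr_at(209))
```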
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.1552 | 0.2518 | 104 | 0.1485 | 89600 |
| 0.2178 | 0.5036 | 208 | 0.1320 | 178688 |
| 0.1165 | 0.7554 | 312 | 0.1130 | 267968 |
| 0.1193 | 1.0073 | 416 | 0.1084 | 357488 |
| 0.0685 | 1.2591 | 520 | 0.1903 | 446896 |
| 0.0801 | 1.5109 | 624 | 0.1982 | 536176 |
| 0.2066 | 1.7627 | 728 | 0.1449 | 626992 |
| 0.0011 | 2.0145 | 832 | 0.2068 | 716344 |
| 0.0059 | 2.2663 | 936 | 0.2691 | 806712 |
| 0.0756 | 2.5182 | 1040 | 0.2895 | 895736 |
| 0.0001 | 2.7700 | 1144 | 0.2260 | 985592 |
| 0.0 | 3.0218 | 1248 | 0.2253 | 1074624 |
| 0.0 | 3.2736 | 1352 | 0.2578 | 1164544 |
| 0.0 | 3.5254 | 1456 | 0.2580 | 1253248 |
| 0.0 | 3.7772 | 1560 | 0.2703 | 1344000 |
| 0.0 | 4.0291 | 1664 | 0.2502 | 1432880 |
| 0.0001 | 4.2809 | 1768 | 0.2504 | 1522544 |
| 0.0 | 4.5327 | 1872 | 0.2489 | 1611760 |
| 0.0 | 4.7845 | 1976 | 0.2508 | 1702832 |
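Because `train.yaml` sets `load_best_model_at_end: true`, the reported evaluation loss of 0.1084 is the minimum of the validation column (step 416, just after the first epoch), not the last row. `eval_results.json` records the same value. After that point the validation loss climbs while the training loss falls to about 0, which suggests overfitting in the later epochs. The sketch below picks the best row from the table and checks that the Epoch column equals step / 413, where 413 is 2065 total steps divided by 5 epochs.

```python
# (step, epoch, validation loss) rows copied from the "Training results" table.
ROWS = [
    (104, 0.2518, 0.1485), (208, 0.5036, 0.1320), (312, 0.7554, 0.1130),
    (416, 1.0073, 0.1084), (520, 1.2591, 0.1903), (624, 1.5109, 0.1982),
    (728, 1.7627, 0.1449), (832, 2.0145, 0.2068), (936, 2.2663, 0.2691),
    (1040, 2.5182, 0.2895), (1144, 2.7700, 0.2260), (1248, 3.0218, 0.2253),
    (1352, 3.2736, 0.2578), (1456, 3.5254, 0.2580), (1560, 3.7772, 0.2703),
    (1664, 4.0291, 0.2502), (1768, 4.2809, 0.2504), (1872, 4.5327, 0.2489),
    (1976, 4.7845, 0.2508),
]
STEPS_PER_EPOCH = 2065 / 5  # total_steps / num_epochs = 413

# The Epoch column should be step / steps-per-epoch, rounded to 4 decimals.
epochs_consistent = all(
    round(step / STEPS_PER_EPOCH, 4) == epoch for step, epoch, _ in ROWS
)

# The checkpoint kept by load_best_model_at_end has the lowest validation loss.
best_step, best_epoch, best_loss = min(ROWS, key=lambda row: row[2])
print(epochs_consistent, best_step, best_epoch, best_loss)
```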
### Framework versions
- Transformers 4.51.3
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4

13
all_results.json Normal file

@@ -0,0 +1,13 @@
{
"epoch": 5.0,
"eval_loss": 0.10842076689004898,
"eval_runtime": 0.6289,
"eval_samples_per_second": 583.581,
"eval_steps_per_second": 73.146,
"num_input_tokens_seen": 1780000,
"total_flos": 1.039320047616e+16,
"train_loss": 0.06261659951716346,
"train_runtime": 1141.6604,
"train_samples_per_second": 14.457,
"train_steps_per_second": 1.809
}
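transformers reports throughput as samples divided by runtime, so multiplying the two back out approximately recovers the dataset sizes. Under that assumption, the metrics above imply 16,505 training samples over 5 epochs (3,301 per epoch), 367 evaluation samples, and about 8 samples per optimizer step, which matches `train_batch_size: 8`. Also, 3,301 + 367 = 3,668, which is consistent with a 10% hold-out (`val_size: 0.1`) from the 3,668-example GLUE MRPC training split.

```python
# Throughput fields copied from all_results.json above.
METRICS = {
    "epoch": 5.0,
    "eval_runtime": 0.6289,
    "eval_samples_per_second": 583.581,
    "train_runtime": 1141.6604,
    "train_samples_per_second": 14.457,
    "train_steps_per_second": 1.809,
}

# rate = count / runtime, so runtime * rate gives the count back (approximately,
# because both numbers are rounded before they are written to the file).
train_samples_total = round(METRICS["train_runtime"] * METRICS["train_samples_per_second"])
train_examples_per_epoch = train_samples_total / METRICS["epoch"]
eval_examples = round(METRICS["eval_runtime"] * METRICS["eval_samples_per_second"])
total_steps = round(METRICS["train_runtime"] * METRICS["train_steps_per_second"])
samples_per_step = METRICS["train_samples_per_second"] / METRICS["train_steps_per_second"]

print(train_samples_total, train_examples_per_epoch, eval_examples,
      total_steps, round(samples_per_step, 2))
```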

39
config.json Normal file

@@ -0,0 +1,39 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": [
128001,
128008,
128009
],
"head_dim": 64,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 8192,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 16,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 32.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": true,
"torch_dtype": "float32",
"transformers_version": "4.51.3",
"use_cache": false,
"vocab_size": 128256
}
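The parameter count can be recomputed from these fields. The sketch assumes the standard Llama decoder layout: q/k/v/o attention projections, gate/up/down MLP projections, and two RMSNorm weights per layer, with no biases because `attention_bias` and `mlp_bias` are false. It also assumes a single shared embedding matrix because `tie_word_embeddings` is true. The result is 1,235,814,400 parameters. At 4 bytes each (`torch_dtype: float32`) that is 4,943,257,600 bytes, which is 16,728 bytes short of the `model.safetensors` pointer size. The remainder is presumably the safetensors JSON header. `llama_param_count` is a hypothetical helper name.

```python
# Architecture fields copied from config.json above.
CONFIG = {
    "vocab_size": 128256, "hidden_size": 2048, "intermediate_size": 8192,
    "num_hidden_layers": 16, "num_attention_heads": 32,
    "num_key_value_heads": 8, "head_dim": 64, "tie_word_embeddings": True,
}
SAFETENSORS_SIZE = 4943274328  # "size" in the model.safetensors LFS pointer


def llama_param_count(c: dict) -> int:
    """Count parameters of a Llama-style decoder without bias terms."""
    h, d = c["hidden_size"], c["head_dim"]
    embed = c["vocab_size"] * h
    # q_proj/o_proj span all query heads; k_proj/v_proj only the KV heads (GQA).
    attn = 2 * h * c["num_attention_heads"] * d + 2 * h * c["num_key_value_heads"] * d
    # gate_proj, up_proj and down_proj each hold hidden_size * intermediate_size weights.
    mlp = 3 * h * c["intermediate_size"]
    norms = 2 * h  # input_layernorm and post_attention_layernorm
    layers = c["num_hidden_layers"] * (attn + mlp + norms)
    lm_head = 0 if c["tie_word_embeddings"] else embed  # tied: reuses the embedding
    return embed + layers + h + lm_head  # "+ h" is the final RMSNorm


n_params = llama_param_count(CONFIG)
fp32_bytes = 4 * n_params
print(n_params, fp32_bytes, SAFETENSORS_SIZE - fp32_bytes)
```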

8
eval_results.json Normal file

@@ -0,0 +1,8 @@
{
"epoch": 5.0,
"eval_loss": 0.10842076689004898,
"eval_runtime": 0.6289,
"eval_samples_per_second": 583.581,
"eval_steps_per_second": 73.146,
"num_input_tokens_seen": 1780000
}

12
generation_config.json Normal file

@@ -0,0 +1,12 @@
{
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128008,
128009
],
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.51.3"
}
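These values become the defaults for `generate()`. Sampling is on, the logits are divided by a temperature of 0.6 (which sharpens the distribution), and nucleus (top-p) filtering keeps only the most likely tokens whose cumulative probability reaches 0.9. The plain-Python sketch below illustrates that idea for a single decoding step. `nucleus_sample` is a hypothetical helper, and transformers' own `TemperatureLogitsWarper` and `TopPLogitsWarper` may treat ties and boundary cases differently.

```python
import math
import random


def nucleus_sample(logits, temperature=0.6, top_p=0.9, rng=None):
    """Apply temperature, keep the top-p nucleus, renormalise, and sample.

    Returns (sampled_index, kept), where kept maps token index to its
    renormalised probability inside the nucleus.
    """
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept_ids, mass = [], 0.0
    for i in order:
        kept_ids.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    norm = sum(probs[i] for i in kept_ids)
    kept = {i: probs[i] / norm for i in kept_ids}
    idx = rng.choices(list(kept), weights=list(kept.values()), k=1)[0]
    return idx, kept


idx, kept = nucleus_sample([2.0, 1.0, 0.1, -1.0], rng=random.Random(0))
print(idx, kept)
```

With these example logits, the two most likely tokens already cover about 96% of the probability mass after temperature scaling, so the other two can never be sampled.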

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9b7af7ab14e0309187e05387160a099eb1b33ea1c3a9f9af496fbb6393ec06a7
size 4943274328
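The weights themselves live in LFS storage. The committed file is only this pointer, with one `key value` pair per line as described by the Git LFS pointer specification. A minimal parser shows the fields: the sha256 object ID and a size of 4,943,274,328 bytes (about 4.9 GB). `parse_lfs_pointer` is a hypothetical helper name.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9b7af7ab14e0309187e05387160a099eb1b33ea1c3a9f9af496fbb6393ec06a7
size 4943274328
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["oid_algo"], info["size"])
```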

26
special_tokens_map.json Normal file

@@ -0,0 +1,26 @@
{
"additional_special_tokens": [
{
"content": "<|eom_id|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
],
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|eot_id|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": "<|eot_id|>"
}
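`<|begin_of_text|>` opens every sequence, and `<|eot_id|>` closes each turn. `<|eot_id|>` also serves as the padding token, so padding cannot be told apart from a real end of turn by token ID alone; the attention mask has to make that distinction. The sketch below renders a chat in the Llama 3 header layout that `template: llama3` in `train.yaml` presumably follows. The header tokens `<|start_header_id|>` and `<|end_header_id|>` are an assumption taken from the Llama 3 prompt format, because the `tokenizer_config.json` diff is suppressed. LLaMA-Factory's template may also add details such as a default system prompt. `llama3_prompt` is a hypothetical helper name.

```python
BOS = "<|begin_of_text|>"  # bos_token in special_tokens_map.json
EOT = "<|eot_id|>"         # eos_token and pad_token in special_tokens_map.json


def llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the (assumed) Llama 3 header/eot layout."""
    parts = [BOS]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n{msg['content']}{EOT}"
        )
    if add_generation_prompt:
        # Open an assistant turn; the model is expected to end it with EOT.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


print(llama3_prompt([{"role": "user", "content": "Hi"}]))
```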

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
size 17209920

2069
tokenizer_config.json Normal file

File diff suppressed because it is too large

55
train.yaml Normal file

@@ -0,0 +1,55 @@
seed: 42
### model
model_name_or_path: meta-llama/Llama-3.2-1B-Instruct
trust_remote_code: true
flash_attn: auto
use_cache: false
### method
stage: sft
do_train: true
finetuning_type: full
### dataset
dataset: mrpc
template: llama3
cutoff_len: 2048
overwrite_cache: true
preprocessing_num_workers: 4
dataloader_num_workers: 4
packing: false
### output
output_dir: saves_bts_preliminary/base/llama-3.2-1b-instruct/train_mrpc_42_1776331557
logging_steps: 5
save_steps: 0.05
overwrite_output_dir: true
save_only_model: false
plot_loss: true
include_num_input_tokens_seen: true
push_to_hub: true
push_to_hub_organization: rbelanec
load_best_model_at_end: true
save_total_limit: 1
### train
per_device_train_batch_size: 8
learning_rate: 5.0e-6
num_train_epochs: 5
weight_decay: 1.0e-5
lr_scheduler_type: cosine
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
warmup_ratio: 0.1
optim: adamw_torch
report_to:
- wandb
run_name: base_llama-3.2-1b-instruct_train_mrpc_42_1776331557
### eval
per_device_eval_batch_size: 8
eval_strategy: steps
eval_steps: 0.05
val_size: 0.1
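The step counts in the log follow from this file. The sketch assumes a single device, no gradient accumulation, and 3,301 training examples (inferred from `train_results.json`, not stated here). Under those assumptions each epoch has ceil(3301 / 8) = 413 steps, giving 2,065 steps in total. transformers treats a fractional `eval_steps`/`save_steps` below 1 as a ratio of the total, so 0.05 becomes ceil(2065 × 0.05) = 104. That yields the 19 evaluation points (104 through 1976) in the README table.

```python
import math

NUM_TRAIN_EXAMPLES = 3301  # assumption: inferred from train_results.json
PER_DEVICE_BATCH = 8       # per_device_train_batch_size (single device assumed)
NUM_EPOCHS = 5             # num_train_epochs
EVAL_STEPS = 0.05          # eval_steps (save_steps uses the same value)

# Without gradient accumulation, one optimizer step per batch; the last
# partial batch still counts as a step.
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / PER_DEVICE_BATCH)
max_steps = steps_per_epoch * NUM_EPOCHS
# A value below 1 is read as a fraction of max_steps, rounded up.
eval_interval = math.ceil(max_steps * EVAL_STEPS)
eval_points = list(range(eval_interval, max_steps + 1, eval_interval))

print(steps_per_epoch, max_steps, eval_interval, len(eval_points), eval_points[-1])
```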

9
train_results.json Normal file

@@ -0,0 +1,9 @@
{
"epoch": 5.0,
"num_input_tokens_seen": 1780000,
"total_flos": 1.039320047616e+16,
"train_loss": 0.06261659951716346,
"train_runtime": 1141.6604,
"train_samples_per_second": 14.457,
"train_steps_per_second": 1.809
}

433
trainer_log.jsonl Normal file

@@ -0,0 +1,433 @@
{"current_steps": 5, "total_steps": 2065, "loss": 0.81, "lr": 9.661835748792271e-08, "epoch": 0.012106537530266344, "percentage": 0.24, "elapsed_time": "0:00:00", "remaining_time": "0:04:33", "throughput": 6549.56, "total_tokens": 4352}
{"current_steps": 10, "total_steps": 2065, "loss": 0.8, "lr": 2.173913043478261e-07, "epoch": 0.024213075060532687, "percentage": 0.48, "elapsed_time": "0:00:01", "remaining_time": "0:03:35", "throughput": 8373.14, "total_tokens": 8768}
{"current_steps": 15, "total_steps": 2065, "loss": 0.6335, "lr": 3.3816425120772945e-07, "epoch": 0.03631961259079903, "percentage": 0.73, "elapsed_time": "0:00:01", "remaining_time": "0:03:13", "throughput": 9173.69, "total_tokens": 12992}
{"current_steps": 20, "total_steps": 2065, "loss": 0.3717, "lr": 4.5893719806763294e-07, "epoch": 0.048426150121065374, "percentage": 0.97, "elapsed_time": "0:00:01", "remaining_time": "0:03:02", "throughput": 9704.71, "total_tokens": 17344}
{"current_steps": 25, "total_steps": 2065, "loss": 0.2438, "lr": 5.797101449275363e-07, "epoch": 0.06053268765133172, "percentage": 1.21, "elapsed_time": "0:00:02", "remaining_time": "0:02:56", "throughput": 10054.34, "total_tokens": 21696}
{"current_steps": 30, "total_steps": 2065, "loss": 0.2523, "lr": 7.004830917874397e-07, "epoch": 0.07263922518159806, "percentage": 1.45, "elapsed_time": "0:00:02", "remaining_time": "0:02:51", "throughput": 10328.87, "total_tokens": 26112}
{"current_steps": 35, "total_steps": 2065, "loss": 0.2223, "lr": 8.212560386473431e-07, "epoch": 0.0847457627118644, "percentage": 1.69, "elapsed_time": "0:00:02", "remaining_time": "0:02:47", "throughput": 10436.42, "total_tokens": 30208}
{"current_steps": 40, "total_steps": 2065, "loss": 0.2184, "lr": 9.420289855072465e-07, "epoch": 0.09685230024213075, "percentage": 1.94, "elapsed_time": "0:00:03", "remaining_time": "0:02:45", "throughput": 10618.68, "total_tokens": 34688}
{"current_steps": 45, "total_steps": 2065, "loss": 0.2163, "lr": 1.0628019323671499e-06, "epoch": 0.1089588377723971, "percentage": 2.18, "elapsed_time": "0:00:03", "remaining_time": "0:02:42", "throughput": 10689.52, "total_tokens": 38784}
{"current_steps": 50, "total_steps": 2065, "loss": 0.2198, "lr": 1.1835748792270531e-06, "epoch": 0.12106537530266344, "percentage": 2.42, "elapsed_time": "0:00:04", "remaining_time": "0:02:41", "throughput": 10794.69, "total_tokens": 43200}
{"current_steps": 55, "total_steps": 2065, "loss": 0.2224, "lr": 1.3043478260869566e-06, "epoch": 0.13317191283292978, "percentage": 2.66, "elapsed_time": "0:00:04", "remaining_time": "0:02:39", "throughput": 10823.52, "total_tokens": 47296}
{"current_steps": 60, "total_steps": 2065, "loss": 0.2272, "lr": 1.42512077294686e-06, "epoch": 0.14527845036319612, "percentage": 2.91, "elapsed_time": "0:00:04", "remaining_time": "0:02:38", "throughput": 10914.4, "total_tokens": 51712}
{"current_steps": 65, "total_steps": 2065, "loss": 0.1665, "lr": 1.5458937198067634e-06, "epoch": 0.15738498789346247, "percentage": 3.15, "elapsed_time": "0:00:05", "remaining_time": "0:02:37", "throughput": 10944.07, "total_tokens": 55872}
{"current_steps": 70, "total_steps": 2065, "loss": 0.1835, "lr": 1.6666666666666667e-06, "epoch": 0.1694915254237288, "percentage": 3.39, "elapsed_time": "0:00:05", "remaining_time": "0:02:35", "throughput": 10943.0, "total_tokens": 59840}
{"current_steps": 75, "total_steps": 2065, "loss": 0.1898, "lr": 1.7874396135265702e-06, "epoch": 0.18159806295399517, "percentage": 3.63, "elapsed_time": "0:00:05", "remaining_time": "0:02:34", "throughput": 10968.19, "total_tokens": 64000}
{"current_steps": 80, "total_steps": 2065, "loss": 0.2149, "lr": 1.9082125603864736e-06, "epoch": 0.1937046004842615, "percentage": 3.87, "elapsed_time": "0:00:06", "remaining_time": "0:02:33", "throughput": 11018.77, "total_tokens": 68352}
{"current_steps": 85, "total_steps": 2065, "loss": 0.1519, "lr": 2.028985507246377e-06, "epoch": 0.20581113801452786, "percentage": 4.12, "elapsed_time": "0:00:06", "remaining_time": "0:02:33", "throughput": 11068.81, "total_tokens": 72768}
{"current_steps": 90, "total_steps": 2065, "loss": 0.1468, "lr": 2.1497584541062806e-06, "epoch": 0.2179176755447942, "percentage": 4.36, "elapsed_time": "0:00:06", "remaining_time": "0:02:32", "throughput": 11103.7, "total_tokens": 77120}
{"current_steps": 95, "total_steps": 2065, "loss": 0.2277, "lr": 2.270531400966184e-06, "epoch": 0.23002421307506055, "percentage": 4.6, "elapsed_time": "0:00:07", "remaining_time": "0:02:32", "throughput": 11131.81, "total_tokens": 81664}
{"current_steps": 100, "total_steps": 2065, "loss": 0.1552, "lr": 2.391304347826087e-06, "epoch": 0.24213075060532688, "percentage": 4.84, "elapsed_time": "0:00:07", "remaining_time": "0:02:31", "throughput": 11169.26, "total_tokens": 86080}
{"current_steps": 104, "total_steps": 2065, "eval_loss": 0.1484687179327011, "epoch": 0.25181598062953997, "percentage": 5.04, "elapsed_time": "0:00:08", "remaining_time": "0:02:41", "throughput": 10430.75, "total_tokens": 89600}
{"current_steps": 105, "total_steps": 2065, "loss": 0.1673, "lr": 2.5120772946859904e-06, "epoch": 0.2542372881355932, "percentage": 5.08, "elapsed_time": "0:01:19", "remaining_time": "0:24:48", "throughput": 1133.88, "total_tokens": 90432}
{"current_steps": 110, "total_steps": 2065, "loss": 0.1694, "lr": 2.632850241545894e-06, "epoch": 0.26634382566585957, "percentage": 5.33, "elapsed_time": "0:01:20", "remaining_time": "0:23:43", "throughput": 1179.82, "total_tokens": 94528}
{"current_steps": 115, "total_steps": 2065, "loss": 0.1627, "lr": 2.7536231884057974e-06, "epoch": 0.2784503631961259, "percentage": 5.57, "elapsed_time": "0:01:20", "remaining_time": "0:22:44", "throughput": 1227.67, "total_tokens": 98816}
{"current_steps": 120, "total_steps": 2065, "loss": 0.2205, "lr": 2.8743961352657007e-06, "epoch": 0.29055690072639223, "percentage": 5.81, "elapsed_time": "0:01:20", "remaining_time": "0:21:50", "throughput": 1275.15, "total_tokens": 103104}
{"current_steps": 125, "total_steps": 2065, "loss": 0.1841, "lr": 2.995169082125604e-06, "epoch": 0.3026634382566586, "percentage": 6.05, "elapsed_time": "0:01:21", "remaining_time": "0:21:00", "throughput": 1321.41, "total_tokens": 107328}
{"current_steps": 130, "total_steps": 2065, "loss": 0.1779, "lr": 3.1159420289855073e-06, "epoch": 0.31476997578692495, "percentage": 6.3, "elapsed_time": "0:01:21", "remaining_time": "0:20:14", "throughput": 1366.46, "total_tokens": 111488}
{"current_steps": 135, "total_steps": 2065, "loss": 0.158, "lr": 3.236714975845411e-06, "epoch": 0.3268765133171913, "percentage": 6.54, "elapsed_time": "0:01:21", "remaining_time": "0:19:31", "throughput": 1414.94, "total_tokens": 115968}
{"current_steps": 140, "total_steps": 2065, "loss": 0.2089, "lr": 3.3574879227053142e-06, "epoch": 0.3389830508474576, "percentage": 6.78, "elapsed_time": "0:01:22", "remaining_time": "0:18:51", "throughput": 1459.93, "total_tokens": 120192}
{"current_steps": 145, "total_steps": 2065, "loss": 0.1007, "lr": 3.4782608695652175e-06, "epoch": 0.35108958837772397, "percentage": 7.02, "elapsed_time": "0:01:22", "remaining_time": "0:18:14", "throughput": 1504.53, "total_tokens": 124416}
{"current_steps": 150, "total_steps": 2065, "loss": 0.2718, "lr": 3.5990338164251208e-06, "epoch": 0.36319612590799033, "percentage": 7.26, "elapsed_time": "0:01:23", "remaining_time": "0:17:40", "throughput": 1550.99, "total_tokens": 128832}
{"current_steps": 155, "total_steps": 2065, "loss": 0.3704, "lr": 3.7198067632850245e-06, "epoch": 0.37530266343825663, "percentage": 7.51, "elapsed_time": "0:01:23", "remaining_time": "0:17:08", "throughput": 1594.12, "total_tokens": 132992}
{"current_steps": 160, "total_steps": 2065, "loss": 0.1945, "lr": 3.840579710144928e-06, "epoch": 0.387409200968523, "percentage": 7.75, "elapsed_time": "0:01:23", "remaining_time": "0:16:37", "throughput": 1638.27, "total_tokens": 137280}
{"current_steps": 165, "total_steps": 2065, "loss": 0.253, "lr": 3.961352657004831e-06, "epoch": 0.39951573849878935, "percentage": 7.99, "elapsed_time": "0:01:24", "remaining_time": "0:16:09", "throughput": 1682.08, "total_tokens": 141568}
{"current_steps": 170, "total_steps": 2065, "loss": 0.1622, "lr": 4.082125603864734e-06, "epoch": 0.4116222760290557, "percentage": 8.23, "elapsed_time": "0:01:24", "remaining_time": "0:15:42", "throughput": 1727.0, "total_tokens": 145984}
{"current_steps": 175, "total_steps": 2065, "loss": 0.1974, "lr": 4.202898550724638e-06, "epoch": 0.423728813559322, "percentage": 8.47, "elapsed_time": "0:01:24", "remaining_time": "0:15:16", "throughput": 1768.6, "total_tokens": 150144}
{"current_steps": 180, "total_steps": 2065, "loss": 0.2191, "lr": 4.323671497584541e-06, "epoch": 0.4358353510895884, "percentage": 8.72, "elapsed_time": "0:01:25", "remaining_time": "0:14:52", "throughput": 1813.49, "total_tokens": 154624}
{"current_steps": 185, "total_steps": 2065, "loss": 0.2192, "lr": 4.444444444444444e-06, "epoch": 0.44794188861985473, "percentage": 8.96, "elapsed_time": "0:01:25", "remaining_time": "0:14:30", "throughput": 1854.32, "total_tokens": 158784}
{"current_steps": 190, "total_steps": 2065, "loss": 0.1887, "lr": 4.565217391304348e-06, "epoch": 0.4600484261501211, "percentage": 9.2, "elapsed_time": "0:01:25", "remaining_time": "0:14:08", "throughput": 1896.22, "total_tokens": 163072}
{"current_steps": 195, "total_steps": 2065, "loss": 0.1951, "lr": 4.6859903381642516e-06, "epoch": 0.4721549636803874, "percentage": 9.44, "elapsed_time": "0:01:26", "remaining_time": "0:13:48", "throughput": 1934.98, "total_tokens": 167104}
{"current_steps": 200, "total_steps": 2065, "loss": 0.1486, "lr": 4.806763285024155e-06, "epoch": 0.48426150121065376, "percentage": 9.69, "elapsed_time": "0:01:26", "remaining_time": "0:13:28", "throughput": 1976.89, "total_tokens": 171456}
{"current_steps": 205, "total_steps": 2065, "loss": 0.2178, "lr": 4.927536231884059e-06, "epoch": 0.4963680387409201, "percentage": 9.93, "elapsed_time": "0:01:27", "remaining_time": "0:13:10", "throughput": 2018.51, "total_tokens": 175808}
{"current_steps": 208, "total_steps": 2065, "eval_loss": 0.1319892704486847, "epoch": 0.5036319612590799, "percentage": 10.07, "elapsed_time": "0:01:27", "remaining_time": "0:13:04", "throughput": 2032.59, "total_tokens": 178688}
{"current_steps": 210, "total_steps": 2065, "loss": 0.1052, "lr": 4.999985705205496e-06, "epoch": 0.5084745762711864, "percentage": 10.17, "elapsed_time": "0:02:03", "remaining_time": "0:18:13", "throughput": 1456.18, "total_tokens": 180224}
{"current_steps": 215, "total_steps": 2065, "loss": 0.1655, "lr": 4.999824890644693e-06, "epoch": 0.5205811138014528, "percentage": 10.41, "elapsed_time": "0:02:04", "remaining_time": "0:17:48", "throughput": 1487.91, "total_tokens": 184704}
{"current_steps": 220, "total_steps": 2065, "loss": 0.3684, "lr": 4.999485404562269e-06, "epoch": 0.5326876513317191, "percentage": 10.65, "elapsed_time": "0:02:04", "remaining_time": "0:17:24", "throughput": 1519.45, "total_tokens": 189184}
{"current_steps": 225, "total_steps": 2065, "loss": 0.1527, "lr": 4.998967271222521e-06, "epoch": 0.5447941888619855, "percentage": 10.9, "elapsed_time": "0:02:04", "remaining_time": "0:17:01", "throughput": 1549.8, "total_tokens": 193536}
{"current_steps": 230, "total_steps": 2065, "loss": 0.1238, "lr": 4.998270527658311e-06, "epoch": 0.5569007263922519, "percentage": 11.14, "elapsed_time": "0:02:05", "remaining_time": "0:16:39", "throughput": 1579.99, "total_tokens": 197888}
{"current_steps": 235, "total_steps": 2065, "loss": 0.2147, "lr": 4.997395223668422e-06, "epoch": 0.5690072639225182, "percentage": 11.38, "elapsed_time": "0:02:05", "remaining_time": "0:16:18", "throughput": 1608.93, "total_tokens": 202112}
{"current_steps": 240, "total_steps": 2065, "loss": 0.1162, "lr": 4.996341421813993e-06, "epoch": 0.5811138014527845, "percentage": 11.62, "elapsed_time": "0:02:05", "remaining_time": "0:15:58", "throughput": 1639.23, "total_tokens": 206528}
{"current_steps": 245, "total_steps": 2065, "loss": 0.1311, "lr": 4.995109197414051e-06, "epoch": 0.5932203389830508, "percentage": 11.86, "elapsed_time": "0:02:06", "remaining_time": "0:15:38", "throughput": 1669.38, "total_tokens": 210944}
{"current_steps": 250, "total_steps": 2065, "loss": 0.1437, "lr": 4.9936986385401305e-06, "epoch": 0.6053268765133172, "percentage": 12.11, "elapsed_time": "0:02:06", "remaining_time": "0:15:20", "throughput": 1697.32, "total_tokens": 215104}
{"current_steps": 255, "total_steps": 2065, "loss": 0.1597, "lr": 4.992109846009972e-06, "epoch": 0.6174334140435835, "percentage": 12.35, "elapsed_time": "0:02:07", "remaining_time": "0:15:02", "throughput": 1725.62, "total_tokens": 219328}
{"current_steps": 260, "total_steps": 2065, "loss": 0.1878, "lr": 4.990342933380321e-06, "epoch": 0.6295399515738499, "percentage": 12.59, "elapsed_time": "0:02:07", "remaining_time": "0:14:44", "throughput": 1754.71, "total_tokens": 223680}
{"current_steps": 265, "total_steps": 2065, "loss": 0.1445, "lr": 4.988398026938811e-06, "epoch": 0.6416464891041163, "percentage": 12.83, "elapsed_time": "0:02:07", "remaining_time": "0:14:28", "throughput": 1782.69, "total_tokens": 227904}
{"current_steps": 270, "total_steps": 2065, "loss": 0.0992, "lr": 4.986275265694935e-06, "epoch": 0.6537530266343826, "percentage": 13.08, "elapsed_time": "0:02:08", "remaining_time": "0:14:12", "throughput": 1808.96, "total_tokens": 231936}
{"current_steps": 275, "total_steps": 2065, "loss": 0.0608, "lr": 4.983974801370115e-06, "epoch": 0.6658595641646489, "percentage": 13.32, "elapsed_time": "0:02:08", "remaining_time": "0:13:57", "throughput": 1836.54, "total_tokens": 236160}
{"current_steps": 280, "total_steps": 2065, "loss": 0.2262, "lr": 4.981496798386849e-06, "epoch": 0.6779661016949152, "percentage": 13.56, "elapsed_time": "0:02:08", "remaining_time": "0:13:42", "throughput": 1863.56, "total_tokens": 240320}
{"current_steps": 285, "total_steps": 2065, "loss": 0.1165, "lr": 4.9788414338569715e-06, "epoch": 0.6900726392251816, "percentage": 13.8, "elapsed_time": "0:02:09", "remaining_time": "0:13:27", "throughput": 1892.7, "total_tokens": 244800}
{"current_steps": 290, "total_steps": 2065, "loss": 0.2377, "lr": 4.9760088975689815e-06, "epoch": 0.7021791767554479, "percentage": 14.04, "elapsed_time": "0:02:09", "remaining_time": "0:13:13", "throughput": 1920.82, "total_tokens": 249152}
{"current_steps": 295, "total_steps": 2065, "loss": 0.1377, "lr": 4.972999391974488e-06, "epoch": 0.7142857142857143, "percentage": 14.29, "elapsed_time": "0:02:10", "remaining_time": "0:13:00", "throughput": 1947.89, "total_tokens": 253376}
{"current_steps": 300, "total_steps": 2065, "loss": 0.19, "lr": 4.969813132173735e-06, "epoch": 0.7263922518159807, "percentage": 14.53, "elapsed_time": "0:02:10", "remaining_time": "0:12:47", "throughput": 1975.26, "total_tokens": 257664}
{"current_steps": 305, "total_steps": 2065, "loss": 0.1146, "lr": 4.966450345900229e-06, "epoch": 0.738498789346247, "percentage": 14.77, "elapsed_time": "0:02:10", "remaining_time": "0:12:34", "throughput": 2002.92, "total_tokens": 262016}
{"current_steps": 310, "total_steps": 2065, "loss": 0.1165, "lr": 4.962911273504461e-06, "epoch": 0.7506053268765133, "percentage": 15.01, "elapsed_time": "0:02:11", "remaining_time": "0:12:22", "throughput": 2030.98, "total_tokens": 266432}
{"current_steps": 312, "total_steps": 2065, "eval_loss": 0.11303775012493134, "epoch": 0.7554479418886199, "percentage": 15.11, "elapsed_time": "0:02:13", "remaining_time": "0:12:27", "throughput": 2014.4, "total_tokens": 267968}
{"current_steps": 315, "total_steps": 2065, "loss": 0.181, "lr": 4.959196167936729e-06, "epoch": 0.7627118644067796, "percentage": 15.25, "elapsed_time": "0:03:02", "remaining_time": "0:16:54", "throughput": 1480.79, "total_tokens": 270464}
{"current_steps": 320, "total_steps": 2065, "loss": 0.0946, "lr": 4.955305294729056e-06, "epoch": 0.774818401937046, "percentage": 15.5, "elapsed_time": "0:03:03", "remaining_time": "0:16:38", "throughput": 1500.88, "total_tokens": 274688}
{"current_steps": 325, "total_steps": 2065, "loss": 0.1293, "lr": 4.9512389319762165e-06, "epoch": 0.7869249394673123, "percentage": 15.74, "elapsed_time": "0:03:03", "remaining_time": "0:16:21", "throughput": 1520.53, "total_tokens": 278848}
{"current_steps": 330, "total_steps": 2065, "loss": 0.124, "lr": 4.946997370315857e-06, "epoch": 0.7990314769975787, "percentage": 15.98, "elapsed_time": "0:03:03", "remaining_time": "0:16:06", "throughput": 1540.78, "total_tokens": 283136}
{"current_steps": 335, "total_steps": 2065, "loss": 0.1767, "lr": 4.9425809129077204e-06, "epoch": 0.8111380145278451, "percentage": 16.22, "elapsed_time": "0:03:04", "remaining_time": "0:15:50", "throughput": 1562.31, "total_tokens": 287680}
{"current_steps": 340, "total_steps": 2065, "loss": 0.0811, "lr": 4.937989875411986e-06, "epoch": 0.8232445520581114, "percentage": 16.46, "elapsed_time": "0:03:04", "remaining_time": "0:15:36", "throughput": 1583.75, "total_tokens": 292224}
{"current_steps": 345, "total_steps": 2065, "loss": 0.1567, "lr": 4.933224585966696e-06, "epoch": 0.8353510895883777, "percentage": 16.71, "elapsed_time": "0:03:04", "remaining_time": "0:15:21", "throughput": 1603.4, "total_tokens": 296448}
{"current_steps": 350, "total_steps": 2065, "loss": 0.1363, "lr": 4.928285385164316e-06, "epoch": 0.847457627118644, "percentage": 16.95, "elapsed_time": "0:03:05", "remaining_time": "0:15:07", "throughput": 1623.29, "total_tokens": 300736}
{"current_steps": 355, "total_steps": 2065, "loss": 0.1348, "lr": 4.92317262602738e-06, "epoch": 0.8595641646489104, "percentage": 17.19, "elapsed_time": "0:03:05", "remaining_time": "0:14:54", "throughput": 1642.76, "total_tokens": 304960}
{"current_steps": 360, "total_steps": 2065, "loss": 0.1694, "lr": 4.917886673983267e-06, "epoch": 0.8716707021791767, "percentage": 17.43, "elapsed_time": "0:03:06", "remaining_time": "0:14:40", "throughput": 1662.16, "total_tokens": 309184}
{"current_steps": 365, "total_steps": 2065, "loss": 0.1352, "lr": 4.912427906838079e-06, "epoch": 0.8837772397094431, "percentage": 17.68, "elapsed_time": "0:03:06", "remaining_time": "0:14:28", "throughput": 1681.51, "total_tokens": 313408}
{"current_steps": 370, "total_steps": 2065, "loss": 0.0933, "lr": 4.906796714749635e-06, "epoch": 0.8958837772397095, "percentage": 17.92, "elapsed_time": "0:03:06", "remaining_time": "0:14:15", "throughput": 1702.11, "total_tokens": 317888}
{"current_steps": 375, "total_steps": 2065, "loss": 0.1488, "lr": 4.900993500199591e-06, "epoch": 0.9079903147699758, "percentage": 18.16, "elapsed_time": "0:03:07", "remaining_time": "0:14:03", "throughput": 1720.96, "total_tokens": 322048}
{"current_steps": 380, "total_steps": 2065, "loss": 0.087, "lr": 4.895018677964669e-06, "epoch": 0.9200968523002422, "percentage": 18.4, "elapsed_time": "0:03:07", "remaining_time": "0:13:51", "throughput": 1741.7, "total_tokens": 326592}
{"current_steps": 385, "total_steps": 2065, "loss": 0.1017, "lr": 4.888872675087012e-06, "epoch": 0.9322033898305084, "percentage": 18.64, "elapsed_time": "0:03:07", "remaining_time": "0:13:39", "throughput": 1761.07, "total_tokens": 330880}
{"current_steps": 390, "total_steps": 2065, "loss": 0.1105, "lr": 4.882555930843664e-06, "epoch": 0.9443099273607748, "percentage": 18.89, "elapsed_time": "0:03:08", "remaining_time": "0:13:28", "throughput": 1780.03, "total_tokens": 335104}
{"current_steps": 395, "total_steps": 2065, "loss": 0.1437, "lr": 4.876068896715171e-06, "epoch": 0.9564164648910412, "percentage": 19.13, "elapsed_time": "0:03:08", "remaining_time": "0:13:17", "throughput": 1799.27, "total_tokens": 339392}
{"current_steps": 400, "total_steps": 2065, "loss": 0.146, "lr": 4.8694120363533105e-06, "epoch": 0.9685230024213075, "percentage": 19.37, "elapsed_time": "0:03:09", "remaining_time": "0:13:06", "throughput": 1818.74, "total_tokens": 343744}
{"current_steps": 405, "total_steps": 2065, "loss": 0.0985, "lr": 4.862585825547957e-06, "epoch": 0.9806295399515739, "percentage": 19.61, "elapsed_time": "0:03:09", "remaining_time": "0:12:56", "throughput": 1838.45, "total_tokens": 348160}
{"current_steps": 410, "total_steps": 2065, "loss": 0.116, "lr": 4.855590752193075e-06, "epoch": 0.9927360774818402, "percentage": 19.85, "elapsed_time": "0:03:09", "remaining_time": "0:12:45", "throughput": 1857.49, "total_tokens": 352448}
{"current_steps": 415, "total_steps": 2065, "loss": 0.1193, "lr": 4.848427316251843e-06, "epoch": 1.0048426150121066, "percentage": 20.1, "elapsed_time": "0:03:10", "remaining_time": "0:12:36", "throughput": 1875.08, "total_tokens": 356656}
{"current_steps": 416, "total_steps": 2065, "eval_loss": 0.10842076689004898, "epoch": 1.0072639225181599, "percentage": 20.15, "elapsed_time": "0:03:10", "remaining_time": "0:12:36", "throughput": 1872.78, "total_tokens": 357488}
{"current_steps": 420, "total_steps": 2065, "loss": 0.073, "lr": 4.841096029720921e-06, "epoch": 1.0169491525423728, "percentage": 20.34, "elapsed_time": "0:04:43", "remaining_time": "0:18:31", "throughput": 1271.96, "total_tokens": 360880}
{"current_steps": 425, "total_steps": 2065, "loss": 0.0535, "lr": 4.833597416593861e-06, "epoch": 1.0290556900726393, "percentage": 20.58, "elapsed_time": "0:04:44", "remaining_time": "0:18:16", "throughput": 1285.18, "total_tokens": 365104}
{"current_steps": 430, "total_steps": 2065, "loss": 0.1458, "lr": 4.825932012823652e-06, "epoch": 1.0411622276029056, "percentage": 20.82, "elapsed_time": "0:04:44", "remaining_time": "0:18:01", "throughput": 1299.92, "total_tokens": 369776}
{"current_steps": 435, "total_steps": 2065, "loss": 0.1602, "lr": 4.818100366284408e-06, "epoch": 1.053268765133172, "percentage": 21.07, "elapsed_time": "0:04:44", "remaining_time": "0:17:47", "throughput": 1313.06, "total_tokens": 374000}
{"current_steps": 440, "total_steps": 2065, "loss": 0.2577, "lr": 4.81010303673222e-06, "epoch": 1.0653753026634383, "percentage": 21.31, "elapsed_time": "0:04:45", "remaining_time": "0:17:33", "throughput": 1325.75, "total_tokens": 378096}
{"current_steps": 445, "total_steps": 2065, "loss": 0.0566, "lr": 4.80194059576514e-06, "epoch": 1.0774818401937045, "percentage": 21.55, "elapsed_time": "0:04:45", "remaining_time": "0:17:19", "throughput": 1338.63, "total_tokens": 382256}
{"current_steps": 450, "total_steps": 2065, "loss": 0.1761, "lr": 4.793613626782331e-06, "epoch": 1.089588377723971, "percentage": 21.79, "elapsed_time": "0:04:45", "remaining_time": "0:17:06", "throughput": 1352.35, "total_tokens": 386672}
{"current_steps": 455, "total_steps": 2065, "loss": 0.0591, "lr": 4.785122724942367e-06, "epoch": 1.1016949152542372, "percentage": 22.03, "elapsed_time": "0:04:46", "remaining_time": "0:16:53", "throughput": 1365.58, "total_tokens": 390960}
{"current_steps": 460, "total_steps": 2065, "loss": 0.0952, "lr": 4.7764684971206974e-06, "epoch": 1.1138014527845037, "percentage": 22.28, "elapsed_time": "0:04:46", "remaining_time": "0:16:40", "throughput": 1379.44, "total_tokens": 395440}
{"current_steps": 465, "total_steps": 2065, "loss": 0.0664, "lr": 4.767651561866269e-06, "epoch": 1.12590799031477, "percentage": 22.52, "elapsed_time": "0:04:47", "remaining_time": "0:16:27", "throughput": 1392.16, "total_tokens": 399600}
{"current_steps": 470, "total_steps": 2065, "loss": 0.1001, "lr": 4.758672549357316e-06, "epoch": 1.1380145278450362, "percentage": 22.76, "elapsed_time": "0:04:47", "remaining_time": "0:16:15", "throughput": 1405.3, "total_tokens": 403888}
{"current_steps": 475, "total_steps": 2065, "loss": 0.2506, "lr": 4.7495321013563225e-06, "epoch": 1.1501210653753027, "percentage": 23.0, "elapsed_time": "0:04:47", "remaining_time": "0:16:03", "throughput": 1418.41, "total_tokens": 408176}
{"current_steps": 480, "total_steps": 2065, "loss": 0.044, "lr": 4.740230871164148e-06, "epoch": 1.162227602905569, "percentage": 23.24, "elapsed_time": "0:04:48", "remaining_time": "0:15:51", "throughput": 1430.62, "total_tokens": 412208}
{"current_steps": 485, "total_steps": 2065, "loss": 0.1472, "lr": 4.730769523573337e-06, "epoch": 1.1743341404358354, "percentage": 23.49, "elapsed_time": "0:04:48", "remaining_time": "0:15:39", "throughput": 1444.09, "total_tokens": 416624}
{"current_steps": 490, "total_steps": 2065, "loss": 0.1661, "lr": 4.721148734820605e-06, "epoch": 1.1864406779661016, "percentage": 23.73, "elapsed_time": "0:04:48", "remaining_time": "0:15:28", "throughput": 1457.53, "total_tokens": 421040}
{"current_steps": 495, "total_steps": 2065, "loss": 0.094, "lr": 4.711369192538503e-06, "epoch": 1.1985472154963681, "percentage": 23.97, "elapsed_time": "0:04:49", "remaining_time": "0:15:17", "throughput": 1469.85, "total_tokens": 425136}
{"current_steps": 500, "total_steps": 2065, "loss": 0.1282, "lr": 4.701431595706269e-06, "epoch": 1.2106537530266344, "percentage": 24.21, "elapsed_time": "0:04:49", "remaining_time": "0:15:06", "throughput": 1483.63, "total_tokens": 429680}
{"current_steps": 505, "total_steps": 2065, "loss": 0.0874, "lr": 4.691336654599873e-06, "epoch": 1.2227602905569008, "percentage": 24.46, "elapsed_time": "0:04:49", "remaining_time": "0:14:55", "throughput": 1497.39, "total_tokens": 434224}
{"current_steps": 510, "total_steps": 2065, "loss": 0.0403, "lr": 4.6810850907412486e-06, "epoch": 1.234866828087167, "percentage": 24.7, "elapsed_time": "0:04:50", "remaining_time": "0:14:45", "throughput": 1509.6, "total_tokens": 438320}
{"current_steps": 515, "total_steps": 2065, "loss": 0.0227, "lr": 4.6706776368467236e-06, "epoch": 1.2469733656174333, "percentage": 24.94, "elapsed_time": "0:04:50", "remaining_time": "0:14:34", "throughput": 1522.66, "total_tokens": 442672}
{"current_steps": 520, "total_steps": 2065, "loss": 0.0685, "lr": 4.6601150367746485e-06, "epoch": 1.2590799031476998, "percentage": 25.18, "elapsed_time": "0:04:51", "remaining_time": "0:14:24", "throughput": 1535.25, "total_tokens": 446896}
{"current_steps": 520, "total_steps": 2065, "eval_loss": 0.19028596580028534, "epoch": 1.2590799031476998, "percentage": 25.18, "elapsed_time": "0:04:51", "remaining_time": "0:14:27", "throughput": 1531.47, "total_tokens": 446896}
{"current_steps": 525, "total_steps": 2065, "loss": 0.1008, "lr": 4.649398045472235e-06, "epoch": 1.271186440677966, "percentage": 25.42, "elapsed_time": "0:05:25", "remaining_time": "0:15:55", "throughput": 1385.14, "total_tokens": 451312}
{"current_steps": 530, "total_steps": 2065, "loss": 0.3076, "lr": 4.638527428921592e-06, "epoch": 1.2832929782082325, "percentage": 25.67, "elapsed_time": "0:05:26", "remaining_time": "0:15:44", "throughput": 1396.15, "total_tokens": 455408}
{"current_steps": 535, "total_steps": 2065, "loss": 0.0462, "lr": 4.627503964084981e-06, "epoch": 1.2953995157384988, "percentage": 25.91, "elapsed_time": "0:05:26", "remaining_time": "0:15:33", "throughput": 1408.87, "total_tokens": 460080}
{"current_steps": 540, "total_steps": 2065, "loss": 0.0124, "lr": 4.616328438849284e-06, "epoch": 1.307506053268765, "percentage": 26.15, "elapsed_time": "0:05:26", "remaining_time": "0:15:23", "throughput": 1420.78, "total_tokens": 464496}
{"current_steps": 545, "total_steps": 2065, "loss": 0.1408, "lr": 4.605001651969686e-06, "epoch": 1.3196125907990315, "percentage": 26.39, "elapsed_time": "0:05:27", "remaining_time": "0:15:12", "throughput": 1432.09, "total_tokens": 468720}
{"current_steps": 550, "total_steps": 2065, "loss": 0.115, "lr": 4.5935244130125925e-06, "epoch": 1.331719128329298, "percentage": 26.63, "elapsed_time": "0:05:27", "remaining_time": "0:15:02", "throughput": 1444.33, "total_tokens": 473264}
{"current_steps": 555, "total_steps": 2065, "loss": 0.0061, "lr": 4.581897542297761e-06, "epoch": 1.3438256658595642, "percentage": 26.88, "elapsed_time": "0:05:28", "remaining_time": "0:14:52", "throughput": 1455.77, "total_tokens": 477552}
{"current_steps": 560, "total_steps": 2065, "loss": 0.0843, "lr": 4.570121870839671e-06, "epoch": 1.3559322033898304, "percentage": 27.12, "elapsed_time": "0:05:28", "remaining_time": "0:14:42", "throughput": 1467.77, "total_tokens": 482032}
{"current_steps": 565, "total_steps": 2065, "loss": 0.0764, "lr": 4.558198240288131e-06, "epoch": 1.368038740920097, "percentage": 27.36, "elapsed_time": "0:05:28", "remaining_time": "0:14:32", "throughput": 1479.35, "total_tokens": 486384}
{"current_steps": 570, "total_steps": 2065, "loss": 0.1836, "lr": 4.5461275028681186e-06, "epoch": 1.3801452784503632, "percentage": 27.6, "elapsed_time": "0:05:29", "remaining_time": "0:14:23", "throughput": 1490.72, "total_tokens": 490672}
{"current_steps": 575, "total_steps": 2065, "loss": 0.1097, "lr": 4.533910521318872e-06, "epoch": 1.3922518159806296, "percentage": 27.85, "elapsed_time": "0:05:29", "remaining_time": "0:14:13", "throughput": 1502.06, "total_tokens": 494960}
{"current_steps": 580, "total_steps": 2065, "loss": 0.1144, "lr": 4.521548168832227e-06, "epoch": 1.4043583535108959, "percentage": 28.09, "elapsed_time": "0:05:29", "remaining_time": "0:14:04", "throughput": 1512.99, "total_tokens": 499120}
{"current_steps": 585, "total_steps": 2065, "loss": 0.0169, "lr": 4.509041328990204e-06, "epoch": 1.4164648910411621, "percentage": 28.33, "elapsed_time": "0:05:30", "remaining_time": "0:13:55", "throughput": 1524.3, "total_tokens": 503408}
{"current_steps": 590, "total_steps": 2065, "loss": 0.0424, "lr": 4.496390895701858e-06, "epoch": 1.4285714285714286, "percentage": 28.57, "elapsed_time": "0:05:30", "remaining_time": "0:13:46", "throughput": 1534.45, "total_tokens": 507312}
{"current_steps": 595, "total_steps": 2065, "loss": 0.053, "lr": 4.483597773139387e-06, "epoch": 1.4406779661016949, "percentage": 28.81, "elapsed_time": "0:05:30", "remaining_time": "0:13:37", "throughput": 1545.71, "total_tokens": 511600}
{"current_steps": 600, "total_steps": 2065, "loss": 0.0615, "lr": 4.470662875673506e-06, "epoch": 1.4527845036319613, "percentage": 29.06, "elapsed_time": "0:05:31", "remaining_time": "0:13:29", "throughput": 1556.94, "total_tokens": 515888}
{"current_steps": 605, "total_steps": 2065, "loss": 0.2071, "lr": 4.4575871278080964e-06, "epoch": 1.4648910411622276, "percentage": 29.3, "elapsed_time": "0:05:31", "remaining_time": "0:13:20", "throughput": 1567.38, "total_tokens": 519920}
{"current_steps": 610, "total_steps": 2065, "loss": 0.0688, "lr": 4.444371464114126e-06, "epoch": 1.4769975786924938, "percentage": 29.54, "elapsed_time": "0:05:32", "remaining_time": "0:13:12", "throughput": 1578.93, "total_tokens": 524336}
{"current_steps": 615, "total_steps": 2065, "loss": 0.071, "lr": 4.431016829162851e-06, "epoch": 1.4891041162227603, "percentage": 29.78, "elapsed_time": "0:05:32", "remaining_time": "0:13:03", "throughput": 1589.7, "total_tokens": 528496}
{"current_steps": 620, "total_steps": 2065, "loss": 0.0801, "lr": 4.417524177458309e-06, "epoch": 1.5012106537530268, "percentage": 30.02, "elapsed_time": "0:05:32", "remaining_time": "0:12:55", "throughput": 1600.84, "total_tokens": 532784}
{"current_steps": 624, "total_steps": 2065, "eval_loss": 0.1981746405363083, "epoch": 1.5108958837772397, "percentage": 30.22, "elapsed_time": "0:05:33", "remaining_time": "0:12:50", "throughput": 1606.65, "total_tokens": 536176}
{"current_steps": 625, "total_steps": 2065, "loss": 0.0258, "lr": 4.403894473369092e-06, "epoch": 1.513317191283293, "percentage": 30.27, "elapsed_time": "0:06:02", "remaining_time": "0:13:54", "throughput": 1483.2, "total_tokens": 537136}
{"current_steps": 630, "total_steps": 2065, "loss": 0.199, "lr": 4.390128691059423e-06, "epoch": 1.5254237288135593, "percentage": 30.51, "elapsed_time": "0:06:02", "remaining_time": "0:13:45", "throughput": 1493.87, "total_tokens": 541552}
{"current_steps": 635, "total_steps": 2065, "loss": 0.1964, "lr": 4.376227814419524e-06, "epoch": 1.5375302663438255, "percentage": 30.75, "elapsed_time": "0:06:02", "remaining_time": "0:13:37", "throughput": 1503.66, "total_tokens": 545648}
{"current_steps": 640, "total_steps": 2065, "loss": 0.06, "lr": 4.3621928369952995e-06, "epoch": 1.549636803874092, "percentage": 30.99, "elapsed_time": "0:06:03", "remaining_time": "0:13:28", "throughput": 1514.8, "total_tokens": 550256}
{"current_steps": 645, "total_steps": 2065, "loss": 0.1114, "lr": 4.348024761917321e-06, "epoch": 1.5617433414043584, "percentage": 31.23, "elapsed_time": "0:06:03", "remaining_time": "0:13:20", "throughput": 1526.09, "total_tokens": 554928}
{"current_steps": 650, "total_steps": 2065, "loss": 0.0725, "lr": 4.333724601829132e-06, "epoch": 1.5738498789346247, "percentage": 31.48, "elapsed_time": "0:06:03", "remaining_time": "0:13:12", "throughput": 1536.66, "total_tokens": 559344}
{"current_steps": 655, "total_steps": 2065, "loss": 0.1308, "lr": 4.319293378814868e-06, "epoch": 1.585956416464891, "percentage": 31.72, "elapsed_time": "0:06:04", "remaining_time": "0:13:04", "throughput": 1547.23, "total_tokens": 563760}
{"current_steps": 660, "total_steps": 2065, "loss": 0.0653, "lr": 4.3047321243262065e-06, "epoch": 1.5980629539951574, "percentage": 31.96, "elapsed_time": "0:06:04", "remaining_time": "0:12:56", "throughput": 1557.6, "total_tokens": 568112}
{"current_steps": 665, "total_steps": 2065, "loss": 0.006, "lr": 4.290041879108641e-06, "epoch": 1.6101694915254239, "percentage": 32.2, "elapsed_time": "0:06:05", "remaining_time": "0:12:48", "throughput": 1567.94, "total_tokens": 572464}
{"current_steps": 670, "total_steps": 2065, "loss": 0.0771, "lr": 4.275223693127103e-06, "epoch": 1.6222760290556901, "percentage": 32.45, "elapsed_time": "0:06:05", "remaining_time": "0:12:40", "throughput": 1578.09, "total_tokens": 576752}
{"current_steps": 675, "total_steps": 2065, "loss": 0.034, "lr": 4.260278625490911e-06, "epoch": 1.6343825665859564, "percentage": 32.69, "elapsed_time": "0:06:05", "remaining_time": "0:12:33", "throughput": 1588.04, "total_tokens": 580976}
{"current_steps": 680, "total_steps": 2065, "loss": 0.1429, "lr": 4.245207744378075e-06, "epoch": 1.6464891041162226, "percentage": 32.93, "elapsed_time": "0:06:06", "remaining_time": "0:12:25", "throughput": 1598.15, "total_tokens": 585264}
{"current_steps": 685, "total_steps": 2065, "loss": 0.0664, "lr": 4.2300121269589475e-06, "epoch": 1.658595641646489, "percentage": 33.17, "elapsed_time": "0:06:06", "remaining_time": "0:12:18", "throughput": 1608.74, "total_tokens": 589744}
{"current_steps": 690, "total_steps": 2065, "loss": 0.0792, "lr": 4.2146928593192375e-06, "epoch": 1.6707021791767556, "percentage": 33.41, "elapsed_time": "0:06:06", "remaining_time": "0:12:11", "throughput": 1618.65, "total_tokens": 593968}
{"current_steps": 695, "total_steps": 2065, "loss": 0.1061, "lr": 4.19925103638238e-06, "epoch": 1.6828087167070218, "percentage": 33.66, "elapsed_time": "0:06:07", "remaining_time": "0:12:04", "throughput": 1628.7, "total_tokens": 598256}
{"current_steps": 700, "total_steps": 2065, "loss": 0.0958, "lr": 4.183687761831282e-06, "epoch": 1.694915254237288, "percentage": 33.9, "elapsed_time": "0:06:07", "remaining_time": "0:11:56", "throughput": 1638.9, "total_tokens": 602608}
{"current_steps": 705, "total_steps": 2065, "loss": 0.1234, "lr": 4.168004148029435e-06, "epoch": 1.7070217917675545, "percentage": 34.14, "elapsed_time": "0:06:08", "remaining_time": "0:11:50", "throughput": 1649.42, "total_tokens": 607088}
{"current_steps": 710, "total_steps": 2065, "loss": 0.1094, "lr": 4.152201315941414e-06, "epoch": 1.7191283292978208, "percentage": 34.38, "elapsed_time": "0:06:08", "remaining_time": "0:11:43", "throughput": 1659.06, "total_tokens": 611248}
{"current_steps": 715, "total_steps": 2065, "loss": 0.1047, "lr": 4.136280395052754e-06, "epoch": 1.7312348668280872, "percentage": 34.62, "elapsed_time": "0:06:08", "remaining_time": "0:11:36", "throughput": 1669.02, "total_tokens": 615536}
{"current_steps": 720, "total_steps": 2065, "loss": 0.0341, "lr": 4.120242523289223e-06, "epoch": 1.7433414043583535, "percentage": 34.87, "elapsed_time": "0:06:09", "remaining_time": "0:11:29", "throughput": 1679.29, "total_tokens": 619952}
{"current_steps": 725, "total_steps": 2065, "loss": 0.2066, "lr": 4.104088846935493e-06, "epoch": 1.7554479418886197, "percentage": 35.11, "elapsed_time": "0:06:09", "remaining_time": "0:11:23", "throughput": 1689.56, "total_tokens": 624368}
{"current_steps": 728, "total_steps": 2065, "eval_loss": 0.14485575258731842, "epoch": 1.7627118644067796, "percentage": 35.25, "elapsed_time": "0:06:10", "remaining_time": "0:11:20", "throughput": 1692.87, "total_tokens": 626992}
{"current_steps": 730, "total_steps": 2065, "loss": 0.0104, "lr": 4.087820520553205e-06, "epoch": 1.7675544794188862, "percentage": 35.35, "elapsed_time": "0:07:04", "remaining_time": "0:12:56", "throughput": 1480.95, "total_tokens": 628720}
{"current_steps": 735, "total_steps": 2065, "loss": 0.0572, "lr": 4.071438706898457e-06, "epoch": 1.7796610169491527, "percentage": 35.59, "elapsed_time": "0:07:04", "remaining_time": "0:12:48", "throughput": 1489.76, "total_tokens": 633008}
{"current_steps": 740, "total_steps": 2065, "loss": 0.1222, "lr": 4.0549445768386895e-06, "epoch": 1.791767554479419, "percentage": 35.84, "elapsed_time": "0:07:05", "remaining_time": "0:12:41", "throughput": 1498.69, "total_tokens": 637360}
{"current_steps": 745, "total_steps": 2065, "loss": 0.1171, "lr": 4.038339309269002e-06, "epoch": 1.8038740920096852, "percentage": 36.08, "elapsed_time": "0:07:05", "remaining_time": "0:12:34", "throughput": 1507.47, "total_tokens": 641648}
{"current_steps": 750, "total_steps": 2065, "loss": 0.1638, "lr": 4.021624091027895e-06, "epoch": 1.8159806295399514, "percentage": 36.32, "elapsed_time": "0:07:06", "remaining_time": "0:12:26", "throughput": 1515.36, "total_tokens": 645552}
{"current_steps": 755, "total_steps": 2065, "loss": 0.1092, "lr": 4.00480011681244e-06, "epoch": 1.828087167070218, "percentage": 36.56, "elapsed_time": "0:07:06", "remaining_time": "0:12:19", "throughput": 1524.26, "total_tokens": 649904}
{"current_steps": 760, "total_steps": 2065, "loss": 0.1118, "lr": 3.987868589092894e-06, "epoch": 1.8401937046004844, "percentage": 36.8, "elapsed_time": "0:07:06", "remaining_time": "0:12:12", "throughput": 1532.85, "total_tokens": 654128}
{"current_steps": 765, "total_steps": 2065, "loss": 0.1015, "lr": 3.970830718026746e-06, "epoch": 1.8523002421307506, "percentage": 37.05, "elapsed_time": "0:07:07", "remaining_time": "0:12:05", "throughput": 1542.13, "total_tokens": 658672}
{"current_steps": 770, "total_steps": 2065, "loss": 0.1207, "lr": 3.9536877213722335e-06, "epoch": 1.8644067796610169, "percentage": 37.29, "elapsed_time": "0:07:07", "remaining_time": "0:11:58", "throughput": 1551.12, "total_tokens": 663088}
{"current_steps": 775, "total_steps": 2065, "loss": 0.083, "lr": 3.936440824401299e-06, "epoch": 1.8765133171912833, "percentage": 37.53, "elapsed_time": "0:07:07", "remaining_time": "0:11:52", "throughput": 1559.95, "total_tokens": 667440}
{"current_steps": 780, "total_steps": 2065, "loss": 0.0249, "lr": 3.919091259812013e-06, "epoch": 1.8886198547215496, "percentage": 37.77, "elapsed_time": "0:07:08", "remaining_time": "0:11:45", "throughput": 1568.76, "total_tokens": 671792}
{"current_steps": 785, "total_steps": 2065, "loss": 0.0425, "lr": 3.901640267640475e-06, "epoch": 1.900726392251816, "percentage": 38.01, "elapsed_time": "0:07:08", "remaining_time": "0:11:38", "throughput": 1578.01, "total_tokens": 676336}
{"current_steps": 790, "total_steps": 2065, "loss": 0.0402, "lr": 3.884089095172181e-06, "epoch": 1.9128329297820823, "percentage": 38.26, "elapsed_time": "0:07:08", "remaining_time": "0:11:32", "throughput": 1586.65, "total_tokens": 680624}
{"current_steps": 795, "total_steps": 2065, "loss": 0.0155, "lr": 3.866438996852873e-06, "epoch": 1.9249394673123486, "percentage": 38.5, "elapsed_time": "0:07:09", "remaining_time": "0:11:25", "throughput": 1595.56, "total_tokens": 685040}
{"current_steps": 800, "total_steps": 2065, "loss": 0.0372, "lr": 3.848691234198879e-06, "epoch": 1.937046004842615, "percentage": 38.74, "elapsed_time": "0:07:09", "remaining_time": "0:11:19", "throughput": 1604.31, "total_tokens": 689392}
{"current_steps": 805, "total_steps": 2065, "loss": 0.1257, "lr": 3.830847075706957e-06, "epoch": 1.9491525423728815, "percentage": 38.98, "elapsed_time": "0:07:10", "remaining_time": "0:11:13", "throughput": 1612.62, "total_tokens": 693552}
{"current_steps": 810, "total_steps": 2065, "loss": 0.0454, "lr": 3.812907796763616e-06, "epoch": 1.9612590799031477, "percentage": 39.23, "elapsed_time": "0:07:10", "remaining_time": "0:11:06", "throughput": 1621.64, "total_tokens": 698032}
{"current_steps": 815, "total_steps": 2065, "loss": 0.2136, "lr": 3.794874679553975e-06, "epoch": 1.973365617433414, "percentage": 39.47, "elapsed_time": "0:07:10", "remaining_time": "0:11:00", "throughput": 1629.5, "total_tokens": 702000}
{"current_steps": 820, "total_steps": 2065, "loss": 0.1643, "lr": 3.7767490129701057e-06, "epoch": 1.9854721549636802, "percentage": 39.71, "elapsed_time": "0:07:11", "remaining_time": "0:10:54", "throughput": 1637.76, "total_tokens": 706160}
{"current_steps": 825, "total_steps": 2065, "loss": 0.0475, "lr": 3.7585320925189246e-06, "epoch": 1.9975786924939467, "percentage": 39.95, "elapsed_time": "0:07:11", "remaining_time": "0:10:48", "throughput": 1647.02, "total_tokens": 710768}
{"current_steps": 830, "total_steps": 2065, "loss": 0.0011, "lr": 3.7402252202295876e-06, "epoch": 2.009685230024213, "percentage": 40.19, "elapsed_time": "0:07:12", "remaining_time": "0:10:42", "throughput": 1654.4, "total_tokens": 714744}
{"current_steps": 832, "total_steps": 2065, "eval_loss": 0.2067757099866867, "epoch": 2.0145278450363198, "percentage": 40.29, "elapsed_time": "0:07:12", "remaining_time": "0:10:41", "throughput": 1655.23, "total_tokens": 716344}
{"current_steps": 835, "total_steps": 2065, "loss": 0.0057, "lr": 3.7218297045604362e-06, "epoch": 2.0217917675544794, "percentage": 40.44, "elapsed_time": "0:08:22", "remaining_time": "0:12:20", "throughput": 1429.44, "total_tokens": 718776}
{"current_steps": 840, "total_steps": 2065, "loss": 0.0114, "lr": 3.703346860305473e-06, "epoch": 2.0338983050847457, "percentage": 40.68, "elapsed_time": "0:08:23", "remaining_time": "0:12:13", "throughput": 1436.3, "total_tokens": 722744}
{"current_steps": 845, "total_steps": 2065, "loss": 0.0047, "lr": 3.6847780085003908e-06, "epoch": 2.046004842615012, "percentage": 40.92, "elapsed_time": "0:08:23", "remaining_time": "0:12:07", "throughput": 1444.0, "total_tokens": 727160}
{"current_steps": 850, "total_steps": 2065, "loss": 0.0867, "lr": 3.666124476328155e-06, "epoch": 2.0581113801452786, "percentage": 41.16, "elapsed_time": "0:08:23", "remaining_time": "0:12:00", "throughput": 1451.7, "total_tokens": 731576}
{"current_steps": 855, "total_steps": 2065, "loss": 0.0084, "lr": 3.647387597024139e-06, "epoch": 2.070217917675545, "percentage": 41.4, "elapsed_time": "0:08:24", "remaining_time": "0:11:53", "throughput": 1459.77, "total_tokens": 736184}
{"current_steps": 860, "total_steps": 2065, "loss": 0.0011, "lr": 3.6285687097808396e-06, "epoch": 2.082324455205811, "percentage": 41.65, "elapsed_time": "0:08:24", "remaining_time": "0:11:47", "throughput": 1467.19, "total_tokens": 740472}
{"current_steps": 865, "total_steps": 2065, "loss": 0.0528, "lr": 3.609669159652158e-06, "epoch": 2.0944309927360774, "percentage": 41.89, "elapsed_time": "0:08:25", "remaining_time": "0:11:40", "throughput": 1474.6, "total_tokens": 744760}
{"current_steps": 870, "total_steps": 2065, "loss": 0.0003, "lr": 3.5906902974572623e-06, "epoch": 2.106537530266344, "percentage": 42.13, "elapsed_time": "0:08:25", "remaining_time": "0:11:34", "throughput": 1482.26, "total_tokens": 749176}
{"current_steps": 875, "total_steps": 2065, "loss": 0.0329, "lr": 3.5716334796840403e-06, "epoch": 2.1186440677966103, "percentage": 42.37, "elapsed_time": "0:08:25", "remaining_time": "0:11:27", "throughput": 1489.79, "total_tokens": 753528}
{"current_steps": 880, "total_steps": 2065, "loss": 0.0022, "lr": 3.5525000683921467e-06, "epoch": 2.1307506053268765, "percentage": 42.62, "elapsed_time": "0:08:26", "remaining_time": "0:11:21", "throughput": 1496.92, "total_tokens": 757688}
{"current_steps": 885, "total_steps": 2065, "loss": 0.0268, "lr": 3.533291431115653e-06, "epoch": 2.142857142857143, "percentage": 42.86, "elapsed_time": "0:08:26", "remaining_time": "0:11:15", "throughput": 1504.42, "total_tokens": 762040}
{"current_steps": 890, "total_steps": 2065, "loss": 0.0746, "lr": 3.514008940765304e-06, "epoch": 2.154963680387409, "percentage": 43.1, "elapsed_time": "0:08:26", "remaining_time": "0:11:09", "throughput": 1511.54, "total_tokens": 766200}
{"current_steps": 895, "total_steps": 2065, "loss": 0.0202, "lr": 3.494653975530388e-06, "epoch": 2.1670702179176757, "percentage": 43.34, "elapsed_time": "0:08:27", "remaining_time": "0:11:03", "throughput": 1519.26, "total_tokens": 770680}
{"current_steps": 900, "total_steps": 2065, "loss": 0.0023, "lr": 3.475227918780239e-06, "epoch": 2.179176755447942, "percentage": 43.58, "elapsed_time": "0:08:27", "remaining_time": "0:10:57", "throughput": 1526.36, "total_tokens": 774840}
{"current_steps": 905, "total_steps": 2065, "loss": 0.0001, "lr": 3.455732158965356e-06, "epoch": 2.1912832929782082, "percentage": 43.83, "elapsed_time": "0:08:28", "remaining_time": "0:10:51", "throughput": 1533.82, "total_tokens": 779192}
{"current_steps": 910, "total_steps": 2065, "loss": 0.0001, "lr": 3.436168089518168e-06, "epoch": 2.2033898305084745, "percentage": 44.07, "elapsed_time": "0:08:28", "remaining_time": "0:10:45", "throughput": 1541.38, "total_tokens": 783608}
{"current_steps": 915, "total_steps": 2065, "loss": 0.0365, "lr": 3.4165371087534428e-06, "epoch": 2.2154963680387407, "percentage": 44.31, "elapsed_time": "0:08:28", "remaining_time": "0:10:39", "throughput": 1549.06, "total_tokens": 788088}
{"current_steps": 920, "total_steps": 2065, "loss": 0.0, "lr": 3.396840619768338e-06, "epoch": 2.2276029055690074, "percentage": 44.55, "elapsed_time": "0:08:29", "remaining_time": "0:10:33", "throughput": 1556.72, "total_tokens": 792568}
{"current_steps": 925, "total_steps": 2065, "loss": 0.0001, "lr": 3.377080030342125e-06, "epoch": 2.2397094430992737, "percentage": 44.79, "elapsed_time": "0:08:29", "remaining_time": "0:10:27", "throughput": 1564.63, "total_tokens": 797176}
{"current_steps": 930, "total_steps": 2065, "loss": 0.0038, "lr": 3.3572567528355614e-06, "epoch": 2.25181598062954, "percentage": 45.04, "elapsed_time": "0:08:29", "remaining_time": "0:10:22", "throughput": 1571.78, "total_tokens": 801400}
{"current_steps": 935, "total_steps": 2065, "loss": 0.0059, "lr": 3.3373722040899515e-06, "epoch": 2.263922518159806, "percentage": 45.28, "elapsed_time": "0:08:30", "remaining_time": "0:10:16", "throughput": 1579.54, "total_tokens": 805944}
{"current_steps": 936, "total_steps": 2065, "eval_loss": 0.26913806796073914, "epoch": 2.2663438256658597, "percentage": 45.33, "elapsed_time": "0:08:30", "remaining_time": "0:10:16", "throughput": 1578.75, "total_tokens": 806712}
{"current_steps": 940, "total_steps": 2065, "loss": 0.0006, "lr": 3.3174278053258753e-06, "epoch": 2.2760290556900724, "percentage": 45.52, "elapsed_time": "0:09:14", "remaining_time": "0:11:03", "throughput": 1460.35, "total_tokens": 810040}
{"current_steps": 945, "total_steps": 2065, "loss": 0.0482, "lr": 3.2974249820416094e-06, "epoch": 2.288135593220339, "percentage": 45.76, "elapsed_time": "0:09:15", "remaining_time": "0:10:57", "throughput": 1467.22, "total_tokens": 814392}
{"current_steps": 950, "total_steps": 2065, "loss": 0.0175, "lr": 3.2773651639112432e-06, "epoch": 2.3002421307506054, "percentage": 46.0, "elapsed_time": "0:09:15", "remaining_time": "0:10:51", "throughput": 1474.29, "total_tokens": 818872}
{"current_steps": 955, "total_steps": 2065, "loss": 0.0039, "lr": 3.2572497846824922e-06, "epoch": 2.3123486682808716, "percentage": 46.25, "elapsed_time": "0:09:15", "remaining_time": "0:10:46", "throughput": 1480.92, "total_tokens": 823096}
{"current_steps": 960, "total_steps": 2065, "loss": 0.0549, "lr": 3.2370802820742273e-06, "epoch": 2.324455205811138, "percentage": 46.49, "elapsed_time": "0:09:16", "remaining_time": "0:10:40", "throughput": 1487.2, "total_tokens": 827128}
{"current_steps": 965, "total_steps": 2065, "loss": 0.0011, "lr": 3.2168580976737105e-06, "epoch": 2.3365617433414045, "percentage": 46.73, "elapsed_time": "0:09:16", "remaining_time": "0:10:34", "throughput": 1493.7, "total_tokens": 831288}
{"current_steps": 970, "total_steps": 2065, "loss": 0.0202, "lr": 3.1965846768335625e-06, "epoch": 2.348668280871671, "percentage": 46.97, "elapsed_time": "0:09:16", "remaining_time": "0:10:28", "throughput": 1500.52, "total_tokens": 835640}
{"current_steps": 975, "total_steps": 2065, "loss": 0.0019, "lr": 3.176261468568457e-06, "epoch": 2.360774818401937, "percentage": 47.22, "elapsed_time": "0:09:17", "remaining_time": "0:10:22", "throughput": 1506.89, "total_tokens": 839736}
{"current_steps": 980, "total_steps": 2065, "loss": 0.0363, "lr": 3.155889925451557e-06, "epoch": 2.3728813559322033, "percentage": 47.46, "elapsed_time": "0:09:17", "remaining_time": "0:10:17", "throughput": 1513.58, "total_tokens": 844024}
{"current_steps": 985, "total_steps": 2065, "loss": 0.0001, "lr": 3.1354715035106892e-06, "epoch": 2.38498789346247, "percentage": 47.7, "elapsed_time": "0:09:18", "remaining_time": "0:10:11", "throughput": 1520.15, "total_tokens": 848248}
{"current_steps": 990, "total_steps": 2065, "loss": 0.0, "lr": 3.115007662124282e-06, "epoch": 2.3970944309927362, "percentage": 47.94, "elapsed_time": "0:09:18", "remaining_time": "0:10:06", "throughput": 1526.71, "total_tokens": 852472}
{"current_steps": 995, "total_steps": 2065, "loss": 0.0006, "lr": 3.0944998639170544e-06, "epoch": 2.4092009685230025, "percentage": 48.18, "elapsed_time": "0:09:18", "remaining_time": "0:10:00", "throughput": 1533.49, "total_tokens": 856824}
{"current_steps": 1000, "total_steps": 2065, "loss": 0.0018, "lr": 3.0739495746554785e-06, "epoch": 2.4213075060532687, "percentage": 48.43, "elapsed_time": "0:09:19", "remaining_time": "0:09:55", "throughput": 1539.92, "total_tokens": 860984}
{"current_steps": 1005, "total_steps": 2065, "loss": 0.068, "lr": 3.0533582631430153e-06, "epoch": 2.433414043583535, "percentage": 48.67, "elapsed_time": "0:09:19", "remaining_time": "0:09:50", "throughput": 1546.57, "total_tokens": 865272}
{"current_steps": 1010, "total_steps": 2065, "loss": 0.0395, "lr": 3.0327274011151355e-06, "epoch": 2.4455205811138017, "percentage": 48.91, "elapsed_time": "0:09:19", "remaining_time": "0:09:44", "throughput": 1553.21, "total_tokens": 869560}
{"current_steps": 1015, "total_steps": 2065, "loss": 0.0, "lr": 3.012058463134126e-06, "epoch": 2.457627118644068, "percentage": 49.15, "elapsed_time": "0:09:20", "remaining_time": "0:09:39", "throughput": 1560.06, "total_tokens": 873976}
{"current_steps": 1020, "total_steps": 2065, "loss": 0.0, "lr": 2.991352926483702e-06, "epoch": 2.469733656174334, "percentage": 49.39, "elapsed_time": "0:09:20", "remaining_time": "0:09:34", "throughput": 1566.57, "total_tokens": 878200}
{"current_steps": 1025, "total_steps": 2065, "loss": 0.0008, "lr": 2.9706122710634166e-06, "epoch": 2.4818401937046004, "percentage": 49.64, "elapsed_time": "0:09:20", "remaining_time": "0:09:29", "throughput": 1573.86, "total_tokens": 882872}
{"current_steps": 1030, "total_steps": 2065, "loss": 0.0, "lr": 2.949837979282889e-06, "epoch": 2.4939467312348667, "percentage": 49.88, "elapsed_time": "0:09:21", "remaining_time": "0:09:24", "throughput": 1580.35, "total_tokens": 887096}
{"current_steps": 1035, "total_steps": 2065, "loss": 0.0032, "lr": 2.9290315359558504e-06, "epoch": 2.5060532687651333, "percentage": 50.12, "elapsed_time": "0:09:21", "remaining_time": "0:09:18", "throughput": 1587.28, "total_tokens": 891576}
{"current_steps": 1040, "total_steps": 2065, "loss": 0.0756, "lr": 2.908194428194019e-06, "epoch": 2.5181598062953996, "percentage": 50.36, "elapsed_time": "0:09:22", "remaining_time": "0:09:13", "throughput": 1593.64, "total_tokens": 895736}
{"current_steps": 1040, "total_steps": 2065, "eval_loss": 0.28947436809539795, "epoch": 2.5181598062953996, "percentage": 50.36, "elapsed_time": "0:09:22", "remaining_time": "0:09:14", "throughput": 1591.83, "total_tokens": 895736}
{"current_steps": 1045, "total_steps": 2065, "loss": 0.0001, "lr": 2.88732814530081e-06, "epoch": 2.530266343825666, "percentage": 50.61, "elapsed_time": "0:09:56", "remaining_time": "0:09:42", "throughput": 1508.91, "total_tokens": 900024}
{"current_steps": 1050, "total_steps": 2065, "loss": 0.0128, "lr": 2.8664341786648932e-06, "epoch": 2.542372881355932, "percentage": 50.85, "elapsed_time": "0:09:56", "remaining_time": "0:09:36", "throughput": 1515.38, "total_tokens": 904440}
{"current_steps": 1055, "total_steps": 2065, "loss": 0.0001, "lr": 2.845514021653595e-06, "epoch": 2.5544794188861983, "percentage": 51.09, "elapsed_time": "0:09:57", "remaining_time": "0:09:31", "throughput": 1521.62, "total_tokens": 908728}
{"current_steps": 1060, "total_steps": 2065, "loss": 0.0443, "lr": 2.8245691695061605e-06, "epoch": 2.566585956416465, "percentage": 51.33, "elapsed_time": "0:09:57", "remaining_time": "0:09:26", "throughput": 1527.86, "total_tokens": 913016}
{"current_steps": 1065, "total_steps": 2065, "loss": 0.0032, "lr": 2.8036011192268863e-06, "epoch": 2.5786924939467313, "percentage": 51.57, "elapsed_time": "0:09:57", "remaining_time": "0:09:21", "throughput": 1534.09, "total_tokens": 917304}
{"current_steps": 1070, "total_steps": 2065, "loss": 0.0001, "lr": 2.7826113694781254e-06, "epoch": 2.5907990314769975, "percentage": 51.82, "elapsed_time": "0:09:58", "remaining_time": "0:09:16", "throughput": 1540.21, "total_tokens": 921528}
{"current_steps": 1075, "total_steps": 2065, "loss": 0.0, "lr": 2.7616014204731683e-06, "epoch": 2.6029055690072638, "percentage": 52.06, "elapsed_time": "0:09:58", "remaining_time": "0:09:11", "throughput": 1546.64, "total_tokens": 925944}
{"current_steps": 1080, "total_steps": 2065, "loss": 0.0001, "lr": 2.7405727738690193e-06, "epoch": 2.61501210653753, "percentage": 52.3, "elapsed_time": "0:09:59", "remaining_time": "0:09:06", "throughput": 1553.68, "total_tokens": 930744}
{"current_steps": 1085, "total_steps": 2065, "loss": 0.0725, "lr": 2.7195269326590685e-06, "epoch": 2.6271186440677967, "percentage": 52.54, "elapsed_time": "0:09:59", "remaining_time": "0:09:01", "throughput": 1560.39, "total_tokens": 935352}
{"current_steps": 1090, "total_steps": 2065, "loss": 0.0295, "lr": 2.698465401065667e-06, "epoch": 2.639225181598063, "percentage": 52.78, "elapsed_time": "0:09:59", "remaining_time": "0:08:56", "throughput": 1566.57, "total_tokens": 939640}
{"current_steps": 1095, "total_steps": 2065, "loss": 0.0001, "lr": 2.6773896844326126e-06, "epoch": 2.651331719128329, "percentage": 53.03, "elapsed_time": "0:10:00", "remaining_time": "0:08:51", "throughput": 1572.33, "total_tokens": 943672}
{"current_steps": 1100, "total_steps": 2065, "loss": 0.0001, "lr": 2.656301289117561e-06, "epoch": 2.663438256658596, "percentage": 53.27, "elapsed_time": "0:10:00", "remaining_time": "0:08:46", "throughput": 1578.1, "total_tokens": 947704}
{"current_steps": 1105, "total_steps": 2065, "loss": 0.0196, "lr": 2.6352017223843584e-06, "epoch": 2.6755447941888617, "percentage": 53.51, "elapsed_time": "0:10:00", "remaining_time": "0:08:42", "throughput": 1584.16, "total_tokens": 951928}
{"current_steps": 1110, "total_steps": 2065, "loss": 0.0294, "lr": 2.6140924922953125e-06, "epoch": 2.6876513317191284, "percentage": 53.75, "elapsed_time": "0:10:01", "remaining_time": "0:08:37", "throughput": 1590.32, "total_tokens": 956216}
{"current_steps": 1115, "total_steps": 2065, "loss": 0.0001, "lr": 2.592975107603406e-06, "epoch": 2.6997578692493946, "percentage": 54.0, "elapsed_time": "0:10:01", "remaining_time": "0:08:32", "throughput": 1596.47, "total_tokens": 960504}
{"current_steps": 1120, "total_steps": 2065, "loss": 0.0135, "lr": 2.571851077644461e-06, "epoch": 2.711864406779661, "percentage": 54.24, "elapsed_time": "0:10:02", "remaining_time": "0:08:27", "throughput": 1603.03, "total_tokens": 965048}
{"current_steps": 1125, "total_steps": 2065, "loss": 0.0001, "lr": 2.55072191222926e-06, "epoch": 2.7239709443099276, "percentage": 54.48, "elapsed_time": "0:10:02", "remaining_time": "0:08:23", "throughput": 1608.96, "total_tokens": 969208}
{"current_steps": 1130, "total_steps": 2065, "loss": 0.0991, "lr": 2.5295891215356362e-06, "epoch": 2.736077481840194, "percentage": 54.72, "elapsed_time": "0:10:02", "remaining_time": "0:08:18", "throughput": 1615.29, "total_tokens": 973624}
{"current_steps": 1135, "total_steps": 2065, "loss": 0.0064, "lr": 2.5084542160005338e-06, "epoch": 2.74818401937046, "percentage": 54.96, "elapsed_time": "0:10:03", "remaining_time": "0:08:14", "throughput": 1621.51, "total_tokens": 977976}
{"current_steps": 1140, "total_steps": 2065, "loss": 0.0001, "lr": 2.4873187062120515e-06, "epoch": 2.7602905569007263, "percentage": 55.21, "elapsed_time": "0:10:03", "remaining_time": "0:08:09", "throughput": 1627.52, "total_tokens": 982200}
{"current_steps": 1144, "total_steps": 2065, "eval_loss": 0.22601255774497986, "epoch": 2.7699757869249395, "percentage": 55.4, "elapsed_time": "0:10:04", "remaining_time": "0:08:06", "throughput": 1630.57, "total_tokens": 985592}
{"current_steps": 1145, "total_steps": 2065, "loss": 0.0002, "lr": 2.4661841028014786e-06, "epoch": 2.7723970944309926, "percentage": 55.45, "elapsed_time": "0:11:04", "remaining_time": "0:08:53", "throughput": 1485.47, "total_tokens": 986488}
{"current_steps": 1150, "total_steps": 2065, "loss": 0.0002, "lr": 2.445051916335321e-06, "epoch": 2.7845036319612593, "percentage": 55.69, "elapsed_time": "0:11:04", "remaining_time": "0:08:48", "throughput": 1490.64, "total_tokens": 990456}
{"current_steps": 1155, "total_steps": 2065, "loss": 0.0766, "lr": 2.4239236572073354e-06, "epoch": 2.7966101694915255, "percentage": 55.93, "elapsed_time": "0:11:04", "remaining_time": "0:08:43", "throughput": 1496.25, "total_tokens": 994744}
{"current_steps": 1160, "total_steps": 2065, "loss": 0.0501, "lr": 2.4028008355305817e-06, "epoch": 2.8087167070217918, "percentage": 56.17, "elapsed_time": "0:11:05", "remaining_time": "0:08:38", "throughput": 1502.06, "total_tokens": 999160}
{"current_steps": 1165, "total_steps": 2065, "loss": 0.0289, "lr": 2.3816849610294784e-06, "epoch": 2.820823244552058, "percentage": 56.42, "elapsed_time": "0:11:05", "remaining_time": "0:08:34", "throughput": 1507.39, "total_tokens": 1003256}
{"current_steps": 1170, "total_steps": 2065, "loss": 0.0884, "lr": 2.3605775429319115e-06, "epoch": 2.8329297820823243, "percentage": 56.66, "elapsed_time": "0:11:05", "remaining_time": "0:08:29", "throughput": 1512.89, "total_tokens": 1007480}
{"current_steps": 1175, "total_steps": 2065, "loss": 0.0004, "lr": 2.3394800898613536e-06, "epoch": 2.845036319612591, "percentage": 56.9, "elapsed_time": "0:11:06", "remaining_time": "0:08:24", "throughput": 1518.65, "total_tokens": 1011896}
{"current_steps": 1180, "total_steps": 2065, "loss": 0.0004, "lr": 2.318394109729041e-06, "epoch": 2.857142857142857, "percentage": 57.14, "elapsed_time": "0:11:06", "remaining_time": "0:08:20", "throughput": 1523.96, "total_tokens": 1015992}
{"current_steps": 1185, "total_steps": 2065, "loss": 0.003, "lr": 2.297321109626198e-06, "epoch": 2.8692493946731235, "percentage": 57.38, "elapsed_time": "0:11:07", "remaining_time": "0:08:15", "throughput": 1529.74, "total_tokens": 1020408}
{"current_steps": 1190, "total_steps": 2065, "loss": 0.0003, "lr": 2.27626259571632e-06, "epoch": 2.8813559322033897, "percentage": 57.63, "elapsed_time": "0:11:07", "remaining_time": "0:08:10", "throughput": 1535.77, "total_tokens": 1025016}
{"current_steps": 1195, "total_steps": 2065, "loss": 0.0571, "lr": 2.2552200731275215e-06, "epoch": 2.893462469733656, "percentage": 57.87, "elapsed_time": "0:11:07", "remaining_time": "0:08:06", "throughput": 1541.44, "total_tokens": 1029368}
{"current_steps": 1200, "total_steps": 2065, "loss": 0.0007, "lr": 2.2341950458449576e-06, "epoch": 2.9055690072639226, "percentage": 58.11, "elapsed_time": "0:11:08", "remaining_time": "0:08:01", "throughput": 1546.91, "total_tokens": 1033592}
{"current_steps": 1205, "total_steps": 2065, "loss": 0.0001, "lr": 2.2131890166033333e-06, "epoch": 2.917675544794189, "percentage": 58.35, "elapsed_time": "0:11:08", "remaining_time": "0:07:57", "throughput": 1552.19, "total_tokens": 1037688}
{"current_steps": 1210, "total_steps": 2065, "loss": 0.0136, "lr": 2.1922034867794923e-06, "epoch": 2.929782082324455, "percentage": 58.6, "elapsed_time": "0:11:08", "remaining_time": "0:07:52", "throughput": 1557.65, "total_tokens": 1041912}
{"current_steps": 1215, "total_steps": 2065, "loss": 0.0003, "lr": 2.171239956285115e-06, "epoch": 2.9418886198547214, "percentage": 58.84, "elapsed_time": "0:11:09", "remaining_time": "0:07:48", "throughput": 1563.48, "total_tokens": 1046392}
{"current_steps": 1220, "total_steps": 2065, "loss": 0.0001, "lr": 2.150299923459505e-06, "epoch": 2.9539951573849876, "percentage": 59.08, "elapsed_time": "0:11:09", "remaining_time": "0:07:43", "throughput": 1568.93, "total_tokens": 1050616}
{"current_steps": 1225, "total_steps": 2065, "loss": 0.0001, "lr": 2.1293848849625065e-06, "epoch": 2.9661016949152543, "percentage": 59.32, "elapsed_time": "0:11:10", "remaining_time": "0:07:39", "throughput": 1574.38, "total_tokens": 1054840}
{"current_steps": 1230, "total_steps": 2065, "loss": 0.0001, "lr": 2.108496335667527e-06, "epoch": 2.9782082324455206, "percentage": 59.56, "elapsed_time": "0:11:10", "remaining_time": "0:07:35", "throughput": 1579.63, "total_tokens": 1058936}
{"current_steps": 1235, "total_steps": 2065, "loss": 0.0001, "lr": 2.0876357685546942e-06, "epoch": 2.990314769975787, "percentage": 59.81, "elapsed_time": "0:11:10", "remaining_time": "0:07:30", "throughput": 1585.26, "total_tokens": 1063288}
{"current_steps": 1240, "total_steps": 2065, "loss": 0.0, "lr": 2.0668046746041497e-06, "epoch": 3.002421307506053, "percentage": 60.05, "elapsed_time": "0:11:11", "remaining_time": "0:07:26", "throughput": 1590.19, "total_tokens": 1067392}
{"current_steps": 1245, "total_steps": 2065, "loss": 0.0, "lr": 2.0460045426894816e-06, "epoch": 3.0145278450363198, "percentage": 60.29, "elapsed_time": "0:11:11", "remaining_time": "0:07:22", "throughput": 1595.98, "total_tokens": 1071872}
{"current_steps": 1248, "total_steps": 2065, "eval_loss": 0.22526989877223969, "epoch": 3.0217917675544794, "percentage": 60.44, "elapsed_time": "0:11:12", "remaining_time": "0:07:20", "throughput": 1598.06, "total_tokens": 1074624}
{"current_steps": 1250, "total_steps": 2065, "loss": 0.0, "lr": 2.0252368594713083e-06, "epoch": 3.026634382566586, "percentage": 60.53, "elapsed_time": "0:11:53", "remaining_time": "0:07:45", "throughput": 1507.92, "total_tokens": 1076416}
{"current_steps": 1255, "total_steps": 2065, "loss": 0.0024, "lr": 2.004503109291023e-06, "epoch": 3.0387409200968523, "percentage": 60.77, "elapsed_time": "0:11:54", "remaining_time": "0:07:40", "throughput": 1512.89, "total_tokens": 1080512}
{"current_steps": 1260, "total_steps": 2065, "loss": 0.0, "lr": 1.9838047740647024e-06, "epoch": 3.0508474576271185, "percentage": 61.02, "elapsed_time": "0:11:54", "remaining_time": "0:07:36", "throughput": 1517.85, "total_tokens": 1084608}
{"current_steps": 1265, "total_steps": 2065, "loss": 0.0, "lr": 1.9631433331771886e-06, "epoch": 3.062953995157385, "percentage": 61.26, "elapsed_time": "0:11:54", "remaining_time": "0:07:32", "throughput": 1523.24, "total_tokens": 1089024}
{"current_steps": 1270, "total_steps": 2065, "loss": 0.0, "lr": 1.942520263376351e-06, "epoch": 3.0750605326876514, "percentage": 61.5, "elapsed_time": "0:11:55", "remaining_time": "0:07:27", "throughput": 1528.54, "total_tokens": 1093376}
{"current_steps": 1275, "total_steps": 2065, "loss": 0.0, "lr": 1.921937038667539e-06, "epoch": 3.0871670702179177, "percentage": 61.74, "elapsed_time": "0:11:55", "remaining_time": "0:07:23", "throughput": 1533.84, "total_tokens": 1097728}
{"current_steps": 1280, "total_steps": 2065, "loss": 0.0831, "lr": 1.901395130208229e-06, "epoch": 3.099273607748184, "percentage": 61.99, "elapsed_time": "0:11:56", "remaining_time": "0:07:19", "throughput": 1538.86, "total_tokens": 1101888}
{"current_steps": 1285, "total_steps": 2065, "loss": 0.0, "lr": 1.880896006202876e-06, "epoch": 3.11138014527845, "percentage": 62.23, "elapsed_time": "0:11:56", "remaining_time": "0:07:14", "throughput": 1544.06, "total_tokens": 1106176}
{"current_steps": 1290, "total_steps": 2065, "loss": 0.0, "lr": 1.860441131797977e-06, "epoch": 3.123486682808717, "percentage": 62.47, "elapsed_time": "0:11:56", "remaining_time": "0:07:10", "throughput": 1549.0, "total_tokens": 1110272}
{"current_steps": 1295, "total_steps": 2065, "loss": 0.0001, "lr": 1.8400319689773474e-06, "epoch": 3.135593220338983, "percentage": 62.71, "elapsed_time": "0:11:57", "remaining_time": "0:07:06", "throughput": 1554.09, "total_tokens": 1114496}
{"current_steps": 1300, "total_steps": 2065, "loss": 0.0001, "lr": 1.8196699764576316e-06, "epoch": 3.1476997578692494, "percentage": 62.95, "elapsed_time": "0:11:57", "remaining_time": "0:07:02", "throughput": 1559.27, "total_tokens": 1118784}
{"current_steps": 1305, "total_steps": 2065, "loss": 0.0001, "lr": 1.7993566095840442e-06, "epoch": 3.1598062953995156, "percentage": 63.2, "elapsed_time": "0:11:57", "remaining_time": "0:06:58", "throughput": 1564.36, "total_tokens": 1123008}
{"current_steps": 1310, "total_steps": 2065, "loss": 0.0001, "lr": 1.7790933202263437e-06, "epoch": 3.171912832929782, "percentage": 63.44, "elapsed_time": "0:11:58", "remaining_time": "0:06:53", "throughput": 1569.7, "total_tokens": 1127424}
{"current_steps": 1315, "total_steps": 2065, "loss": 0.0001, "lr": 1.7588815566750728e-06, "epoch": 3.1840193704600486, "percentage": 63.68, "elapsed_time": "0:11:58", "remaining_time": "0:06:49", "throughput": 1575.04, "total_tokens": 1131840}
{"current_steps": 1320, "total_steps": 2065, "loss": 0.0001, "lr": 1.7387227635380362e-06, "epoch": 3.196125907990315, "percentage": 63.92, "elapsed_time": "0:11:58", "remaining_time": "0:06:45", "throughput": 1580.28, "total_tokens": 1136192}
{"current_steps": 1325, "total_steps": 2065, "loss": 0.0001, "lr": 1.7186183816370522e-06, "epoch": 3.208232445520581, "percentage": 64.16, "elapsed_time": "0:11:59", "remaining_time": "0:06:41", "throughput": 1585.52, "total_tokens": 1140544}
{"current_steps": 1330, "total_steps": 2065, "loss": 0.0001, "lr": 1.6985698479049703e-06, "epoch": 3.2203389830508473, "percentage": 64.41, "elapsed_time": "0:11:59", "remaining_time": "0:06:37", "throughput": 1591.27, "total_tokens": 1145280}
{"current_steps": 1335, "total_steps": 2065, "loss": 0.0039, "lr": 1.6785785952829718e-06, "epoch": 3.232445520581114, "percentage": 64.65, "elapsed_time": "0:12:00", "remaining_time": "0:06:33", "throughput": 1596.85, "total_tokens": 1149888}
{"current_steps": 1340, "total_steps": 2065, "loss": 0.0, "lr": 1.6586460526181476e-06, "epoch": 3.2445520581113803, "percentage": 64.89, "elapsed_time": "0:12:00", "remaining_time": "0:06:29", "throughput": 1601.65, "total_tokens": 1153920}
{"current_steps": 1345, "total_steps": 2065, "loss": 0.0, "lr": 1.6387736445613772e-06, "epoch": 3.2566585956416465, "percentage": 65.13, "elapsed_time": "0:12:00", "remaining_time": "0:06:25", "throughput": 1607.3, "total_tokens": 1158592}
{"current_steps": 1350, "total_steps": 2065, "loss": 0.0, "lr": 1.618962791465501e-06, "epoch": 3.2687651331719128, "percentage": 65.38, "elapsed_time": "0:12:01", "remaining_time": "0:06:21", "throughput": 1612.33, "total_tokens": 1162816}
{"current_steps": 1352, "total_steps": 2065, "eval_loss": 0.25782492756843567, "epoch": 3.2736077481840193, "percentage": 65.47, "elapsed_time": "0:12:01", "remaining_time": "0:06:20", "throughput": 1613.0, "total_tokens": 1164544}
{"current_steps": 1355, "total_steps": 2065, "loss": 0.0002, "lr": 1.599214909283805e-06, "epoch": 3.280871670702179, "percentage": 65.62, "elapsed_time": "0:12:51", "remaining_time": "0:06:44", "throughput": 1513.19, "total_tokens": 1167232}
{"current_steps": 1360, "total_steps": 2065, "loss": 0.0, "lr": 1.579531409468815e-06, "epoch": 3.2929782082324457, "percentage": 65.86, "elapsed_time": "0:12:51", "remaining_time": "0:06:40", "throughput": 1518.18, "total_tokens": 1171648}
{"current_steps": 1365, "total_steps": 2065, "loss": 0.0, "lr": 1.5599136988714186e-06, "epoch": 3.305084745762712, "percentage": 66.1, "elapsed_time": "0:12:52", "remaining_time": "0:06:35", "throughput": 1522.86, "total_tokens": 1175808}
{"current_steps": 1370, "total_steps": 2065, "loss": 0.0, "lr": 1.5403631796403085e-06, "epoch": 3.317191283292978, "percentage": 66.34, "elapsed_time": "0:12:52", "remaining_time": "0:06:31", "throughput": 1527.84, "total_tokens": 1180224}
{"current_steps": 1375, "total_steps": 2065, "loss": 0.0, "lr": 1.5208812491217669e-06, "epoch": 3.3292978208232444, "percentage": 66.59, "elapsed_time": "0:12:52", "remaining_time": "0:06:27", "throughput": 1532.9, "total_tokens": 1184704}
{"current_steps": 1380, "total_steps": 2065, "loss": 0.053, "lr": 1.5014692997597962e-06, "epoch": 3.341404358353511, "percentage": 66.83, "elapsed_time": "0:12:53", "remaining_time": "0:06:23", "throughput": 1537.72, "total_tokens": 1188992}
{"current_steps": 1385, "total_steps": 2065, "loss": 0.0, "lr": 1.4821287189965865e-06, "epoch": 3.3535108958837774, "percentage": 67.07, "elapsed_time": "0:12:53", "remaining_time": "0:06:19", "throughput": 1542.69, "total_tokens": 1193408}
{"current_steps": 1390, "total_steps": 2065, "loss": 0.0002, "lr": 1.4628608891733626e-06, "epoch": 3.3656174334140436, "percentage": 67.31, "elapsed_time": "0:12:53", "remaining_time": "0:06:15", "throughput": 1547.58, "total_tokens": 1197760}
{"current_steps": 1395, "total_steps": 2065, "loss": 0.0, "lr": 1.443667187431572e-06, "epoch": 3.37772397094431, "percentage": 67.55, "elapsed_time": "0:12:54", "remaining_time": "0:06:11", "throughput": 1552.06, "total_tokens": 1201792}
{"current_steps": 1400, "total_steps": 2065, "loss": 0.0, "lr": 1.4245489856144633e-06, "epoch": 3.389830508474576, "percentage": 67.8, "elapsed_time": "0:12:54", "remaining_time": "0:06:07", "throughput": 1556.54, "total_tokens": 1205824}
{"current_steps": 1405, "total_steps": 2065, "loss": 0.0, "lr": 1.4055076501690313e-06, "epoch": 3.401937046004843, "percentage": 68.04, "elapsed_time": "0:12:55", "remaining_time": "0:06:04", "throughput": 1561.49, "total_tokens": 1210240}
{"current_steps": 1410, "total_steps": 2065, "loss": 0.0, "lr": 1.3865445420483524e-06, "epoch": 3.414043583535109, "percentage": 68.28, "elapsed_time": "0:12:55", "remaining_time": "0:06:00", "throughput": 1566.2, "total_tokens": 1214464}
{"current_steps": 1415, "total_steps": 2065, "loss": 0.0005, "lr": 1.367661016614315e-06, "epoch": 3.4261501210653753, "percentage": 68.52, "elapsed_time": "0:12:55", "remaining_time": "0:05:56", "throughput": 1570.98, "total_tokens": 1218752}
{"current_steps": 1420, "total_steps": 2065, "loss": 0.0, "lr": 1.348858423540744e-06, "epoch": 3.4382566585956416, "percentage": 68.77, "elapsed_time": "0:12:56", "remaining_time": "0:05:52", "throughput": 1575.91, "total_tokens": 1223168}
{"current_steps": 1425, "total_steps": 2065, "loss": 0.0, "lr": 1.3301381067169367e-06, "epoch": 3.450363196125908, "percentage": 69.01, "elapsed_time": "0:12:56", "remaining_time": "0:05:48", "throughput": 1580.53, "total_tokens": 1227328}
{"current_steps": 1430, "total_steps": 2065, "loss": 0.0, "lr": 1.3115014041516088e-06, "epoch": 3.4624697336561745, "percentage": 69.25, "elapsed_time": "0:12:56", "remaining_time": "0:05:44", "throughput": 1584.98, "total_tokens": 1231360}
{"current_steps": 1435, "total_steps": 2065, "loss": 0.0, "lr": 1.2929496478772635e-06, "epoch": 3.4745762711864407, "percentage": 69.49, "elapsed_time": "0:12:57", "remaining_time": "0:05:41", "throughput": 1589.51, "total_tokens": 1235456}
{"current_steps": 1440, "total_steps": 2065, "loss": 0.0, "lr": 1.2744841638549843e-06, "epoch": 3.486682808716707, "percentage": 69.73, "elapsed_time": "0:12:57", "remaining_time": "0:05:37", "throughput": 1594.1, "total_tokens": 1239616}
{"current_steps": 1445, "total_steps": 2065, "loss": 0.0, "lr": 1.2561062718796663e-06, "epoch": 3.4987893462469732, "percentage": 69.98, "elapsed_time": "0:12:57", "remaining_time": "0:05:33", "throughput": 1598.94, "total_tokens": 1243968}
{"current_steps": 1450, "total_steps": 2065, "loss": 0.0001, "lr": 1.2378172854856831e-06, "epoch": 3.5108958837772395, "percentage": 70.22, "elapsed_time": "0:12:58", "remaining_time": "0:05:30", "throughput": 1603.53, "total_tokens": 1248128}
{"current_steps": 1455, "total_steps": 2065, "loss": 0.0, "lr": 1.2196185118530063e-06, "epoch": 3.523002421307506, "percentage": 70.46, "elapsed_time": "0:12:58", "remaining_time": "0:05:26", "throughput": 1608.12, "total_tokens": 1252288}
{"current_steps": 1456, "total_steps": 2065, "eval_loss": 0.2580437958240509, "epoch": 3.5254237288135593, "percentage": 70.51, "elapsed_time": "0:12:59", "remaining_time": "0:05:26", "throughput": 1607.92, "total_tokens": 1253248}
{"current_steps": 1460, "total_steps": 2065, "loss": 0.0, "lr": 1.2015112517137744e-06, "epoch": 3.5351089588377724, "percentage": 70.7, "elapsed_time": "0:13:42", "remaining_time": "0:05:40", "throughput": 1528.3, "total_tokens": 1256640}
{"current_steps": 1465, "total_steps": 2065, "loss": 0.0, "lr": 1.183496799259326e-06, "epoch": 3.5472154963680387, "percentage": 70.94, "elapsed_time": "0:13:42", "remaining_time": "0:05:36", "throughput": 1533.43, "total_tokens": 1261440}
{"current_steps": 1470, "total_steps": 2065, "loss": 0.0, "lr": 1.165576442047699e-06, "epoch": 3.559322033898305, "percentage": 71.19, "elapsed_time": "0:13:42", "remaining_time": "0:05:33", "throughput": 1537.88, "total_tokens": 1265664}
{"current_steps": 1475, "total_steps": 2065, "loss": 0.0, "lr": 1.147751460911604e-06, "epoch": 3.571428571428571, "percentage": 71.43, "elapsed_time": "0:13:43", "remaining_time": "0:05:29", "throughput": 1542.47, "total_tokens": 1270016}
{"current_steps": 1480, "total_steps": 2065, "loss": 0.0, "lr": 1.1300231298668786e-06, "epoch": 3.583535108958838, "percentage": 71.67, "elapsed_time": "0:13:43", "remaining_time": "0:05:25", "throughput": 1547.29, "total_tokens": 1274560}
{"current_steps": 1485, "total_steps": 2065, "loss": 0.0, "lr": 1.112392716021429e-06, "epoch": 3.595641646489104, "percentage": 71.91, "elapsed_time": "0:13:44", "remaining_time": "0:05:21", "throughput": 1551.96, "total_tokens": 1278976}
{"current_steps": 1490, "total_steps": 2065, "loss": 0.0, "lr": 1.0948614794846668e-06, "epoch": 3.6077481840193704, "percentage": 72.15, "elapsed_time": "0:13:44", "remaining_time": "0:05:18", "throughput": 1556.39, "total_tokens": 1283200}
{"current_steps": 1495, "total_steps": 2065, "loss": 0.0, "lr": 1.0774306732774414e-06, "epoch": 3.619854721549637, "percentage": 72.4, "elapsed_time": "0:13:44", "remaining_time": "0:05:14", "throughput": 1560.67, "total_tokens": 1287296}
{"current_steps": 1500, "total_steps": 2065, "loss": 0.0, "lr": 1.0601015432424818e-06, "epoch": 3.6319612590799033, "percentage": 72.64, "elapsed_time": "0:13:45", "remaining_time": "0:05:10", "throughput": 1565.32, "total_tokens": 1291712}
{"current_steps": 1505, "total_steps": 2065, "loss": 0.0328, "lr": 1.0428753279553561e-06, "epoch": 3.6440677966101696, "percentage": 72.88, "elapsed_time": "0:13:45", "remaining_time": "0:05:07", "throughput": 1569.73, "total_tokens": 1295936}
{"current_steps": 1510, "total_steps": 2065, "loss": 0.0527, "lr": 1.0257532586359422e-06, "epoch": 3.656174334140436, "percentage": 73.12, "elapsed_time": "0:13:45", "remaining_time": "0:05:03", "throughput": 1574.67, "total_tokens": 1300608}
{"current_steps": 1515, "total_steps": 2065, "loss": 0.0, "lr": 1.008736559060429e-06, "epoch": 3.668280871670702, "percentage": 73.37, "elapsed_time": "0:13:46", "remaining_time": "0:04:59", "throughput": 1579.31, "total_tokens": 1305024}
{"current_steps": 1520, "total_steps": 2065, "loss": 0.0001, "lr": 9.918264454738504e-07, "epoch": 3.6803874092009687, "percentage": 73.61, "elapsed_time": "0:13:46", "remaining_time": "0:04:56", "throughput": 1583.86, "total_tokens": 1309376}
{"current_steps": 1525, "total_steps": 2065, "loss": 0.0001, "lr": 9.750241265031529e-07, "epoch": 3.692493946731235, "percentage": 73.85, "elapsed_time": "0:13:47", "remaining_time": "0:04:52", "throughput": 1588.34, "total_tokens": 1313664}
{"current_steps": 1530, "total_steps": 2065, "loss": 0.0001, "lr": 9.583308030708135e-07, "epoch": 3.7046004842615012, "percentage": 74.09, "elapsed_time": "0:13:47", "remaining_time": "0:04:49", "throughput": 1592.97, "total_tokens": 1318080}
{"current_steps": 1535, "total_steps": 2065, "loss": 0.0001, "lr": 9.417476683090007e-07, "epoch": 3.7167070217917675, "percentage": 74.33, "elapsed_time": "0:13:47", "remaining_time": "0:04:45", "throughput": 1597.51, "total_tokens": 1322432}
{"current_steps": 1540, "total_steps": 2065, "loss": 0.0003, "lr": 9.252759074743034e-07, "epoch": 3.7288135593220337, "percentage": 74.58, "elapsed_time": "0:13:48", "remaining_time": "0:04:42", "throughput": 1602.13, "total_tokens": 1326848}
{"current_steps": 1545, "total_steps": 2065, "loss": 0.0, "lr": 9.08916697863014e-07, "epoch": 3.7409200968523004, "percentage": 74.82, "elapsed_time": "0:13:48", "remaining_time": "0:04:38", "throughput": 1606.82, "total_tokens": 1331328}
{"current_steps": 1550, "total_steps": 2065, "loss": 0.0, "lr": 8.926712087269801e-07, "epoch": 3.7530266343825667, "percentage": 75.06, "elapsed_time": "0:13:48", "remaining_time": "0:04:35", "throughput": 1611.05, "total_tokens": 1335424}
{"current_steps": 1555, "total_steps": 2065, "loss": 0.0, "lr": 8.765406011900368e-07, "epoch": 3.765133171912833, "percentage": 75.3, "elapsed_time": "0:13:49", "remaining_time": "0:04:31", "throughput": 1615.51, "total_tokens": 1339712}
{"current_steps": 1560, "total_steps": 2065, "loss": 0.0, "lr": 8.605260281650152e-07, "epoch": 3.777239709443099, "percentage": 75.54, "elapsed_time": "0:13:49", "remaining_time": "0:04:28", "throughput": 1619.96, "total_tokens": 1344000}
{"current_steps": 1560, "total_steps": 2065, "eval_loss": 0.2703007757663727, "epoch": 3.777239709443099, "percentage": 75.54, "elapsed_time": "0:13:50", "remaining_time": "0:04:28", "throughput": 1618.71, "total_tokens": 1344000}
{"current_steps": 1565, "total_steps": 2065, "loss": 0.0, "lr": 8.44628634271342e-07, "epoch": 3.7893462469733654, "percentage": 75.79, "elapsed_time": "0:14:28", "remaining_time": "0:04:37", "throughput": 1551.82, "total_tokens": 1348224}
{"current_steps": 1570, "total_steps": 2065, "loss": 0.0017, "lr": 8.288495557532241e-07, "epoch": 3.801452784503632, "percentage": 76.03, "elapsed_time": "0:14:29", "remaining_time": "0:04:34", "throughput": 1556.17, "total_tokens": 1352576}
{"current_steps": 1575, "total_steps": 2065, "loss": 0.0616, "lr": 8.131899203984464e-07, "epoch": 3.8135593220338984, "percentage": 76.27, "elapsed_time": "0:14:29", "remaining_time": "0:04:30", "throughput": 1560.44, "total_tokens": 1356864}
{"current_steps": 1580, "total_steps": 2065, "loss": 0.0, "lr": 7.976508474577549e-07, "epoch": 3.8256658595641646, "percentage": 76.51, "elapsed_time": "0:14:29", "remaining_time": "0:04:27", "throughput": 1564.7, "total_tokens": 1361152}
{"current_steps": 1585, "total_steps": 2065, "loss": 0.0, "lr": 7.822334475648655e-07, "epoch": 3.837772397094431, "percentage": 76.76, "elapsed_time": "0:14:30", "remaining_time": "0:04:23", "throughput": 1568.9, "total_tokens": 1365376}
{"current_steps": 1590, "total_steps": 2065, "loss": 0.0, "lr": 7.66938822657081e-07, "epoch": 3.849878934624697, "percentage": 77.0, "elapsed_time": "0:14:30", "remaining_time": "0:04:20", "throughput": 1573.23, "total_tokens": 1369728}
{"current_steps": 1595, "total_steps": 2065, "loss": 0.0, "lr": 7.517680658965328e-07, "epoch": 3.861985472154964, "percentage": 77.24, "elapsed_time": "0:14:31", "remaining_time": "0:04:16", "throughput": 1577.63, "total_tokens": 1374144}
{"current_steps": 1600, "total_steps": 2065, "loss": 0.0, "lr": 7.367222615920477e-07, "epoch": 3.87409200968523, "percentage": 77.48, "elapsed_time": "0:14:31", "remaining_time": "0:04:13", "throughput": 1581.81, "total_tokens": 1378368}
{"current_steps": 1605, "total_steps": 2065, "loss": 0.0, "lr": 7.21802485121649e-07, "epoch": 3.8861985472154963, "percentage": 77.72, "elapsed_time": "0:14:31", "remaining_time": "0:04:09", "throughput": 1585.85, "total_tokens": 1382464}
{"current_steps": 1610, "total_steps": 2065, "loss": 0.0, "lr": 7.070098028556949e-07, "epoch": 3.898305084745763, "percentage": 77.97, "elapsed_time": "0:14:32", "remaining_time": "0:04:06", "throughput": 1590.24, "total_tokens": 1386880}
{"current_steps": 1615, "total_steps": 2065, "loss": 0.0, "lr": 6.923452720806612e-07, "epoch": 3.910411622276029, "percentage": 78.21, "elapsed_time": "0:14:32", "remaining_time": "0:04:03", "throughput": 1594.63, "total_tokens": 1391296}
{"current_steps": 1620, "total_steps": 2065, "loss": 0.0, "lr": 6.778099409235739e-07, "epoch": 3.9225181598062955, "percentage": 78.45, "elapsed_time": "0:14:32", "remaining_time": "0:03:59", "throughput": 1598.72, "total_tokens": 1395456}
{"current_steps": 1625, "total_steps": 2065, "loss": 0.0, "lr": 6.634048482770946e-07, "epoch": 3.9346246973365617, "percentage": 78.69, "elapsed_time": "0:14:33", "remaining_time": "0:03:56", "throughput": 1602.82, "total_tokens": 1399616}
{"current_steps": 1630, "total_steps": 2065, "loss": 0.0, "lr": 6.491310237252679e-07, "epoch": 3.946731234866828, "percentage": 78.93, "elapsed_time": "0:14:33", "remaining_time": "0:03:53", "throughput": 1606.83, "total_tokens": 1403712}
{"current_steps": 1635, "total_steps": 2065, "loss": 0.0, "lr": 6.349894874699345e-07, "epoch": 3.9588377723970947, "percentage": 79.18, "elapsed_time": "0:14:33", "remaining_time": "0:03:49", "throughput": 1611.21, "total_tokens": 1408128}
{"current_steps": 1640, "total_steps": 2065, "loss": 0.0, "lr": 6.209812502578113e-07, "epoch": 3.970944309927361, "percentage": 79.42, "elapsed_time": "0:14:34", "remaining_time": "0:03:46", "throughput": 1615.5, "total_tokens": 1412480}
{"current_steps": 1645, "total_steps": 2065, "loss": 0.0, "lr": 6.071073133082492e-07, "epoch": 3.983050847457627, "percentage": 79.66, "elapsed_time": "0:14:34", "remaining_time": "0:03:43", "throughput": 1619.65, "total_tokens": 1416704}
{"current_steps": 1650, "total_steps": 2065, "loss": 0.0, "lr": 5.933686682416759e-07, "epoch": 3.9951573849878934, "percentage": 79.9, "elapsed_time": "0:14:35", "remaining_time": "0:03:40", "throughput": 1624.02, "total_tokens": 1421120}
{"current_steps": 1655, "total_steps": 2065, "loss": 0.0, "lr": 5.797662970087184e-07, "epoch": 4.00726392251816, "percentage": 80.15, "elapsed_time": "0:14:35", "remaining_time": "0:03:36", "throughput": 1627.44, "total_tokens": 1424944}
{"current_steps": 1660, "total_steps": 2065, "loss": 0.0, "lr": 5.663011718200201e-07, "epoch": 4.019370460048426, "percentage": 80.39, "elapsed_time": "0:14:35", "remaining_time": "0:03:33", "throughput": 1631.72, "total_tokens": 1429296}
{"current_steps": 1664, "total_steps": 2065, "eval_loss": 0.2501881718635559, "epoch": 4.0290556900726395, "percentage": 80.58, "elapsed_time": "0:14:36", "remaining_time": "0:03:31", "throughput": 1634.11, "total_tokens": 1432880}
{"current_steps": 1665, "total_steps": 2065, "loss": 0.0, "lr": 5.529742550767545e-07, "epoch": 4.031476997578692, "percentage": 80.63, "elapsed_time": "0:15:11", "remaining_time": "0:03:38", "throughput": 1572.94, "total_tokens": 1433776}
{"current_steps": 1670, "total_steps": 2065, "loss": 0.0, "lr": 5.397864993018367e-07, "epoch": 4.043583535108959, "percentage": 80.87, "elapsed_time": "0:15:11", "remaining_time": "0:03:35", "throughput": 1576.93, "total_tokens": 1438000}
{"current_steps": 1675, "total_steps": 2065, "loss": 0.0, "lr": 5.267388470718449e-07, "epoch": 4.0556900726392255, "percentage": 81.11, "elapsed_time": "0:15:12", "remaining_time": "0:03:32", "throughput": 1581.05, "total_tokens": 1442352}
{"current_steps": 1680, "total_steps": 2065, "loss": 0.0, "lr": 5.138322309496504e-07, "epoch": 4.067796610169491, "percentage": 81.36, "elapsed_time": "0:15:12", "remaining_time": "0:03:29", "throughput": 1585.18, "total_tokens": 1446704}
{"current_steps": 1685, "total_steps": 2065, "loss": 0.0, "lr": 5.010675734177631e-07, "epoch": 4.079903147699758, "percentage": 81.6, "elapsed_time": "0:15:13", "remaining_time": "0:03:25", "throughput": 1589.1, "total_tokens": 1450864}
{"current_steps": 1690, "total_steps": 2065, "loss": 0.0, "lr": 4.884457868124001e-07, "epoch": 4.092009685230024, "percentage": 81.84, "elapsed_time": "0:15:13", "remaining_time": "0:03:22", "throughput": 1593.08, "total_tokens": 1455088}
{"current_steps": 1695, "total_steps": 2065, "loss": 0.0051, "lr": 4.759677732582782e-07, "epoch": 4.1041162227602905, "percentage": 82.08, "elapsed_time": "0:15:13", "remaining_time": "0:03:19", "throughput": 1597.13, "total_tokens": 1459376}
{"current_steps": 1700, "total_steps": 2065, "loss": 0.0, "lr": 4.6363442460413215e-07, "epoch": 4.116222760290557, "percentage": 82.32, "elapsed_time": "0:15:14", "remaining_time": "0:03:16", "throughput": 1601.11, "total_tokens": 1463600}
{"current_steps": 1705, "total_steps": 2065, "loss": 0.0, "lr": 4.514466223589753e-07, "epoch": 4.128329297820823, "percentage": 82.57, "elapsed_time": "0:15:14", "remaining_time": "0:03:13", "throughput": 1605.35, "total_tokens": 1468080}
{"current_steps": 1710, "total_steps": 2065, "loss": 0.0253, "lr": 4.394052376290914e-07, "epoch": 4.14043583535109, "percentage": 82.81, "elapsed_time": "0:15:14", "remaining_time": "0:03:09", "throughput": 1609.67, "total_tokens": 1472624}
{"current_steps": 1715, "total_steps": 2065, "loss": 0.0, "lr": 4.2751113105577587e-07, "epoch": 4.1525423728813555, "percentage": 83.05, "elapsed_time": "0:15:15", "remaining_time": "0:03:06", "throughput": 1613.84, "total_tokens": 1477040}
{"current_steps": 1720, "total_steps": 2065, "loss": 0.0, "lr": 4.157651527538223e-07, "epoch": 4.164648910411622, "percentage": 83.29, "elapsed_time": "0:15:15", "remaining_time": "0:03:03", "throughput": 1617.87, "total_tokens": 1481328}
{"current_steps": 1725, "total_steps": 2065, "loss": 0.0, "lr": 4.041681422507604e-07, "epoch": 4.176755447941889, "percentage": 83.54, "elapsed_time": "0:15:15", "remaining_time": "0:03:00", "throughput": 1622.1, "total_tokens": 1485808}
{"current_steps": 1730, "total_steps": 2065, "loss": 0.0, "lr": 3.927209284268535e-07, "epoch": 4.188861985472155, "percentage": 83.78, "elapsed_time": "0:15:16", "remaining_time": "0:02:57", "throughput": 1626.19, "total_tokens": 1490160}
{"current_steps": 1735, "total_steps": 2065, "loss": 0.0, "lr": 3.8142432945585425e-07, "epoch": 4.200968523002421, "percentage": 84.02, "elapsed_time": "0:15:16", "remaining_time": "0:02:54", "throughput": 1630.28, "total_tokens": 1494512}
{"current_steps": 1740, "total_steps": 2065, "loss": 0.0, "lr": 3.702791527465274e-07, "epoch": 4.213075060532688, "percentage": 84.26, "elapsed_time": "0:15:17", "remaining_time": "0:02:51", "throughput": 1633.96, "total_tokens": 1498480}
{"current_steps": 1745, "total_steps": 2065, "loss": 0.0, "lr": 3.592861948849416e-07, "epoch": 4.225181598062954, "percentage": 84.5, "elapsed_time": "0:15:17", "remaining_time": "0:02:48", "throughput": 1637.97, "total_tokens": 1502768}
{"current_steps": 1750, "total_steps": 2065, "loss": 0.0, "lr": 3.484462415775333e-07, "epoch": 4.237288135593221, "percentage": 84.75, "elapsed_time": "0:15:17", "remaining_time": "0:02:45", "throughput": 1641.91, "total_tokens": 1506992}
{"current_steps": 1755, "total_steps": 2065, "loss": 0.0, "lr": 3.377600675949527e-07, "epoch": 4.249394673123486, "percentage": 84.99, "elapsed_time": "0:15:18", "remaining_time": "0:02:42", "throughput": 1646.12, "total_tokens": 1511472}
{"current_steps": 1760, "total_steps": 2065, "loss": 0.0, "lr": 3.272284367166825e-07, "epoch": 4.261501210653753, "percentage": 85.23, "elapsed_time": "0:15:18", "remaining_time": "0:02:39", "throughput": 1650.19, "total_tokens": 1515824}
{"current_steps": 1765, "total_steps": 2065, "loss": 0.0001, "lr": 3.1685210167645336e-07, "epoch": 4.27360774818402, "percentage": 85.47, "elapsed_time": "0:15:18", "remaining_time": "0:02:36", "throughput": 1654.26, "total_tokens": 1520176}
{"current_steps": 1768, "total_steps": 2065, "eval_loss": 0.25040701031684875, "epoch": 4.280871670702179, "percentage": 85.62, "elapsed_time": "0:15:21", "remaining_time": "0:02:34", "throughput": 1652.2, "total_tokens": 1522544}
{"current_steps": 1770, "total_steps": 2065, "loss": 0.0018, "lr": 3.066318041084398e-07, "epoch": 4.285714285714286, "percentage": 85.71, "elapsed_time": "0:16:08", "remaining_time": "0:02:41", "throughput": 1573.81, "total_tokens": 1524336}
{"current_steps": 1775, "total_steps": 2065, "loss": 0.0, "lr": 2.9656827449425495e-07, "epoch": 4.297820823244552, "percentage": 85.96, "elapsed_time": "0:16:08", "remaining_time": "0:02:38", "throughput": 1577.57, "total_tokens": 1528560}
{"current_steps": 1780, "total_steps": 2065, "loss": 0.026, "lr": 2.86662232110739e-07, "epoch": 4.309927360774818, "percentage": 86.2, "elapsed_time": "0:16:09", "remaining_time": "0:02:35", "throughput": 1581.27, "total_tokens": 1532720}
{"current_steps": 1785, "total_steps": 2065, "loss": 0.0, "lr": 2.769143849785513e-07, "epoch": 4.322033898305085, "percentage": 86.44, "elapsed_time": "0:16:09", "remaining_time": "0:02:32", "throughput": 1585.02, "total_tokens": 1536944}
{"current_steps": 1790, "total_steps": 2065, "loss": 0.0, "lr": 2.673254298115646e-07, "epoch": 4.3341404358353515, "percentage": 86.68, "elapsed_time": "0:16:10", "remaining_time": "0:02:29", "throughput": 1588.78, "total_tokens": 1541168}
{"current_steps": 1795, "total_steps": 2065, "loss": 0.0, "lr": 2.5789605196706675e-07, "epoch": 4.346246973365617, "percentage": 86.92, "elapsed_time": "0:16:10", "remaining_time": "0:02:25", "throughput": 1592.59, "total_tokens": 1545456}
{"current_steps": 1800, "total_steps": 2065, "loss": 0.0, "lr": 2.4862692539677907e-07, "epoch": 4.358353510895884, "percentage": 87.17, "elapsed_time": "0:16:10", "remaining_time": "0:02:22", "throughput": 1596.54, "total_tokens": 1549872}
{"current_steps": 1805, "total_steps": 2065, "loss": 0.0, "lr": 2.39518712598685e-07, "epoch": 4.37046004842615, "percentage": 87.41, "elapsed_time": "0:16:11", "remaining_time": "0:02:19", "throughput": 1600.47, "total_tokens": 1554288}
{"current_steps": 1810, "total_steps": 2065, "loss": 0.0, "lr": 2.3057206456967908e-07, "epoch": 4.3825665859564165, "percentage": 87.65, "elapsed_time": "0:16:11", "remaining_time": "0:02:16", "throughput": 1604.08, "total_tokens": 1558384}
{"current_steps": 1815, "total_steps": 2065, "loss": 0.0, "lr": 2.2178762075903747e-07, "epoch": 4.394673123486683, "percentage": 87.89, "elapsed_time": "0:16:11", "remaining_time": "0:02:13", "throughput": 1607.76, "total_tokens": 1562544}
{"current_steps": 1820, "total_steps": 2065, "loss": 0.0, "lr": 2.131660090227139e-07, "epoch": 4.406779661016949, "percentage": 88.14, "elapsed_time": "0:16:12", "remaining_time": "0:02:10", "throughput": 1611.94, "total_tokens": 1567216}
{"current_steps": 1825, "total_steps": 2065, "loss": 0.0, "lr": 2.0470784557846652e-07, "epoch": 4.418886198547216, "percentage": 88.38, "elapsed_time": "0:16:12", "remaining_time": "0:02:07", "throughput": 1615.8, "total_tokens": 1571568}
{"current_steps": 1830, "total_steps": 2065, "loss": 0.0, "lr": 1.9641373496181143e-07, "epoch": 4.4309927360774815, "percentage": 88.62, "elapsed_time": "0:16:12", "remaining_time": "0:02:04", "throughput": 1619.52, "total_tokens": 1575792}
{"current_steps": 1835, "total_steps": 2065, "loss": 0.0, "lr": 1.882842699828169e-07, "epoch": 4.443099273607748, "percentage": 88.86, "elapsed_time": "0:16:13", "remaining_time": "0:02:02", "throughput": 1623.31, "total_tokens": 1580080}
{"current_steps": 1840, "total_steps": 2065, "loss": 0.0, "lr": 1.8032003168373306e-07, "epoch": 4.455205811138015, "percentage": 89.1, "elapsed_time": "0:16:13", "remaining_time": "0:01:59", "throughput": 1626.84, "total_tokens": 1584112}
{"current_steps": 1845, "total_steps": 2065, "loss": 0.0, "lr": 1.7252158929746133e-07, "epoch": 4.467312348668281, "percentage": 89.35, "elapsed_time": "0:16:14", "remaining_time": "0:01:56", "throughput": 1630.63, "total_tokens": 1588400}
{"current_steps": 1850, "total_steps": 2065, "loss": 0.0, "lr": 1.6488950020686956e-07, "epoch": 4.479418886198547, "percentage": 89.59, "elapsed_time": "0:16:14", "remaining_time": "0:01:53", "throughput": 1634.54, "total_tokens": 1592816}
{"current_steps": 1855, "total_steps": 2065, "loss": 0.0, "lr": 1.5742430990495465e-07, "epoch": 4.491525423728813, "percentage": 89.83, "elapsed_time": "0:16:14", "remaining_time": "0:01:50", "throughput": 1638.51, "total_tokens": 1597296}
{"current_steps": 1860, "total_steps": 2065, "loss": 0.0184, "lr": 1.501265519558537e-07, "epoch": 4.50363196125908, "percentage": 90.07, "elapsed_time": "0:16:15", "remaining_time": "0:01:47", "throughput": 1642.35, "total_tokens": 1601648}
{"current_steps": 1865, "total_steps": 2065, "loss": 0.0, "lr": 1.4299674795670765e-07, "epoch": 4.5157384987893465, "percentage": 90.31, "elapsed_time": "0:16:15", "remaining_time": "0:01:44", "throughput": 1646.12, "total_tokens": 1605936}
{"current_steps": 1870, "total_steps": 2065, "loss": 0.0, "lr": 1.360354075003828e-07, "epoch": 4.527845036319612, "percentage": 90.56, "elapsed_time": "0:16:15", "remaining_time": "0:01:41", "throughput": 1649.77, "total_tokens": 1610096}
{"current_steps": 1872, "total_steps": 2065, "eval_loss": 0.2488991767168045, "epoch": 4.532687651331719, "percentage": 90.65, "elapsed_time": "0:16:16", "remaining_time": "0:01:40", "throughput": 1650.2, "total_tokens": 1611760}
{"current_steps": 1875, "total_steps": 2065, "loss": 0.0, "lr": 1.2924302813904582e-07, "epoch": 4.539951573849879, "percentage": 90.8, "elapsed_time": "0:17:29", "remaining_time": "0:01:46", "throughput": 1538.37, "total_tokens": 1614384}
{"current_steps": 1880, "total_steps": 2065, "loss": 0.0, "lr": 1.2262009534860368e-07, "epoch": 4.552058111380145, "percentage": 91.04, "elapsed_time": "0:17:29", "remaining_time": "0:01:43", "throughput": 1542.04, "total_tokens": 1618800}
{"current_steps": 1885, "total_steps": 2065, "loss": 0.0, "lr": 1.161670824940045e-07, "epoch": 4.5641646489104115, "percentage": 91.28, "elapsed_time": "0:17:30", "remaining_time": "0:01:40", "throughput": 1545.46, "total_tokens": 1622960}
{"current_steps": 1890, "total_steps": 2065, "loss": 0.0, "lr": 1.0988445079540389e-07, "epoch": 4.576271186440678, "percentage": 91.53, "elapsed_time": "0:17:30", "remaining_time": "0:01:37", "throughput": 1548.83, "total_tokens": 1627056}
{"current_steps": 1895, "total_steps": 2065, "loss": 0.0002, "lr": 1.0377264929520126e-07, "epoch": 4.588377723970944, "percentage": 91.77, "elapsed_time": "0:17:30", "remaining_time": "0:01:34", "throughput": 1552.42, "total_tokens": 1631408}
{"current_steps": 1900, "total_steps": 2065, "loss": 0.0, "lr": 9.783211482594285e-08, "epoch": 4.600484261501211, "percentage": 92.01, "elapsed_time": "0:17:31", "remaining_time": "0:01:31", "throughput": 1556.13, "total_tokens": 1635888}
{"current_steps": 1905, "total_steps": 2065, "loss": 0.0, "lr": 9.206327197910203e-08, "epoch": 4.6125907990314765, "percentage": 92.25, "elapsed_time": "0:17:31", "remaining_time": "0:01:28", "throughput": 1559.67, "total_tokens": 1640176}
{"current_steps": 1910, "total_steps": 2065, "loss": 0.0, "lr": 8.64665330747308e-08, "epoch": 4.624697336561743, "percentage": 92.49, "elapsed_time": "0:17:31", "remaining_time": "0:01:25", "throughput": 1563.26, "total_tokens": 1644528}
{"current_steps": 1915, "total_steps": 2065, "loss": 0.0, "lr": 8.104229813199111e-08, "epoch": 4.63680387409201, "percentage": 92.74, "elapsed_time": "0:17:32", "remaining_time": "0:01:22", "throughput": 1567.2, "total_tokens": 1649264}
{"current_steps": 1920, "total_steps": 2065, "loss": 0.0, "lr": 7.579095484056193e-08, "epoch": 4.648910411622276, "percentage": 92.98, "elapsed_time": "0:17:32", "remaining_time": "0:01:19", "throughput": 1570.96, "total_tokens": 1653808}
{"current_steps": 1925, "total_steps": 2065, "loss": 0.0, "lr": 7.071287853293141e-08, "epoch": 4.661016949152542, "percentage": 93.22, "elapsed_time": "0:17:33", "remaining_time": "0:01:16", "throughput": 1574.66, "total_tokens": 1658288}
{"current_steps": 1930, "total_steps": 2065, "loss": 0.0, "lr": 6.580843215757082e-08, "epoch": 4.673123486682809, "percentage": 93.46, "elapsed_time": "0:17:33", "remaining_time": "0:01:13", "throughput": 1578.18, "total_tokens": 1662576}
{"current_steps": 1935, "total_steps": 2065, "loss": 0.0, "lr": 6.107796625299117e-08, "epoch": 4.685230024213075, "percentage": 93.7, "elapsed_time": "0:17:33", "remaining_time": "0:01:10", "throughput": 1581.87, "total_tokens": 1667056}
{"current_steps": 1940, "total_steps": 2065, "loss": 0.0, "lr": 5.652181892269182e-08, "epoch": 4.697336561743342, "percentage": 93.95, "elapsed_time": "0:17:34", "remaining_time": "0:01:07", "throughput": 1585.56, "total_tokens": 1671536}
{"current_steps": 1945, "total_steps": 2065, "loss": 0.0, "lr": 5.214031581099149e-08, "epoch": 4.709443099273607, "percentage": 94.19, "elapsed_time": "0:17:34", "remaining_time": "0:01:05", "throughput": 1589.13, "total_tokens": 1675888}
{"current_steps": 1950, "total_steps": 2065, "loss": 0.0, "lr": 4.793377007975719e-08, "epoch": 4.721549636803874, "percentage": 94.43, "elapsed_time": "0:17:34", "remaining_time": "0:01:02", "throughput": 1592.64, "total_tokens": 1680176}
{"current_steps": 1955, "total_steps": 2065, "loss": 0.0, "lr": 4.3902482386018186e-08, "epoch": 4.733656174334141, "percentage": 94.67, "elapsed_time": "0:17:35", "remaining_time": "0:00:59", "throughput": 1596.08, "total_tokens": 1684400}
{"current_steps": 1960, "total_steps": 2065, "loss": 0.0357, "lr": 4.004674086047905e-08, "epoch": 4.745762711864407, "percentage": 94.92, "elapsed_time": "0:17:35", "remaining_time": "0:00:56", "throughput": 1599.7, "total_tokens": 1688816}
{"current_steps": 1965, "total_steps": 2065, "loss": 0.0, "lr": 3.636682108692502e-08, "epoch": 4.757869249394673, "percentage": 95.16, "elapsed_time": "0:17:36", "remaining_time": "0:00:53", "throughput": 1603.44, "total_tokens": 1693360}
{"current_steps": 1970, "total_steps": 2065, "loss": 0.0, "lr": 3.286298608252442e-08, "epoch": 4.76997578692494, "percentage": 95.4, "elapsed_time": "0:17:36", "remaining_time": "0:00:50", "throughput": 1606.87, "total_tokens": 1697584}
{"current_steps": 1975, "total_steps": 2065, "loss": 0.0, "lr": 2.953548627903202e-08, "epoch": 4.782082324455206, "percentage": 95.64, "elapsed_time": "0:17:36", "remaining_time": "0:00:48", "throughput": 1610.49, "total_tokens": 1702000}
{"current_steps": 1976, "total_steps": 2065, "eval_loss": 0.2507624924182892, "epoch": 4.784503631961259, "percentage": 95.69, "elapsed_time": "0:17:37", "remaining_time": "0:00:47", "throughput": 1610.21, "total_tokens": 1702832}
{"current_steps": 1980, "total_steps": 2065, "loss": 0.0, "lr": 2.6384559504886164e-08, "epoch": 4.7941888619854724, "percentage": 95.88, "elapsed_time": "0:18:14", "remaining_time": "0:00:46", "throughput": 1559.14, "total_tokens": 1706416}
{"current_steps": 1985, "total_steps": 2065, "loss": 0.0, "lr": 2.3410430968214825e-08, "epoch": 4.806295399515738, "percentage": 96.13, "elapsed_time": "0:18:14", "remaining_time": "0:00:44", "throughput": 1562.76, "total_tokens": 1710960}
{"current_steps": 1990, "total_steps": 2065, "loss": 0.0, "lr": 2.0613313240735457e-08, "epoch": 4.818401937046005, "percentage": 96.37, "elapsed_time": "0:18:15", "remaining_time": "0:00:41", "throughput": 1566.32, "total_tokens": 1715440}
{"current_steps": 1995, "total_steps": 2065, "loss": 0.0, "lr": 1.7993406242563238e-08, "epoch": 4.830508474576272, "percentage": 96.61, "elapsed_time": "0:18:15", "remaining_time": "0:00:38", "throughput": 1569.7, "total_tokens": 1719728}
{"current_steps": 2000, "total_steps": 2065, "loss": 0.0, "lr": 1.5550897227922522e-08, "epoch": 4.842615012106537, "percentage": 96.85, "elapsed_time": "0:18:15", "remaining_time": "0:00:35", "throughput": 1573.32, "total_tokens": 1724272}
{"current_steps": 2005, "total_steps": 2065, "loss": 0.0, "lr": 1.3285960771761696e-08, "epoch": 4.854721549636804, "percentage": 97.09, "elapsed_time": "0:18:16", "remaining_time": "0:00:32", "throughput": 1576.7, "total_tokens": 1728560}
{"current_steps": 2010, "total_steps": 2065, "loss": 0.0, "lr": 1.119875875727705e-08, "epoch": 4.86682808716707, "percentage": 97.34, "elapsed_time": "0:18:16", "remaining_time": "0:00:30", "throughput": 1580.31, "total_tokens": 1733104}
{"current_steps": 2015, "total_steps": 2065, "loss": 0.0, "lr": 9.289440364341484e-09, "epoch": 4.878934624697337, "percentage": 97.58, "elapsed_time": "0:18:17", "remaining_time": "0:00:27", "throughput": 1583.57, "total_tokens": 1737264}
{"current_steps": 2020, "total_steps": 2065, "loss": 0.0, "lr": 7.558142058842755e-09, "epoch": 4.891041162227603, "percentage": 97.82, "elapsed_time": "0:18:17", "remaining_time": "0:00:24", "throughput": 1586.83, "total_tokens": 1741424}
{"current_steps": 2025, "total_steps": 2065, "loss": 0.0, "lr": 6.004987582929056e-09, "epoch": 4.903147699757869, "percentage": 98.06, "elapsed_time": "0:18:17", "remaining_time": "0:00:21", "throughput": 1590.14, "total_tokens": 1745648}
{"current_steps": 2030, "total_steps": 2065, "loss": 0.0, "lr": 4.6300879461655404e-09, "epoch": 4.915254237288136, "percentage": 98.31, "elapsed_time": "0:18:18", "remaining_time": "0:00:18", "throughput": 1593.45, "total_tokens": 1749872}
{"current_steps": 2035, "total_steps": 2065, "loss": 0.0, "lr": 3.4335414175995506e-09, "epoch": 4.927360774818402, "percentage": 98.55, "elapsed_time": "0:18:18", "remaining_time": "0:00:16", "throughput": 1596.94, "total_tokens": 1754288}
{"current_steps": 2040, "total_steps": 2065, "loss": 0.0, "lr": 2.4154335187365207e-09, "epoch": 4.939467312348668, "percentage": 98.79, "elapsed_time": "0:18:18", "remaining_time": "0:00:13", "throughput": 1600.36, "total_tokens": 1758640}
{"current_steps": 2045, "total_steps": 2065, "loss": 0.0003, "lr": 1.575837017428472e-09, "epoch": 4.951573849878935, "percentage": 99.03, "elapsed_time": "0:18:19", "remaining_time": "0:00:10", "throughput": 1603.72, "total_tokens": 1762928}
{"current_steps": 2050, "total_steps": 2065, "loss": 0.0, "lr": 9.14811922672898e-10, "epoch": 4.963680387409201, "percentage": 99.27, "elapsed_time": "0:18:19", "remaining_time": "0:00:08", "throughput": 1607.19, "total_tokens": 1767344}
{"current_steps": 2055, "total_steps": 2065, "loss": 0.0, "lr": 4.3240548032230657e-10, "epoch": 4.9757869249394675, "percentage": 99.52, "elapsed_time": "0:18:20", "remaining_time": "0:00:05", "throughput": 1610.55, "total_tokens": 1771632}
{"current_steps": 2060, "total_steps": 2065, "loss": 0.0, "lr": 1.2865216970914253e-10, "epoch": 4.987893462469733, "percentage": 99.76, "elapsed_time": "0:18:20", "remaining_time": "0:00:02", "throughput": 1613.74, "total_tokens": 1775728}
{"current_steps": 2065, "total_steps": 2065, "loss": 0.0, "lr": 3.573701180537015e-12, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:18:20", "remaining_time": "0:00:00", "throughput": 1617.05, "total_tokens": 1780000}
{"current_steps": 2065, "total_steps": 2065, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:18:59", "remaining_time": "0:00:00", "throughput": 1561.49, "total_tokens": 1780000}
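The log above uses JSON Lines: one object per line. Training entries carry `loss` and `lr`, evaluation entries carry `eval_loss`, and the final line is a run summary with neither. `epoch` is fractional, equal to `current_steps` divided by the number of optimizer steps per epoch. Below is a minimal Python sketch for separating the record types and recovering steps per epoch. It assumes only the field names visible in this log, and the helper names are illustrative rather than part of any training library.

```python
import json

# Five records copied verbatim from the log above: the three evaluation
# entries, the final training step, and the closing run summary.
SAMPLE_LOG = """\
{"current_steps": 1768, "total_steps": 2065, "eval_loss": 0.25040701031684875, "epoch": 4.280871670702179, "percentage": 85.62, "elapsed_time": "0:15:21", "remaining_time": "0:02:34", "throughput": 1652.2, "total_tokens": 1522544}
{"current_steps": 1872, "total_steps": 2065, "eval_loss": 0.2488991767168045, "epoch": 4.532687651331719, "percentage": 90.65, "elapsed_time": "0:16:16", "remaining_time": "0:01:40", "throughput": 1650.2, "total_tokens": 1611760}
{"current_steps": 1976, "total_steps": 2065, "eval_loss": 0.2507624924182892, "epoch": 4.784503631961259, "percentage": 95.69, "elapsed_time": "0:17:37", "remaining_time": "0:00:47", "throughput": 1610.21, "total_tokens": 1702832}
{"current_steps": 2065, "total_steps": 2065, "loss": 0.0, "lr": 3.573701180537015e-12, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:18:20", "remaining_time": "0:00:00", "throughput": 1617.05, "total_tokens": 1780000}
{"current_steps": 2065, "total_steps": 2065, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:18:59", "remaining_time": "0:00:00", "throughput": 1561.49, "total_tokens": 1780000}
"""


def load_records(text):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def split_records(records):
    """Split records into training steps (have "loss") and evaluations (have "eval_loss")."""
    train = [r for r in records if "loss" in r]
    evals = [r for r in records if "eval_loss" in r]
    return train, evals


def steps_per_epoch(record):
    """Recover steps per epoch, since epoch = current_steps / steps_per_epoch."""
    return record["current_steps"] / record["epoch"]


if __name__ == "__main__":
    records = load_records(SAMPLE_LOG)
    train, evals = split_records(records)
    best = min(evals, key=lambda r: r["eval_loss"])
    print(f"best eval_loss {best['eval_loss']:.4f} at step {best['current_steps']}")
    print(f"steps per epoch: {round(steps_per_epoch(train[-1]))}")
```

On these records the lowest `eval_loss` is about 0.2489 at step 1872, and 2065 steps over 5.0 epochs gives 413 steps per epoch, which is consistent with every fractional `epoch` value in the log.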

3519
trainer_state.json Normal file

File diff suppressed because it is too large.

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:38da5c91afc8307ef30418b0b9cb84db7da16d9c341336f894d4ab8cd5bf8fe0
size 6289
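Because `.gitattributes` routes `*.bin` through Git LFS, the repository stores only the three-line pointer above for `training_args.bin`. The 6289-byte object itself is fetched separately. The following is a minimal sketch based on the Git LFS v1 pointer format (`version`, `oid sha256:<hex>`, `size <bytes>`). It parses such a pointer and checks a fetched blob against it. The helper names are illustrative, not part of the `git-lfs` tooling.

```python
import hashlib
import re

# The pointer text committed for training_args.bin, copied from above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:38da5c91afc8307ef30418b0b9cb84db7da16d9c341336f894d4ab8cd5bf8fe0
size 6289
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer into {"oid": hex digest, "size": bytes}."""
    fields = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("unexpected oid format")
    return {"oid": digest, "size": int(fields["size"])}


def matches_pointer(pointer, data: bytes) -> bool:
    """True if the blob has the recorded size and SHA-256 digest."""
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]
```

Checking the size first is a cheap way to reject a wrong or truncated download before hashing it.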

BIN
training_eval_loss.png Normal file

Binary file not shown (PNG image, 46 KiB).

BIN
training_loss.png Normal file

Binary file not shown (PNG image, 46 KiB).