Initialize project; model provided by the ModelHub XC community

Model: shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-12 17:10:15 +08:00
commit 2df7531635
10 changed files with 410 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
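The patterns above decide which files are stored as Git LFS pointers. The following is a rough sketch of that decision, assuming simple basename globbing with Python's `fnmatch`; real gitattributes matching has more rules (for example the `saved_model/**/*` directory pattern), and `is_lfs_tracked` and `LFS_PATTERNS` are hypothetical names used only for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns listed in .gitattributes (basename patterns only).
LFS_PATTERNS = [
    "*.bin",
    "*.safetensors",
    "*.pt",
    "*.pth",
    "*.onnx",
    "*tfevents*",
    "tokenizer.json",
]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: does the file's basename match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["model.safetensors", "tokenizer.json", "config.json"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under this approximation, `model.safetensors` and `tokenizer.json` match while `config.json` does not, which agrees with the two LFS pointer files that appear in this commit.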

58
README.md Normal file

@@ -0,0 +1,58 @@
---
library_name: transformers
tags:
- llama-factory
- generated_from_trainer
model-index:
- name: llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64
This model was trained from scratch on an unknown dataset.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 3.0
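
The totals in the list above follow from the per-device values, and the cosine schedule can be checked against the `lr` column of `trainer_log.jsonl` in this commit. The sketch below is a minimal reconstruction, not the trainer's own code: it assumes the fractional warmup value 0.1 resolves to 24 warmup steps out of the 237 total steps reported in the log, and that the entry logged as `current_steps` k shows the rate computed for scheduler step k - 1. `lr_at` is a hypothetical helper name.

```python
import math

# Values copied from the hyperparameter list.
per_device_batch = 8
num_devices = 4
grad_accum_steps = 2

total_train_batch = per_device_batch * num_devices * grad_accum_steps
total_eval_batch = per_device_batch * num_devices  # no accumulation at eval


def lr_at(step, peak_lr=1e-5, warmup_steps=24, total_steps=237):
    """Linear warmup followed by a single half-cosine decay to zero.

    warmup_steps=24 is an assumption inferred from the log, not a value
    stated in the model card.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch, total_eval_batch)
print(lr_at(1), lr_at(24), lr_at(95))
```

With these assumptions the sketch reproduces the logged rates at steps 2, 25, 26, and 96, which suggests the peak of 1e-05 is reached after 24 warmup updates.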
### Training results
### Framework versions
- Transformers 5.2.0
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.22.2

5
chat_template.jinja Normal file

@@ -0,0 +1,5 @@
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>

' }}{% endif %}
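
To make the template's output concrete, here is a plain-Python sketch of the same string construction. It assumes each header is followed by two newlines: the hunk header reports 5 lines for this file, which implies a blank line after each `<|end_header_id|>`. `render_chat` is a hypothetical helper, not part of any library, and the default `bos_token` is taken from `tokenizer_config.json` in this commit.

```python
def render_chat(messages, bos_token="<|begin_of_text|>", add_generation_prompt=False):
    """Mirror the Jinja template: header, trimmed content, end-of-turn marker."""
    pieces = []
    for index, message in enumerate(messages):
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()  # Jinja's `trim` filter
            + "<|eot_id|>"
        )
        if index == 0:
            # Only the first message is prefixed with the BOS token.
            content = bos_token + content
        pieces.append(content)
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        pieces.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(pieces)


prompt = render_chat(
    [{"role": "user", "content": "  Hello  "}],
    add_generation_prompt=True,
)
print(prompt)
```

Note that `trim` applies only to the message content, because a Jinja filter binds more tightly than `+`.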

32
config.json Normal file

@@ -0,0 +1,32 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "dtype": "bfloat16",
  "eos_token_id": 128009,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 128009,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_parameters": {
    "rope_theta": 500000.0,
    "rope_type": "default"
  },
  "tie_word_embeddings": false,
  "transformers_version": "5.2.0",
  "use_cache": false,
  "vocab_size": 128256
}
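
As a sanity check on this configuration, the parameter count can be estimated from the listed dimensions and compared with the 16060556616-byte size of `model.safetensors` recorded in this commit. The sketch assumes the usual Llama layer layout (grouped-query attention, a gated three-matrix MLP, two RMSNorm weights per layer, no biases); `count_llama_params` is a hypothetical helper written for this note.

```python
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 128256,
    "tie_word_embeddings": False,
}


def count_llama_params(cfg):
    """Estimate the weight count, assuming a standard bias-free Llama layout."""
    hidden = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    attention = hidden * q_dim + 2 * hidden * kv_dim + q_dim * hidden  # q, k, v, o
    mlp = 3 * hidden * cfg["intermediate_size"]  # gate, up, down projections
    norms = 2 * hidden  # input and post-attention RMSNorm weights
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * hidden
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = hidden
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


params = count_llama_params(CONFIG)
bf16_bytes = 2 * params  # "dtype": "bfloat16" stores 2 bytes per weight
print(params, bf16_bytes)
```

The estimate comes to about 8.03 billion parameters, and twice that figure falls just under the recorded file size; the small remainder is consistent with a safetensors metadata header.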

13
generation_config.json Normal file

@@ -0,0 +1,13 @@
{
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "pad_token_id": 128009,
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "5.2.0"
}
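
With `do_sample` enabled, `temperature` and `top_p` together shape how the next token is drawn. The following is an illustrative pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering on a toy logit vector. It is not the Transformers implementation, and `sample_top_p` is a hypothetical name.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=random):
    """Return (sampled_index, kept_indices) for one sampling step."""
    # Temperature below 1 sharpens the distribution before softmax.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [value / total for value in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for index in order:
        kept.append(index)
        mass += probs[index]
        if mass >= top_p:
            break

    # Renormalise over the kept tokens and draw one of them.
    weights = [probs[index] / mass for index in kept]
    return rng.choices(kept, weights=weights, k=1)[0], kept


token, kept = sample_top_p([2.0, 1.0, 0.0, -1.0], rng=random.Random(0))
print(token, kept)
```

For this toy input only the two most likely tokens survive the filter, so low-probability tokens are never sampled.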

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f7b47d24969e8e7999f2b6363545b7c8f60a299fe24045f08861cdf320df9603
size 16060556616
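
The three lines above are a Git LFS pointer rather than the weights themselves: a spec version, a content hash, and the size in bytes of the real file. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and only the three keys shown here are handled.

```python
def parse_lfs_pointer(text):
    """Split each 'key value' line and decode the oid and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f7b47d24969e8e7999f2b6363545b7c8f60a299fe24045f08861cdf320df9603
size 16060556616
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"] / 2**30)  # size expressed in GiB
```

The `size` field is what makes it possible to reason about the checkpoint (roughly 15 GiB here) without downloading it.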

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c1dcab308e7cf5970ea38815e0a62887d705c5b436f869ca27a5dcdd40c36a6
size 17210148

19
tokenizer_config.json Normal file

@@ -0,0 +1,19 @@
{
  "backend": "tokenizers",
  "bos_token": "<|begin_of_text|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|eot_id|>",
  "extra_special_tokens": [
    "<|eom_id|>"
  ],
  "is_local": true,
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|eot_id|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "TokenizersBackend"
}
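
The `model_max_length` value is not a usable context limit. It equals what Python produces for `int(1e30)`, which reads as a "no limit configured" sentinel. The sketch below assumes that interpretation and falls back to `max_position_embeddings` from `config.json`; `effective_max_length` is a hypothetical helper, not a tokenizer API.

```python
# int(1e30) is not exactly 10**30 because 1e30 is stored as a binary double.
NO_LIMIT_SENTINEL = int(1e30)


def effective_max_length(model_max_length, max_position_embeddings):
    """Pick a practical truncation length when the tokenizer sets no limit."""
    if model_max_length >= NO_LIMIT_SENTINEL:
        return max_position_embeddings
    return min(model_max_length, max_position_embeddings)


limit = effective_max_length(1000000000000000019884624838656, 8192)
print(NO_LIMIT_SENTINEL, limit)
```

In practice this means callers would need to pass an explicit maximum length when truncating inputs, since the tokenizer configuration alone does not bound them.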

238
trainer_log.jsonl Normal file

@@ -0,0 +1,238 @@
{"current_steps": 1, "total_steps": 237, "loss": 2.2329883575439453, "lr": 0.0, "epoch": 0.012738853503184714, "percentage": 0.42, "elapsed_time": "0:00:04", "remaining_time": "0:19:20"}
{"current_steps": 2, "total_steps": 237, "loss": 2.2418107986450195, "lr": 4.1666666666666667e-07, "epoch": 0.025477707006369428, "percentage": 0.84, "elapsed_time": "0:00:06", "remaining_time": "0:13:40"}
{"current_steps": 3, "total_steps": 237, "loss": 2.2039730548858643, "lr": 8.333333333333333e-07, "epoch": 0.03821656050955414, "percentage": 1.27, "elapsed_time": "0:00:09", "remaining_time": "0:11:45"}
{"current_steps": 4, "total_steps": 237, "loss": 2.1634364128112793, "lr": 1.25e-06, "epoch": 0.050955414012738856, "percentage": 1.69, "elapsed_time": "0:00:11", "remaining_time": "0:10:55"}
{"current_steps": 5, "total_steps": 237, "loss": 2.3396921157836914, "lr": 1.6666666666666667e-06, "epoch": 0.06369426751592357, "percentage": 2.11, "elapsed_time": "0:00:13", "remaining_time": "0:10:27"}
{"current_steps": 6, "total_steps": 237, "loss": 2.1613612174987793, "lr": 2.0833333333333334e-06, "epoch": 0.07643312101910828, "percentage": 2.53, "elapsed_time": "0:00:15", "remaining_time": "0:09:59"}
{"current_steps": 7, "total_steps": 237, "loss": 1.7931935787200928, "lr": 2.5e-06, "epoch": 0.08917197452229299, "percentage": 2.95, "elapsed_time": "0:00:18", "remaining_time": "0:09:58"}
{"current_steps": 8, "total_steps": 237, "loss": 1.7570035457611084, "lr": 2.916666666666667e-06, "epoch": 0.10191082802547771, "percentage": 3.38, "elapsed_time": "0:00:20", "remaining_time": "0:09:40"}
{"current_steps": 9, "total_steps": 237, "loss": 1.7455153465270996, "lr": 3.3333333333333333e-06, "epoch": 0.11464968152866242, "percentage": 3.8, "elapsed_time": "0:00:22", "remaining_time": "0:09:27"}
{"current_steps": 10, "total_steps": 237, "loss": 1.7620800733566284, "lr": 3.7500000000000005e-06, "epoch": 0.12738853503184713, "percentage": 4.22, "elapsed_time": "0:00:24", "remaining_time": "0:09:14"}
{"current_steps": 11, "total_steps": 237, "loss": 1.701028823852539, "lr": 4.166666666666667e-06, "epoch": 0.14012738853503184, "percentage": 4.64, "elapsed_time": "0:00:26", "remaining_time": "0:09:05"}
{"current_steps": 12, "total_steps": 237, "loss": 1.6110973358154297, "lr": 4.583333333333333e-06, "epoch": 0.15286624203821655, "percentage": 5.06, "elapsed_time": "0:00:28", "remaining_time": "0:08:58"}
{"current_steps": 13, "total_steps": 237, "loss": 1.7218174934387207, "lr": 5e-06, "epoch": 0.16560509554140126, "percentage": 5.49, "elapsed_time": "0:00:30", "remaining_time": "0:08:50"}
{"current_steps": 14, "total_steps": 237, "loss": 1.6655555963516235, "lr": 5.416666666666667e-06, "epoch": 0.17834394904458598, "percentage": 5.91, "elapsed_time": "0:00:32", "remaining_time": "0:08:43"}
{"current_steps": 15, "total_steps": 237, "loss": 1.5263633728027344, "lr": 5.833333333333334e-06, "epoch": 0.1910828025477707, "percentage": 6.33, "elapsed_time": "0:00:34", "remaining_time": "0:08:37"}
{"current_steps": 16, "total_steps": 237, "loss": 1.499047040939331, "lr": 6.25e-06, "epoch": 0.20382165605095542, "percentage": 6.75, "elapsed_time": "0:00:37", "remaining_time": "0:08:31"}
{"current_steps": 17, "total_steps": 237, "loss": 1.5429470539093018, "lr": 6.666666666666667e-06, "epoch": 0.21656050955414013, "percentage": 7.17, "elapsed_time": "0:00:39", "remaining_time": "0:08:26"}
{"current_steps": 18, "total_steps": 237, "loss": 1.3057682514190674, "lr": 7.083333333333335e-06, "epoch": 0.22929936305732485, "percentage": 7.59, "elapsed_time": "0:00:41", "remaining_time": "0:08:20"}
{"current_steps": 19, "total_steps": 237, "loss": 1.5826351642608643, "lr": 7.500000000000001e-06, "epoch": 0.24203821656050956, "percentage": 8.02, "elapsed_time": "0:00:43", "remaining_time": "0:08:16"}
{"current_steps": 20, "total_steps": 237, "loss": 1.4761024713516235, "lr": 7.916666666666667e-06, "epoch": 0.25477707006369427, "percentage": 8.44, "elapsed_time": "0:00:45", "remaining_time": "0:08:11"}
{"current_steps": 21, "total_steps": 237, "loss": 1.484875202178955, "lr": 8.333333333333334e-06, "epoch": 0.267515923566879, "percentage": 8.86, "elapsed_time": "0:00:47", "remaining_time": "0:08:08"}
{"current_steps": 22, "total_steps": 237, "loss": 1.392436146736145, "lr": 8.750000000000001e-06, "epoch": 0.2802547770700637, "percentage": 9.28, "elapsed_time": "0:00:49", "remaining_time": "0:08:05"}
{"current_steps": 23, "total_steps": 237, "loss": 1.42863130569458, "lr": 9.166666666666666e-06, "epoch": 0.2929936305732484, "percentage": 9.7, "elapsed_time": "0:00:51", "remaining_time": "0:08:01"}
{"current_steps": 24, "total_steps": 237, "loss": 1.5316230058670044, "lr": 9.583333333333335e-06, "epoch": 0.3057324840764331, "percentage": 10.13, "elapsed_time": "0:00:53", "remaining_time": "0:07:57"}
{"current_steps": 25, "total_steps": 237, "loss": 1.3155794143676758, "lr": 1e-05, "epoch": 0.3184713375796178, "percentage": 10.55, "elapsed_time": "0:00:56", "remaining_time": "0:07:55"}
{"current_steps": 26, "total_steps": 237, "loss": 1.3940659761428833, "lr": 9.999456158087994e-06, "epoch": 0.33121019108280253, "percentage": 10.97, "elapsed_time": "0:00:58", "remaining_time": "0:07:52"}
{"current_steps": 27, "total_steps": 237, "loss": 1.5094261169433594, "lr": 9.997824750657586e-06, "epoch": 0.34394904458598724, "percentage": 11.39, "elapsed_time": "0:01:00", "remaining_time": "0:07:49"}
{"current_steps": 28, "total_steps": 237, "loss": 1.3562203645706177, "lr": 9.995106132599869e-06, "epoch": 0.35668789808917195, "percentage": 11.81, "elapsed_time": "0:01:02", "remaining_time": "0:07:46"}
{"current_steps": 29, "total_steps": 237, "loss": 1.2628836631774902, "lr": 9.99130089531422e-06, "epoch": 0.36942675159235666, "percentage": 12.24, "elapsed_time": "0:01:04", "remaining_time": "0:07:45"}
{"current_steps": 30, "total_steps": 237, "loss": 1.5627453327178955, "lr": 9.98640986657965e-06, "epoch": 0.3821656050955414, "percentage": 12.66, "elapsed_time": "0:01:06", "remaining_time": "0:07:41"}
{"current_steps": 31, "total_steps": 237, "loss": 1.390010952949524, "lr": 9.980434110374725e-06, "epoch": 0.39490445859872614, "percentage": 13.08, "elapsed_time": "0:01:09", "remaining_time": "0:07:38"}
{"current_steps": 32, "total_steps": 237, "loss": 1.5666842460632324, "lr": 9.973374926646117e-06, "epoch": 0.40764331210191085, "percentage": 13.5, "elapsed_time": "0:01:11", "remaining_time": "0:07:36"}
{"current_steps": 33, "total_steps": 237, "loss": 1.391902208328247, "lr": 9.965233851025816e-06, "epoch": 0.42038216560509556, "percentage": 13.92, "elapsed_time": "0:01:13", "remaining_time": "0:07:32"}
{"current_steps": 34, "total_steps": 237, "loss": 1.5242109298706055, "lr": 9.956012654497073e-06, "epoch": 0.43312101910828027, "percentage": 14.35, "elapsed_time": "0:01:15", "remaining_time": "0:07:29"}
{"current_steps": 35, "total_steps": 237, "loss": 1.5025076866149902, "lr": 9.945713343009154e-06, "epoch": 0.445859872611465, "percentage": 14.77, "elapsed_time": "0:01:17", "remaining_time": "0:07:26"}
{"current_steps": 36, "total_steps": 237, "loss": 1.3909598588943481, "lr": 9.934338157040953e-06, "epoch": 0.4585987261146497, "percentage": 15.19, "elapsed_time": "0:01:19", "remaining_time": "0:07:23"}
{"current_steps": 37, "total_steps": 237, "loss": 1.4934210777282715, "lr": 9.921889571113629e-06, "epoch": 0.4713375796178344, "percentage": 15.61, "elapsed_time": "0:01:21", "remaining_time": "0:07:20"}
{"current_steps": 38, "total_steps": 237, "loss": 1.3801617622375488, "lr": 9.90837029325229e-06, "epoch": 0.4840764331210191, "percentage": 16.03, "elapsed_time": "0:01:23", "remaining_time": "0:07:17"}
{"current_steps": 39, "total_steps": 237, "loss": 1.3861180543899536, "lr": 9.893783264396903e-06, "epoch": 0.4968152866242038, "percentage": 16.46, "elapsed_time": "0:01:25", "remaining_time": "0:07:15"}
{"current_steps": 40, "total_steps": 237, "loss": 1.3935482501983643, "lr": 9.878131657762535e-06, "epoch": 0.5095541401273885, "percentage": 16.88, "elapsed_time": "0:01:27", "remaining_time": "0:07:12"}
{"current_steps": 41, "total_steps": 237, "loss": 1.358353853225708, "lr": 9.861418878149056e-06, "epoch": 0.5222929936305732, "percentage": 17.3, "elapsed_time": "0:01:30", "remaining_time": "0:07:10"}
{"current_steps": 42, "total_steps": 237, "loss": 1.4175193309783936, "lr": 9.843648561200476e-06, "epoch": 0.535031847133758, "percentage": 17.72, "elapsed_time": "0:01:32", "remaining_time": "0:07:07"}
{"current_steps": 43, "total_steps": 237, "loss": 1.5568209886550903, "lr": 9.82482457261405e-06, "epoch": 0.5477707006369427, "percentage": 18.14, "elapsed_time": "0:01:34", "remaining_time": "0:07:04"}
{"current_steps": 44, "total_steps": 237, "loss": 1.4567246437072754, "lr": 9.80495100729936e-06, "epoch": 0.5605095541401274, "percentage": 18.57, "elapsed_time": "0:01:36", "remaining_time": "0:07:02"}
{"current_steps": 45, "total_steps": 237, "loss": 1.4141236543655396, "lr": 9.784032188487507e-06, "epoch": 0.5732484076433121, "percentage": 18.99, "elapsed_time": "0:01:38", "remaining_time": "0:06:59"}
{"current_steps": 46, "total_steps": 237, "loss": 1.5040584802627563, "lr": 9.762072666790658e-06, "epoch": 0.5859872611464968, "percentage": 19.41, "elapsed_time": "0:01:40", "remaining_time": "0:06:57"}
{"current_steps": 47, "total_steps": 237, "loss": 1.5955801010131836, "lr": 9.73907721921212e-06, "epoch": 0.5987261146496815, "percentage": 19.83, "elapsed_time": "0:01:42", "remaining_time": "0:06:54"}
{"current_steps": 48, "total_steps": 237, "loss": 1.3969696760177612, "lr": 9.715050848107167e-06, "epoch": 0.6114649681528662, "percentage": 20.25, "elapsed_time": "0:01:44", "remaining_time": "0:06:52"}
{"current_steps": 49, "total_steps": 237, "loss": 1.346752643585205, "lr": 9.689998780094839e-06, "epoch": 0.6242038216560509, "percentage": 20.68, "elapsed_time": "0:01:46", "remaining_time": "0:06:49"}
{"current_steps": 50, "total_steps": 237, "loss": 1.4426841735839844, "lr": 9.663926464920959e-06, "epoch": 0.6369426751592356, "percentage": 21.1, "elapsed_time": "0:01:48", "remaining_time": "0:06:46"}
{"current_steps": 51, "total_steps": 237, "loss": 1.4863749742507935, "lr": 9.636839574272623e-06, "epoch": 0.6496815286624203, "percentage": 21.52, "elapsed_time": "0:01:50", "remaining_time": "0:06:44"}
{"current_steps": 52, "total_steps": 237, "loss": 1.4416649341583252, "lr": 9.608744000544392e-06, "epoch": 0.6624203821656051, "percentage": 21.94, "elapsed_time": "0:01:52", "remaining_time": "0:06:41"}
{"current_steps": 53, "total_steps": 237, "loss": 1.2139196395874023, "lr": 9.579645855556481e-06, "epoch": 0.6751592356687898, "percentage": 22.36, "elapsed_time": "0:01:55", "remaining_time": "0:06:39"}
{"current_steps": 54, "total_steps": 237, "loss": 1.3615843057632446, "lr": 9.54955146922521e-06, "epoch": 0.6878980891719745, "percentage": 22.78, "elapsed_time": "0:01:57", "remaining_time": "0:06:37"}
{"current_steps": 55, "total_steps": 237, "loss": 1.3113226890563965, "lr": 9.51846738818602e-06, "epoch": 0.7006369426751592, "percentage": 23.21, "elapsed_time": "0:01:59", "remaining_time": "0:06:35"}
{"current_steps": 56, "total_steps": 237, "loss": 1.3093180656433105, "lr": 9.48640037436934e-06, "epoch": 0.7133757961783439, "percentage": 23.63, "elapsed_time": "0:02:01", "remaining_time": "0:06:32"}
{"current_steps": 57, "total_steps": 237, "loss": 1.3632457256317139, "lr": 9.453357403529609e-06, "epoch": 0.7261146496815286, "percentage": 24.05, "elapsed_time": "0:02:03", "remaining_time": "0:06:30"}
{"current_steps": 58, "total_steps": 237, "loss": 1.4312504529953003, "lr": 9.419345663727805e-06, "epoch": 0.7388535031847133, "percentage": 24.47, "elapsed_time": "0:02:05", "remaining_time": "0:06:27"}
{"current_steps": 59, "total_steps": 237, "loss": 1.2978061437606812, "lr": 9.38437255376777e-06, "epoch": 0.7515923566878981, "percentage": 24.89, "elapsed_time": "0:02:07", "remaining_time": "0:06:25"}
{"current_steps": 60, "total_steps": 237, "loss": 1.4193507432937622, "lr": 9.348445681586703e-06, "epoch": 0.7643312101910829, "percentage": 25.32, "elapsed_time": "0:02:09", "remaining_time": "0:06:22"}
{"current_steps": 61, "total_steps": 237, "loss": 1.4133822917938232, "lr": 9.31157286260014e-06, "epoch": 0.7770700636942676, "percentage": 25.74, "elapsed_time": "0:02:11", "remaining_time": "0:06:20"}
{"current_steps": 62, "total_steps": 237, "loss": 1.3197431564331055, "lr": 9.273762118001837e-06, "epoch": 0.7898089171974523, "percentage": 26.16, "elapsed_time": "0:02:13", "remaining_time": "0:06:18"}
{"current_steps": 63, "total_steps": 237, "loss": 1.3052380084991455, "lr": 9.235021673018849e-06, "epoch": 0.802547770700637, "percentage": 26.58, "elapsed_time": "0:02:16", "remaining_time": "0:06:15"}
{"current_steps": 64, "total_steps": 237, "loss": 1.3184340000152588, "lr": 9.195359955122244e-06, "epoch": 0.8152866242038217, "percentage": 27.0, "elapsed_time": "0:02:18", "remaining_time": "0:06:13"}
{"current_steps": 65, "total_steps": 237, "loss": 1.3289365768432617, "lr": 9.15478559219382e-06, "epoch": 0.8280254777070064, "percentage": 27.43, "elapsed_time": "0:02:20", "remaining_time": "0:06:11"}
{"current_steps": 66, "total_steps": 237, "loss": 1.461507797241211, "lr": 9.113307410649222e-06, "epoch": 0.8407643312101911, "percentage": 27.85, "elapsed_time": "0:02:22", "remaining_time": "0:06:08"}
{"current_steps": 67, "total_steps": 237, "loss": 1.3825844526290894, "lr": 9.070934433517872e-06, "epoch": 0.8535031847133758, "percentage": 28.27, "elapsed_time": "0:02:24", "remaining_time": "0:06:06"}
{"current_steps": 68, "total_steps": 237, "loss": 1.4079831838607788, "lr": 9.027675878480131e-06, "epoch": 0.8662420382165605, "percentage": 28.69, "elapsed_time": "0:02:26", "remaining_time": "0:06:03"}
{"current_steps": 69, "total_steps": 237, "loss": 1.3963056802749634, "lr": 8.983541155862114e-06, "epoch": 0.8789808917197452, "percentage": 29.11, "elapsed_time": "0:02:28", "remaining_time": "0:06:01"}
{"current_steps": 70, "total_steps": 237, "loss": 1.2974305152893066, "lr": 8.938539866588593e-06, "epoch": 0.89171974522293, "percentage": 29.54, "elapsed_time": "0:02:30", "remaining_time": "0:05:59"}
{"current_steps": 71, "total_steps": 237, "loss": 1.369901418685913, "lr": 8.892681800094447e-06, "epoch": 0.9044585987261147, "percentage": 29.96, "elapsed_time": "0:02:32", "remaining_time": "0:05:56"}
{"current_steps": 72, "total_steps": 237, "loss": 1.4970135688781738, "lr": 8.845976932195104e-06, "epoch": 0.9171974522292994, "percentage": 30.38, "elapsed_time": "0:02:34", "remaining_time": "0:05:54"}
{"current_steps": 73, "total_steps": 237, "loss": 1.539624571800232, "lr": 8.798435422916425e-06, "epoch": 0.9299363057324841, "percentage": 30.8, "elapsed_time": "0:02:36", "remaining_time": "0:05:52"}
{"current_steps": 74, "total_steps": 237, "loss": 1.2668843269348145, "lr": 8.750067614284534e-06, "epoch": 0.9426751592356688, "percentage": 31.22, "elapsed_time": "0:02:38", "remaining_time": "0:05:49"}
{"current_steps": 75, "total_steps": 237, "loss": 1.4836182594299316, "lr": 8.700884028076042e-06, "epoch": 0.9554140127388535, "percentage": 31.65, "elapsed_time": "0:02:40", "remaining_time": "0:05:47"}
{"current_steps": 76, "total_steps": 237, "loss": 1.305760383605957, "lr": 8.650895363529172e-06, "epoch": 0.9681528662420382, "percentage": 32.07, "elapsed_time": "0:02:43", "remaining_time": "0:05:45"}
{"current_steps": 77, "total_steps": 237, "loss": 1.2860628366470337, "lr": 8.600112495016289e-06, "epoch": 0.9808917197452229, "percentage": 32.49, "elapsed_time": "0:02:45", "remaining_time": "0:05:43"}
{"current_steps": 78, "total_steps": 237, "loss": 1.4810314178466797, "lr": 8.548546469678311e-06, "epoch": 0.9936305732484076, "percentage": 32.91, "elapsed_time": "0:02:47", "remaining_time": "0:05:40"}
{"current_steps": 79, "total_steps": 237, "loss": 1.43093740940094, "lr": 8.496208505021572e-06, "epoch": 1.0, "percentage": 33.33, "elapsed_time": "0:02:48", "remaining_time": "0:05:36"}
{"current_steps": 80, "total_steps": 237, "loss": 0.884945809841156, "lr": 8.443109986477574e-06, "epoch": 1.0127388535031847, "percentage": 33.76, "elapsed_time": "0:02:50", "remaining_time": "0:05:34"}
{"current_steps": 81, "total_steps": 237, "loss": 0.6251118183135986, "lr": 8.389262464926256e-06, "epoch": 1.0254777070063694, "percentage": 34.18, "elapsed_time": "0:02:52", "remaining_time": "0:05:32"}
{"current_steps": 82, "total_steps": 237, "loss": 0.5941186547279358, "lr": 8.334677654183254e-06, "epoch": 1.0382165605095541, "percentage": 34.6, "elapsed_time": "0:02:55", "remaining_time": "0:05:31"}
{"current_steps": 83, "total_steps": 237, "loss": 0.7217048406600952, "lr": 8.279367428451703e-06, "epoch": 1.0509554140127388, "percentage": 35.02, "elapsed_time": "0:02:57", "remaining_time": "0:05:29"}
{"current_steps": 84, "total_steps": 237, "loss": 0.6755723357200623, "lr": 8.223343819739164e-06, "epoch": 1.0636942675159236, "percentage": 35.44, "elapsed_time": "0:02:59", "remaining_time": "0:05:26"}
{"current_steps": 85, "total_steps": 237, "loss": 0.6515792608261108, "lr": 8.166619015240236e-06, "epoch": 1.0764331210191083, "percentage": 35.86, "elapsed_time": "0:03:01", "remaining_time": "0:05:24"}
{"current_steps": 86, "total_steps": 237, "loss": 0.5231607556343079, "lr": 8.109205354685367e-06, "epoch": 1.089171974522293, "percentage": 36.29, "elapsed_time": "0:03:03", "remaining_time": "0:05:22"}
{"current_steps": 87, "total_steps": 237, "loss": 0.6374996900558472, "lr": 8.051115327656538e-06, "epoch": 1.1019108280254777, "percentage": 36.71, "elapsed_time": "0:03:05", "remaining_time": "0:05:20"}
{"current_steps": 88, "total_steps": 237, "loss": 0.5556366443634033, "lr": 7.992361570870289e-06, "epoch": 1.1146496815286624, "percentage": 37.13, "elapsed_time": "0:03:07", "remaining_time": "0:05:17"}
{"current_steps": 89, "total_steps": 237, "loss": 0.5695494413375854, "lr": 7.932956865428792e-06, "epoch": 1.127388535031847, "percentage": 37.55, "elapsed_time": "0:03:09", "remaining_time": "0:05:15"}
{"current_steps": 90, "total_steps": 237, "loss": 0.5527031421661377, "lr": 7.872914134039485e-06, "epoch": 1.1401273885350318, "percentage": 37.97, "elapsed_time": "0:03:11", "remaining_time": "0:05:13"}
{"current_steps": 91, "total_steps": 237, "loss": 0.5303822755813599, "lr": 7.812246438203905e-06, "epoch": 1.1528662420382165, "percentage": 38.4, "elapsed_time": "0:03:13", "remaining_time": "0:05:11"}
{"current_steps": 92, "total_steps": 237, "loss": 0.6510012149810791, "lr": 7.750966975376328e-06, "epoch": 1.1656050955414012, "percentage": 38.82, "elapsed_time": "0:03:16", "remaining_time": "0:05:08"}
{"current_steps": 93, "total_steps": 237, "loss": 0.571890652179718, "lr": 7.689089076092851e-06, "epoch": 1.178343949044586, "percentage": 39.24, "elapsed_time": "0:03:18", "remaining_time": "0:05:07"}
{"current_steps": 94, "total_steps": 237, "loss": 0.5705777406692505, "lr": 7.626626201071494e-06, "epoch": 1.1910828025477707, "percentage": 39.66, "elapsed_time": "0:03:20", "remaining_time": "0:05:04"}
{"current_steps": 95, "total_steps": 237, "loss": 0.6913941502571106, "lr": 7.563591938284012e-06, "epoch": 1.2038216560509554, "percentage": 40.08, "elapsed_time": "0:03:22", "remaining_time": "0:05:02"}
{"current_steps": 96, "total_steps": 237, "loss": 0.5719045996665955, "lr": 7.500000000000001e-06, "epoch": 1.21656050955414, "percentage": 40.51, "elapsed_time": "0:03:24", "remaining_time": "0:05:00"}
{"current_steps": 97, "total_steps": 237, "loss": 0.5766797661781311, "lr": 7.4358642198039835e-06, "epoch": 1.2292993630573248, "percentage": 40.93, "elapsed_time": "0:03:26", "remaining_time": "0:04:58"}
{"current_steps": 98, "total_steps": 237, "loss": 0.6781610250473022, "lr": 7.371198549586091e-06, "epoch": 1.2420382165605095, "percentage": 41.35, "elapsed_time": "0:03:28", "remaining_time": "0:04:55"}
{"current_steps": 99, "total_steps": 237, "loss": 0.5562888383865356, "lr": 7.306017056507018e-06, "epoch": 1.2547770700636942, "percentage": 41.77, "elapsed_time": "0:03:30", "remaining_time": "0:04:53"}
{"current_steps": 100, "total_steps": 237, "loss": 0.5290209054946899, "lr": 7.240333919937893e-06, "epoch": 1.267515923566879, "percentage": 42.19, "elapsed_time": "0:03:32", "remaining_time": "0:04:51"}
{"current_steps": 101, "total_steps": 237, "loss": 0.5402839779853821, "lr": 7.174163428375748e-06, "epoch": 1.2802547770700636, "percentage": 42.62, "elapsed_time": "0:03:34", "remaining_time": "0:04:49"}
{"current_steps": 102, "total_steps": 237, "loss": 0.48276376724243164, "lr": 7.107519976335241e-06, "epoch": 1.2929936305732483, "percentage": 43.04, "elapsed_time": "0:03:36", "remaining_time": "0:04:47"}
{"current_steps": 103, "total_steps": 237, "loss": 0.5307224988937378, "lr": 7.040418061217325e-06, "epoch": 1.305732484076433, "percentage": 43.46, "elapsed_time": "0:03:38", "remaining_time": "0:04:44"}
{"current_steps": 104, "total_steps": 237, "loss": 0.6105501651763916, "lr": 6.972872280155528e-06, "epoch": 1.3184713375796178, "percentage": 43.88, "elapsed_time": "0:03:40", "remaining_time": "0:04:42"}
{"current_steps": 105, "total_steps": 237, "loss": 0.6230810880661011, "lr": 6.9048973268405375e-06, "epoch": 1.3312101910828025, "percentage": 44.3, "elapsed_time": "0:03:43", "remaining_time": "0:04:40"}
{"current_steps": 106, "total_steps": 237, "loss": 0.5940905213356018, "lr": 6.836507988323785e-06, "epoch": 1.3439490445859872, "percentage": 44.73, "elapsed_time": "0:03:45", "remaining_time": "0:04:38"}
{"current_steps": 107, "total_steps": 237, "loss": 0.5832507610321045, "lr": 6.767719141800718e-06, "epoch": 1.356687898089172, "percentage": 45.15, "elapsed_time": "0:03:47", "remaining_time": "0:04:36"}
{"current_steps": 108, "total_steps": 237, "loss": 0.6137063503265381, "lr": 6.698545751374465e-06, "epoch": 1.3694267515923566, "percentage": 45.57, "elapsed_time": "0:03:49", "remaining_time": "0:04:34"}
{"current_steps": 109, "total_steps": 237, "loss": 0.5683932900428772, "lr": 6.629002864800589e-06, "epoch": 1.3821656050955413, "percentage": 45.99, "elapsed_time": "0:03:51", "remaining_time": "0:04:31"}
{"current_steps": 110, "total_steps": 237, "loss": 0.5938336849212646, "lr": 6.55910561021365e-06, "epoch": 1.394904458598726, "percentage": 46.41, "elapsed_time": "0:03:53", "remaining_time": "0:04:29"}
{"current_steps": 111, "total_steps": 237, "loss": 0.6230363845825195, "lr": 6.488869192836279e-06, "epoch": 1.4076433121019107, "percentage": 46.84, "elapsed_time": "0:03:55", "remaining_time": "0:04:27"}
{"current_steps": 112, "total_steps": 237, "loss": 0.5125907063484192, "lr": 6.418308891671484e-06, "epoch": 1.4203821656050954, "percentage": 47.26, "elapsed_time": "0:03:57", "remaining_time": "0:04:25"}
{"current_steps": 113, "total_steps": 237, "loss": 0.5707980394363403, "lr": 6.347440056178904e-06, "epoch": 1.4331210191082802, "percentage": 47.68, "elapsed_time": "0:03:59", "remaining_time": "0:04:23"}
{"current_steps": 114, "total_steps": 237, "loss": 0.5640935301780701, "lr": 6.27627810293574e-06, "epoch": 1.4458598726114649, "percentage": 48.1, "elapsed_time": "0:04:01", "remaining_time": "0:04:20"}
{"current_steps": 115, "total_steps": 237, "loss": 0.584221363067627, "lr": 6.204838512283073e-06, "epoch": 1.4585987261146496, "percentage": 48.52, "elapsed_time": "0:04:03", "remaining_time": "0:04:18"}
{"current_steps": 116, "total_steps": 237, "loss": 0.5377991199493408, "lr": 6.133136824958334e-06, "epoch": 1.4713375796178343, "percentage": 48.95, "elapsed_time": "0:04:06", "remaining_time": "0:04:16"}
{"current_steps": 117, "total_steps": 237, "loss": 0.5428900718688965, "lr": 6.061188638714616e-06, "epoch": 1.484076433121019, "percentage": 49.37, "elapsed_time": "0:04:08", "remaining_time": "0:04:14"}
{"current_steps": 118, "total_steps": 237, "loss": 0.5634554624557495, "lr": 5.989009604927587e-06, "epoch": 1.4968152866242037, "percentage": 49.79, "elapsed_time": "0:04:10", "remaining_time": "0:04:12"}
{"current_steps": 119, "total_steps": 237, "loss": 0.608221173286438, "lr": 5.916615425190744e-06, "epoch": 1.5095541401273884, "percentage": 50.21, "elapsed_time": "0:04:12", "remaining_time": "0:04:10"}
{"current_steps": 120, "total_steps": 237, "loss": 0.569106936454773, "lr": 5.844021847899735e-06, "epoch": 1.5222929936305731, "percentage": 50.63, "elapsed_time": "0:04:14", "remaining_time": "0:04:08"}
{"current_steps": 121, "total_steps": 237, "loss": 0.5092579126358032, "lr": 5.771244664826512e-06, "epoch": 1.5350318471337578, "percentage": 51.05, "elapsed_time": "0:04:16", "remaining_time": "0:04:05"}
{"current_steps": 122, "total_steps": 237, "loss": 0.6413620710372925, "lr": 5.698299707684031e-06, "epoch": 1.5477707006369426, "percentage": 51.48, "elapsed_time": "0:04:18", "remaining_time": "0:04:03"}
{"current_steps": 123, "total_steps": 237, "loss": 0.5989945530891418, "lr": 5.6252028446822805e-06, "epoch": 1.5605095541401273, "percentage": 51.9, "elapsed_time": "0:04:20", "remaining_time": "0:04:01"}
{"current_steps": 124, "total_steps": 237, "loss": 0.5982795357704163, "lr": 5.55196997707635e-06, "epoch": 1.573248407643312, "percentage": 52.32, "elapsed_time": "0:04:22", "remaining_time": "0:03:59"}
{"current_steps": 125, "total_steps": 237, "loss": 0.5701369047164917, "lr": 5.478617035707337e-06, "epoch": 1.5859872611464967, "percentage": 52.74, "elapsed_time": "0:04:24", "remaining_time": "0:03:57"}
{"current_steps": 126, "total_steps": 237, "loss": 0.5309903621673584, "lr": 5.4051599775368e-06, "epoch": 1.5987261146496814, "percentage": 53.16, "elapsed_time": "0:04:27", "remaining_time": "0:03:55"}
{"current_steps": 127, "total_steps": 237, "loss": 0.6309601068496704, "lr": 5.33161478217552e-06, "epoch": 1.611464968152866, "percentage": 53.59, "elapsed_time": "0:04:29", "remaining_time": "0:03:53"}
{"current_steps": 128, "total_steps": 237, "loss": 0.6076474785804749, "lr": 5.257997448407366e-06, "epoch": 1.6242038216560508, "percentage": 54.01, "elapsed_time": "0:04:31", "remaining_time": "0:03:51"}
{"current_steps": 129, "total_steps": 237, "loss": 0.4852054715156555, "lr": 5.184323990708959e-06, "epoch": 1.6369426751592355, "percentage": 54.43, "elapsed_time": "0:04:33", "remaining_time": "0:03:49"}
{"current_steps": 130, "total_steps": 237, "loss": 0.6187564134597778, "lr": 5.110610435765935e-06, "epoch": 1.6496815286624202, "percentage": 54.85, "elapsed_time": "0:04:35", "remaining_time": "0:03:47"}
{"current_steps": 131, "total_steps": 237, "loss": 0.47985607385635376, "lr": 5.0368728189865624e-06, "epoch": 1.662420382165605, "percentage": 55.27, "elapsed_time": "0:04:37", "remaining_time": "0:03:44"}
{"current_steps": 132, "total_steps": 237, "loss": 0.5692814588546753, "lr": 4.9631271810134375e-06, "epoch": 1.6751592356687897, "percentage": 55.7, "elapsed_time": "0:04:40", "remaining_time": "0:03:42"}
{"current_steps": 133, "total_steps": 237, "loss": 0.47124144434928894, "lr": 4.8893895642340665e-06, "epoch": 1.6878980891719744, "percentage": 56.12, "elapsed_time": "0:04:42", "remaining_time": "0:03:40"}
{"current_steps": 134, "total_steps": 237, "loss": 0.64717698097229, "lr": 4.815676009291044e-06, "epoch": 1.700636942675159, "percentage": 56.54, "elapsed_time": "0:04:44", "remaining_time": "0:03:38"}
{"current_steps": 135, "total_steps": 237, "loss": 0.552649974822998, "lr": 4.742002551592635e-06, "epoch": 1.7133757961783438, "percentage": 56.96, "elapsed_time": "0:04:46", "remaining_time": "0:03:36"}
{"current_steps": 136, "total_steps": 237, "loss": 0.5507756471633911, "lr": 4.668385217824482e-06, "epoch": 1.7261146496815285, "percentage": 57.38, "elapsed_time": "0:04:48", "remaining_time": "0:03:34"}
{"current_steps": 137, "total_steps": 237, "loss": 0.6294535994529724, "lr": 4.594840022463201e-06, "epoch": 1.7388535031847132, "percentage": 57.81, "elapsed_time": "0:04:50", "remaining_time": "0:03:32"}
{"current_steps": 138, "total_steps": 237, "loss": 0.4838172495365143, "lr": 4.5213829642926635e-06, "epoch": 1.7515923566878981, "percentage": 58.23, "elapsed_time": "0:04:52", "remaining_time": "0:03:29"}
{"current_steps": 139, "total_steps": 237, "loss": 0.6029865741729736, "lr": 4.4480300229236525e-06, "epoch": 1.7643312101910829, "percentage": 58.65, "elapsed_time": "0:04:54", "remaining_time": "0:03:27"}
{"current_steps": 140, "total_steps": 237, "loss": 0.5469167828559875, "lr": 4.374797155317721e-06, "epoch": 1.7770700636942676, "percentage": 59.07, "elapsed_time": "0:04:56", "remaining_time": "0:03:25"}
{"current_steps": 141, "total_steps": 237, "loss": 0.5518971681594849, "lr": 4.30170029231597e-06, "epoch": 1.7898089171974523, "percentage": 59.49, "elapsed_time": "0:04:58", "remaining_time": "0:03:23"}
{"current_steps": 142, "total_steps": 237, "loss": 0.5194137692451477, "lr": 4.228755335173488e-06, "epoch": 1.802547770700637, "percentage": 59.92, "elapsed_time": "0:05:01", "remaining_time": "0:03:21"}
{"current_steps": 143, "total_steps": 237, "loss": 0.5753588080406189, "lr": 4.155978152100266e-06, "epoch": 1.8152866242038217, "percentage": 60.34, "elapsed_time": "0:05:03", "remaining_time": "0:03:19"}
{"current_steps": 144, "total_steps": 237, "loss": 0.6354119181632996, "lr": 4.0833845748092586e-06, "epoch": 1.8280254777070064, "percentage": 60.76, "elapsed_time": "0:05:05", "remaining_time": "0:03:17"}
{"current_steps": 145, "total_steps": 237, "loss": 0.539757251739502, "lr": 4.010990395072414e-06, "epoch": 1.8407643312101911, "percentage": 61.18, "elapsed_time": "0:05:07", "remaining_time": "0:03:14"}
{"current_steps": 146, "total_steps": 237, "loss": 0.5501378178596497, "lr": 3.938811361285386e-06, "epoch": 1.8535031847133758, "percentage": 61.6, "elapsed_time": "0:05:09", "remaining_time": "0:03:12"}
{"current_steps": 147, "total_steps": 237, "loss": 0.7076854109764099, "lr": 3.866863175041666e-06, "epoch": 1.8662420382165605, "percentage": 62.03, "elapsed_time": "0:05:11", "remaining_time": "0:03:10"}
{"current_steps": 148, "total_steps": 237, "loss": 0.639454185962677, "lr": 3.7951614877169285e-06, "epoch": 1.8789808917197452, "percentage": 62.45, "elapsed_time": "0:05:13", "remaining_time": "0:03:08"}
{"current_steps": 149, "total_steps": 237, "loss": 0.47795403003692627, "lr": 3.7237218970642624e-06, "epoch": 1.89171974522293, "percentage": 62.87, "elapsed_time": "0:05:15", "remaining_time": "0:03:06"}
{"current_steps": 150, "total_steps": 237, "loss": 0.5447227358818054, "lr": 3.6525599438210956e-06, "epoch": 1.9044585987261147, "percentage": 63.29, "elapsed_time": "0:05:17", "remaining_time": "0:03:04"}
{"current_steps": 151, "total_steps": 237, "loss": 0.5872648358345032, "lr": 3.5816911083285165e-06, "epoch": 1.9171974522292994, "percentage": 63.71, "elapsed_time": "0:05:19", "remaining_time": "0:03:02"}
{"current_steps": 152, "total_steps": 237, "loss": 0.4650334119796753, "lr": 3.511130807163724e-06, "epoch": 1.929936305732484, "percentage": 64.14, "elapsed_time": "0:05:22", "remaining_time": "0:03:00"}
{"current_steps": 153, "total_steps": 237, "loss": 0.5531576871871948, "lr": 3.440894389786352e-06, "epoch": 1.9426751592356688, "percentage": 64.56, "elapsed_time": "0:05:24", "remaining_time": "0:02:57"}
{"current_steps": 154, "total_steps": 237, "loss": 0.669037938117981, "lr": 3.370997135199413e-06, "epoch": 1.9554140127388535, "percentage": 64.98, "elapsed_time": "0:05:26", "remaining_time": "0:02:55"}
{"current_steps": 155, "total_steps": 237, "loss": 0.5962421894073486, "lr": 3.3014542486255365e-06, "epoch": 1.9681528662420382, "percentage": 65.4, "elapsed_time": "0:05:28", "remaining_time": "0:02:53"}
{"current_steps": 156, "total_steps": 237, "loss": 0.5672598481178284, "lr": 3.2322808581992825e-06, "epoch": 1.980891719745223, "percentage": 65.82, "elapsed_time": "0:05:30", "remaining_time": "0:02:51"}
{"current_steps": 157, "total_steps": 237, "loss": 0.4786456823348999, "lr": 3.1634920116762175e-06, "epoch": 1.9936305732484076, "percentage": 66.24, "elapsed_time": "0:05:32", "remaining_time": "0:02:49"}
{"current_steps": 158, "total_steps": 237, "loss": 0.4024711847305298, "lr": 3.0951026731594634e-06, "epoch": 2.0, "percentage": 66.67, "elapsed_time": "0:05:33", "remaining_time": "0:02:46"}
{"current_steps": 159, "total_steps": 237, "loss": 0.12846143543720245, "lr": 3.0271277198444737e-06, "epoch": 2.0127388535031847, "percentage": 67.09, "elapsed_time": "0:05:35", "remaining_time": "0:02:44"}
{"current_steps": 160, "total_steps": 237, "loss": 0.16186080873012543, "lr": 2.9595819387826753e-06, "epoch": 2.0254777070063694, "percentage": 67.51, "elapsed_time": "0:05:37", "remaining_time": "0:02:42"}
{"current_steps": 161, "total_steps": 237, "loss": 0.12769103050231934, "lr": 2.89248002366476e-06, "epoch": 2.038216560509554, "percentage": 67.93, "elapsed_time": "0:05:39", "remaining_time": "0:02:40"}
{"current_steps": 162, "total_steps": 237, "loss": 0.18756446242332458, "lr": 2.8258365716242543e-06, "epoch": 2.050955414012739, "percentage": 68.35, "elapsed_time": "0:05:41", "remaining_time": "0:02:38"}
{"current_steps": 163, "total_steps": 237, "loss": 0.2508277893066406, "lr": 2.7596660800621076e-06, "epoch": 2.0636942675159236, "percentage": 68.78, "elapsed_time": "0:05:43", "remaining_time": "0:02:36"}
{"current_steps": 164, "total_steps": 237, "loss": 0.16548970341682434, "lr": 2.6939829434929834e-06, "epoch": 2.0764331210191083, "percentage": 69.2, "elapsed_time": "0:05:46", "remaining_time": "0:02:34"}
{"current_steps": 165, "total_steps": 237, "loss": 0.1585598886013031, "lr": 2.6288014504139104e-06, "epoch": 2.089171974522293, "percentage": 69.62, "elapsed_time": "0:05:48", "remaining_time": "0:02:31"}
{"current_steps": 166, "total_steps": 237, "loss": 0.11741052567958832, "lr": 2.5641357801960186e-06, "epoch": 2.1019108280254777, "percentage": 70.04, "elapsed_time": "0:05:50", "remaining_time": "0:02:29"}
{"current_steps": 167, "total_steps": 237, "loss": 0.13042503595352173, "lr": 2.5000000000000015e-06, "epoch": 2.1146496815286624, "percentage": 70.46, "elapsed_time": "0:05:52", "remaining_time": "0:02:27"}
{"current_steps": 168, "total_steps": 237, "loss": 0.10423420369625092, "lr": 2.4364080617159885e-06, "epoch": 2.127388535031847, "percentage": 70.89, "elapsed_time": "0:05:54", "remaining_time": "0:02:25"}
{"current_steps": 169, "total_steps": 237, "loss": 0.15576717257499695, "lr": 2.373373798928507e-06, "epoch": 2.140127388535032, "percentage": 71.31, "elapsed_time": "0:05:56", "remaining_time": "0:02:23"}
{"current_steps": 170, "total_steps": 237, "loss": 0.16010719537734985, "lr": 2.310910923907149e-06, "epoch": 2.1528662420382165, "percentage": 71.73, "elapsed_time": "0:05:58", "remaining_time": "0:02:21"}
{"current_steps": 171, "total_steps": 237, "loss": 0.15022137761116028, "lr": 2.249033024623672e-06, "epoch": 2.1656050955414012, "percentage": 72.15, "elapsed_time": "0:06:00", "remaining_time": "0:02:19"}
{"current_steps": 172, "total_steps": 237, "loss": 0.13759568333625793, "lr": 2.187753561796097e-06, "epoch": 2.178343949044586, "percentage": 72.57, "elapsed_time": "0:06:02", "remaining_time": "0:02:17"}
{"current_steps": 173, "total_steps": 237, "loss": 0.14168789982795715, "lr": 2.127085865960516e-06, "epoch": 2.1910828025477707, "percentage": 73.0, "elapsed_time": "0:06:05", "remaining_time": "0:02:15"}
{"current_steps": 174, "total_steps": 237, "loss": 0.12648224830627441, "lr": 2.0670431345712092e-06, "epoch": 2.2038216560509554, "percentage": 73.42, "elapsed_time": "0:06:07", "remaining_time": "0:02:12"}
{"current_steps": 175, "total_steps": 237, "loss": 0.12812396883964539, "lr": 2.0076384291297134e-06, "epoch": 2.21656050955414, "percentage": 73.84, "elapsed_time": "0:06:09", "remaining_time": "0:02:10"}
{"current_steps": 176, "total_steps": 237, "loss": 0.12238447368144989, "lr": 1.9488846723434646e-06, "epoch": 2.229299363057325, "percentage": 74.26, "elapsed_time": "0:06:11", "remaining_time": "0:02:08"}
{"current_steps": 177, "total_steps": 237, "loss": 0.13111728429794312, "lr": 1.890794645314633e-06, "epoch": 2.2420382165605095, "percentage": 74.68, "elapsed_time": "0:06:13", "remaining_time": "0:02:06"}
{"current_steps": 178, "total_steps": 237, "loss": 0.14279913902282715, "lr": 1.8333809847597644e-06, "epoch": 2.254777070063694, "percentage": 75.11, "elapsed_time": "0:06:15", "remaining_time": "0:02:04"}
{"current_steps": 179, "total_steps": 237, "loss": 0.13436204195022583, "lr": 1.7766561802608374e-06, "epoch": 2.267515923566879, "percentage": 75.53, "elapsed_time": "0:06:17", "remaining_time": "0:02:02"}
{"current_steps": 180, "total_steps": 237, "loss": 0.11263652890920639, "lr": 1.7206325715483003e-06, "epoch": 2.2802547770700636, "percentage": 75.95, "elapsed_time": "0:06:19", "remaining_time": "0:02:00"}
{"current_steps": 181, "total_steps": 237, "loss": 0.09942139685153961, "lr": 1.665322345816746e-06, "epoch": 2.2929936305732483, "percentage": 76.37, "elapsed_time": "0:06:22", "remaining_time": "0:01:58"}
{"current_steps": 182, "total_steps": 237, "loss": 0.12155772745609283, "lr": 1.6107375350737437e-06, "epoch": 2.305732484076433, "percentage": 76.79, "elapsed_time": "0:06:24", "remaining_time": "0:01:56"}
{"current_steps": 183, "total_steps": 237, "loss": 0.08767275512218475, "lr": 1.556890013522428e-06, "epoch": 2.3184713375796178, "percentage": 77.22, "elapsed_time": "0:06:26", "remaining_time": "0:01:53"}
{"current_steps": 184, "total_steps": 237, "loss": 0.15320360660552979, "lr": 1.50379149497843e-06, "epoch": 2.3312101910828025, "percentage": 77.64, "elapsed_time": "0:06:28", "remaining_time": "0:01:51"}
{"current_steps": 185, "total_steps": 237, "loss": 0.08692272752523422, "lr": 1.4514535303216893e-06, "epoch": 2.343949044585987, "percentage": 78.06, "elapsed_time": "0:06:30", "remaining_time": "0:01:49"}
{"current_steps": 186, "total_steps": 237, "loss": 0.10834214836359024, "lr": 1.3998875049837141e-06, "epoch": 2.356687898089172, "percentage": 78.48, "elapsed_time": "0:06:32", "remaining_time": "0:01:47"}
{"current_steps": 187, "total_steps": 237, "loss": 0.13243836164474487, "lr": 1.3491046364708294e-06, "epoch": 2.3694267515923566, "percentage": 78.9, "elapsed_time": "0:06:34", "remaining_time": "0:01:45"}
{"current_steps": 188, "total_steps": 237, "loss": 0.11645258218050003, "lr": 1.2991159719239581e-06, "epoch": 2.3821656050955413, "percentage": 79.32, "elapsed_time": "0:06:36", "remaining_time": "0:01:43"}
{"current_steps": 189, "total_steps": 237, "loss": 0.11976869404315948, "lr": 1.249932385715467e-06, "epoch": 2.394904458598726, "percentage": 79.75, "elapsed_time": "0:06:38", "remaining_time": "0:01:41"}
{"current_steps": 190, "total_steps": 237, "loss": 0.09735321998596191, "lr": 1.2015645770835765e-06, "epoch": 2.4076433121019107, "percentage": 80.17, "elapsed_time": "0:06:40", "remaining_time": "0:01:39"}
{"current_steps": 191, "total_steps": 237, "loss": 0.11357178539037704, "lr": 1.1540230678048969e-06, "epoch": 2.4203821656050954, "percentage": 80.59, "elapsed_time": "0:06:42", "remaining_time": "0:01:37"}
{"current_steps": 192, "total_steps": 237, "loss": 0.11156035959720612, "lr": 1.1073181999055538e-06, "epoch": 2.43312101910828, "percentage": 81.01, "elapsed_time": "0:06:45", "remaining_time": "0:01:34"}
{"current_steps": 193, "total_steps": 237, "loss": 0.1507529616355896, "lr": 1.0614601334114099e-06, "epoch": 2.445859872611465, "percentage": 81.43, "elapsed_time": "0:06:47", "remaining_time": "0:01:32"}
{"current_steps": 194, "total_steps": 237, "loss": 0.09091517329216003, "lr": 1.016458844137887e-06, "epoch": 2.4585987261146496, "percentage": 81.86, "elapsed_time": "0:06:49", "remaining_time": "0:01:30"}
{"current_steps": 195, "total_steps": 237, "loss": 0.08778954297304153, "lr": 9.723241215198692e-07, "epoch": 2.4713375796178343, "percentage": 82.28, "elapsed_time": "0:06:51", "remaining_time": "0:01:28"}
{"current_steps": 196, "total_steps": 237, "loss": 0.11164680123329163, "lr": 9.290655664821296e-07, "epoch": 2.484076433121019, "percentage": 82.7, "elapsed_time": "0:06:53", "remaining_time": "0:01:26"}
{"current_steps": 197, "total_steps": 237, "loss": 0.12990619242191315, "lr": 8.866925893507805e-07, "epoch": 2.4968152866242037, "percentage": 83.12, "elapsed_time": "0:06:55", "remaining_time": "0:01:24"}
{"current_steps": 198, "total_steps": 237, "loss": 0.12117606401443481, "lr": 8.45214407806182e-07, "epoch": 2.5095541401273884, "percentage": 83.54, "elapsed_time": "0:06:57", "remaining_time": "0:01:22"}
{"current_steps": 199, "total_steps": 237, "loss": 0.11454776674509048, "lr": 8.046400448777575e-07, "epoch": 2.522292993630573, "percentage": 83.97, "elapsed_time": "0:06:59", "remaining_time": "0:01:20"}
{"current_steps": 200, "total_steps": 237, "loss": 0.10610666126012802, "lr": 7.649783269811523e-07, "epoch": 2.535031847133758, "percentage": 84.39, "elapsed_time": "0:07:01", "remaining_time": "0:01:18"}
{"current_steps": 201, "total_steps": 237, "loss": 0.1145130917429924, "lr": 7.26237881998163e-07, "epoch": 2.5477707006369426, "percentage": 84.81, "elapsed_time": "0:07:03", "remaining_time": "0:01:15"}
{"current_steps": 202, "total_steps": 237, "loss": 0.11320038139820099, "lr": 6.884271373998608e-07, "epoch": 2.5605095541401273, "percentage": 85.23, "elapsed_time": "0:07:05", "remaining_time": "0:01:13"}
{"current_steps": 203, "total_steps": 237, "loss": 0.09951528906822205, "lr": 6.515543184133e-07, "epoch": 2.573248407643312, "percentage": 85.65, "elapsed_time": "0:07:07", "remaining_time": "0:01:11"}
{"current_steps": 204, "total_steps": 237, "loss": 0.14554539322853088, "lr": 6.156274462322292e-07, "epoch": 2.5859872611464967, "percentage": 86.08, "elapsed_time": "0:07:10", "remaining_time": "0:01:09"}
{"current_steps": 205, "total_steps": 237, "loss": 0.10202307999134064, "lr": 5.806543362721945e-07, "epoch": 2.5987261146496814, "percentage": 86.5, "elapsed_time": "0:07:12", "remaining_time": "0:01:07"}
{"current_steps": 206, "total_steps": 237, "loss": 0.09979554265737534, "lr": 5.466425964703914e-07, "epoch": 2.611464968152866, "percentage": 86.92, "elapsed_time": "0:07:14", "remaining_time": "0:01:05"}
{"current_steps": 207, "total_steps": 237, "loss": 0.11603157967329025, "lr": 5.135996256306619e-07, "epoch": 2.624203821656051, "percentage": 87.34, "elapsed_time": "0:07:16", "remaining_time": "0:01:03"}
{"current_steps": 208, "total_steps": 237, "loss": 0.23446135222911835, "lr": 4.815326118139813e-07, "epoch": 2.6369426751592355, "percentage": 87.76, "elapsed_time": "0:07:18", "remaining_time": "0:01:01"}
{"current_steps": 209, "total_steps": 237, "loss": 0.07456080615520477, "lr": 4.5044853077479134e-07, "epoch": 2.6496815286624202, "percentage": 88.19, "elapsed_time": "0:07:20", "remaining_time": "0:00:59"}
{"current_steps": 210, "total_steps": 237, "loss": 0.08614747226238251, "lr": 4.203541444435211e-07, "epoch": 2.662420382165605, "percentage": 88.61, "elapsed_time": "0:07:22", "remaining_time": "0:00:56"}
{"current_steps": 211, "total_steps": 237, "loss": 0.1103319376707077, "lr": 3.9125599945560866e-07, "epoch": 2.6751592356687897, "percentage": 89.03, "elapsed_time": "0:07:24", "remaining_time": "0:00:54"}
{"current_steps": 212, "total_steps": 237, "loss": 0.12318957597017288, "lr": 3.631604257273774e-07, "epoch": 2.6878980891719744, "percentage": 89.45, "elapsed_time": "0:07:26", "remaining_time": "0:00:52"}
{"current_steps": 213, "total_steps": 237, "loss": 0.12906193733215332, "lr": 3.360735350790428e-07, "epoch": 2.700636942675159, "percentage": 89.87, "elapsed_time": "0:07:28", "remaining_time": "0:00:50"}
{"current_steps": 214, "total_steps": 237, "loss": 0.11330439895391464, "lr": 3.100012199051627e-07, "epoch": 2.713375796178344, "percentage": 90.3, "elapsed_time": "0:07:30", "remaining_time": "0:00:48"}
{"current_steps": 215, "total_steps": 237, "loss": 0.11007498949766159, "lr": 2.8494915189283325e-07, "epoch": 2.7261146496815285, "percentage": 90.72, "elapsed_time": "0:07:32", "remaining_time": "0:00:46"}
{"current_steps": 216, "total_steps": 237, "loss": 0.1229395642876625, "lr": 2.6092278078788004e-07, "epoch": 2.738853503184713, "percentage": 91.14, "elapsed_time": "0:07:35", "remaining_time": "0:00:44"}
{"current_steps": 217, "total_steps": 237, "loss": 0.14496152102947235, "lr": 2.3792733320934348e-07, "epoch": 2.7515923566878984, "percentage": 91.56, "elapsed_time": "0:07:37", "remaining_time": "0:00:42"}
{"current_steps": 218, "total_steps": 237, "loss": 0.1528952717781067, "lr": 2.1596781151249524e-07, "epoch": 2.7643312101910826, "percentage": 91.98, "elapsed_time": "0:07:39", "remaining_time": "0:00:40"}
{"current_steps": 219, "total_steps": 237, "loss": 0.10217823833227158, "lr": 1.9504899270064105e-07, "epoch": 2.777070063694268, "percentage": 92.41, "elapsed_time": "0:07:41", "remaining_time": "0:00:37"}
{"current_steps": 220, "total_steps": 237, "loss": 0.11019779741764069, "lr": 1.7517542738595071e-07, "epoch": 2.789808917197452, "percentage": 92.83, "elapsed_time": "0:07:43", "remaining_time": "0:00:35"}
{"current_steps": 221, "total_steps": 237, "loss": 0.11216893792152405, "lr": 1.5635143879952575e-07, "epoch": 2.802547770700637, "percentage": 93.25, "elapsed_time": "0:07:45", "remaining_time": "0:00:33"}
{"current_steps": 222, "total_steps": 237, "loss": 0.11647488921880722, "lr": 1.3858112185094418e-07, "epoch": 2.8152866242038215, "percentage": 93.67, "elapsed_time": "0:07:47", "remaining_time": "0:00:31"}
{"current_steps": 223, "total_steps": 237, "loss": 0.1183975338935852, "lr": 1.2186834223746612e-07, "epoch": 2.8280254777070066, "percentage": 94.09, "elapsed_time": "0:07:49", "remaining_time": "0:00:29"}
{"current_steps": 224, "total_steps": 237, "loss": 0.1070728451013565, "lr": 1.0621673560309798e-07, "epoch": 2.840764331210191, "percentage": 94.51, "elapsed_time": "0:07:52", "remaining_time": "0:00:27"}
{"current_steps": 225, "total_steps": 237, "loss": 0.09823741018772125, "lr": 9.162970674771177e-08, "epoch": 2.853503184713376, "percentage": 94.94, "elapsed_time": "0:07:54", "remaining_time": "0:00:25"}
{"current_steps": 226, "total_steps": 237, "loss": 0.09354014694690704, "lr": 7.81104288863721e-08, "epoch": 2.8662420382165603, "percentage": 95.36, "elapsed_time": "0:07:56", "remaining_time": "0:00:23"}
{"current_steps": 227, "total_steps": 237, "loss": 0.11680345237255096, "lr": 6.566184295904777e-08, "epoch": 2.8789808917197455, "percentage": 95.78, "elapsed_time": "0:07:58", "remaining_time": "0:00:21"}
{"current_steps": 228, "total_steps": 237, "loss": 0.12170577049255371, "lr": 5.4286656990847897e-08, "epoch": 2.8917197452229297, "percentage": 96.2, "elapsed_time": "0:08:00", "remaining_time": "0:00:18"}
{"current_steps": 229, "total_steps": 237, "loss": 0.10433293133974075, "lr": 4.398734550292716e-08, "epoch": 2.904458598726115, "percentage": 96.62, "elapsed_time": "0:08:02", "remaining_time": "0:00:16"}
{"current_steps": 230, "total_steps": 237, "loss": 0.09482909739017487, "lr": 3.476614897418573e-08, "epoch": 2.917197452229299, "percentage": 97.05, "elapsed_time": "0:08:04", "remaining_time": "0:00:14"}
{"current_steps": 231, "total_steps": 237, "loss": 0.11947542428970337, "lr": 2.6625073353884756e-08, "epoch": 2.9299363057324843, "percentage": 97.47, "elapsed_time": "0:08:07", "remaining_time": "0:00:12"}
{"current_steps": 232, "total_steps": 237, "loss": 0.10883431136608124, "lr": 1.9565889625275945e-08, "epoch": 2.9426751592356686, "percentage": 97.89, "elapsed_time": "0:08:09", "remaining_time": "0:00:10"}
{"current_steps": 233, "total_steps": 237, "loss": 0.09434341639280319, "lr": 1.3590133420350315e-08, "epoch": 2.9554140127388537, "percentage": 98.31, "elapsed_time": "0:08:11", "remaining_time": "0:00:08"}
{"current_steps": 234, "total_steps": 237, "loss": 0.14017127454280853, "lr": 8.699104685779835e-09, "epoch": 2.968152866242038, "percentage": 98.73, "elapsed_time": "0:08:13", "remaining_time": "0:00:06"}
{"current_steps": 235, "total_steps": 237, "loss": 0.11752738803625107, "lr": 4.89386740013198e-09, "epoch": 2.980891719745223, "percentage": 99.16, "elapsed_time": "0:08:15", "remaining_time": "0:00:04"}
{"current_steps": 236, "total_steps": 237, "loss": 0.12928688526153564, "lr": 2.1752493424148647e-09, "epoch": 2.9936305732484074, "percentage": 99.58, "elapsed_time": "0:08:17", "remaining_time": "0:00:02"}
{"current_steps": 237, "total_steps": 237, "loss": 0.0627690926194191, "lr": 5.438419120062933e-10, "epoch": 3.0, "percentage": 100.0, "elapsed_time": "0:08:18", "remaining_time": "0:00:00"}
{"current_steps": 237, "total_steps": 237, "epoch": 3.0, "percentage": 100.0, "elapsed_time": "0:09:48", "remaining_time": "0:00:00"}

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ace1a5ab94f4adca6ff5393a16275ac20ceea95bc5df297716b9dbbce32886c8
size 6968