Initialize project; model provided by the ModelHub XC community

Model: shuoxing/llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64
Source: Original Platform
ModelHub XC
2026-06-12 17:02:16 +08:00
commit 38c670dc21
10 changed files with 410 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

58
README.md Normal file

@@ -0,0 +1,58 @@
---
library_name: transformers
tags:
- llama-factory
- generated_from_trainer
model-index:
- name: llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# llama3-8b-full-pretrain-wash-c4-0-6m-sft-bs64
This model was trained from scratch on an unknown dataset.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 3.0
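
The hyperparameters above can be cross-checked against `trainer_log.jsonl`. The following is a minimal sketch, not the Trainer's own code: it assumes a linear warmup followed by a single half-cosine decay to zero, and it assumes that the warmup value 0.1 is a fraction of the 237 total steps (rounded up to 24 steps). The names `logged_lr`, `WARMUP_STEPS`, and `effective_batch` are illustrative only.

```python
import math

PEAK_LR = 1e-05      # learning_rate from the list above
TOTAL_STEPS = 237    # total_steps reported in trainer_log.jsonl
# Assumption: 0.1 is read as a fraction of the total steps, rounded up (-> 24).
WARMUP_STEPS = math.ceil(0.1 * TOTAL_STEPS)


def logged_lr(current_step: int) -> float:
    """Learning rate expected in the log entry for a 1-based step number."""
    s = current_step - 1  # number of scheduler updates applied before this step
    if s < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * s / WARMUP_STEPS
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (s - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# train_batch_size * num_devices * gradient_accumulation_steps
effective_batch = 8 * 4 * 2

if __name__ == "__main__":
    print("effective batch size:", effective_batch)
    for step in (1, 2, 25, 26, 96):
        print(step, logged_lr(step))
```

Under these assumptions the sketch gives 0.0 at step 1, the peak 1e-05 at step 25, and 7.5e-06 at step 96, which agrees with the logged `lr` values at those steps.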
### Training results
### Framework versions
- Transformers 5.2.0
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.22.2

5
chat_template.jinja Normal file

@@ -0,0 +1,5 @@
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>

' }}{% endif %}
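
To make the template's output concrete, here is a plain-Python sketch of the same string assembly. It is an illustration rather than the tokenizer's rendering path. It assumes each header is followed by a blank line (`"\n\n"`), which matches the five-line count in the hunk header, and it takes `bos_token` from `tokenizer_config.json`. The function name `render_chat` is hypothetical.

```python
def render_chat(messages, bos_token="<|begin_of_text|>", add_generation_prompt=False):
    """Mimic the chat template: header, trimmed content, <|eot_id|> per message."""
    parts = []
    for index, message in enumerate(messages):
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()  # the template's `| trim` applies to content only
            + "<|eot_id|>"
        )
        if index == 0:
            # The BOS token is prepended to the first message only.
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = render_chat(
        [{"role": "user", "content": "  Hello!  "}],
        add_generation_prompt=True,
    )
    print(repr(prompt))
```

In practice the template would normally be applied through the tokenizer's own chat-template support; the sketch only shows which special tokens end up where.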

32
config.json Normal file

@@ -0,0 +1,32 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128009,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pad_token_id": 128009,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_parameters": {
"rope_theta": 500000.0,
"rope_type": "default"
},
"tie_word_embeddings": false,
"transformers_version": "5.2.0",
"use_cache": false,
"vocab_size": 128256
}
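
As a sanity check, the values in this config can be used to estimate the parameter count and compare it with the `model.safetensors` size recorded in the LFS pointer (16060556616 bytes). This is a sketch under assumptions: it uses the standard Llama layer layout (four attention projections, a three-matrix gated MLP, two RMSNorm weights per layer, no biases) and 2 bytes per parameter for `bfloat16`. The variable names are illustrative.

```python
# Values copied from config.json.
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
num_key_value_heads = 8
head_dim = 128
vocab_size = 128256

kv_dim = num_key_value_heads * head_dim  # 1024, grouped-query attention

# Attention: q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
# Gated MLP: gate_proj, up_proj, down_proj.
mlp = 3 * hidden_size * intermediate_size
# Two RMSNorm weight vectors per layer.
norms = 2 * hidden_size

per_layer = attention + mlp + norms
# tie_word_embeddings is false, so the input embedding and lm_head are separate.
embeddings = 2 * vocab_size * hidden_size
final_norm = hidden_size

total_params = num_hidden_layers * per_layer + embeddings + final_norm
weight_bytes = total_params * 2  # bfloat16

if __name__ == "__main__":
    print("parameters:", total_params)
    print("weight bytes:", weight_bytes)
    print("difference from LFS size:", 16060556616 - weight_bytes)
```

The estimate comes to 8,030,261,248 parameters, or 16,060,522,496 bytes. That is 34,120 bytes less than the recorded file size, a gap small enough to be plausibly explained by the safetensors header.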

13
generation_config.json Normal file

@@ -0,0 +1,13 @@
{
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"pad_token_id": 128009,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "5.2.0"
}
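
The `temperature` and `top_p` settings above control sampling. The toy sketch below illustrates temperature scaling followed by nucleus (top-p) filtering on a short list of logits. It is not the Transformers implementation, and the function name `sample_top_p` and the example logits are made up for illustration.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=random):
    """Return (sampled index, kept indices) after temperature and top-p filtering."""
    # Temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most probable tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample among the kept tokens in proportion to their probabilities.
    weights = [probs[i] for i in kept]
    choice = rng.choices(kept, weights=weights, k=1)[0]
    return choice, kept


if __name__ == "__main__":
    token, kept = sample_top_p([2.0, 1.0, 0.0, -1.0], rng=random.Random(0))
    print("kept indices:", kept, "sampled:", token)
```

With these example logits, only the two most likely tokens survive the filter, so the low-probability tail is never sampled.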

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:212db0722178b9e8013564fcc246068318180358a7f056c274550b711429cd63
size 16060556616

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c1dcab308e7cf5970ea38815e0a62887d705c5b436f869ca27a5dcdd40c36a6
size 17210148

19
tokenizer_config.json Normal file

@@ -0,0 +1,19 @@
{
"backend": "tokenizers",
"bos_token": "<|begin_of_text|>",
"clean_up_tokenization_spaces": true,
"eos_token": "<|eot_id|>",
"extra_special_tokens": [
"<|eom_id|>"
],
"is_local": true,
"model_input_names": [
"input_ids",
"attention_mask"
],
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<|eot_id|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "TokenizersBackend"
}

238
trainer_log.jsonl Normal file

@@ -0,0 +1,238 @@
{"current_steps": 1, "total_steps": 237, "loss": 2.205630302429199, "lr": 0.0, "epoch": 0.012738853503184714, "percentage": 0.42, "elapsed_time": "0:00:04", "remaining_time": "0:18:27"}
{"current_steps": 2, "total_steps": 237, "loss": 2.1361422538757324, "lr": 4.1666666666666667e-07, "epoch": 0.025477707006369428, "percentage": 0.84, "elapsed_time": "0:00:06", "remaining_time": "0:13:12"}
{"current_steps": 3, "total_steps": 237, "loss": 2.1427900791168213, "lr": 8.333333333333333e-07, "epoch": 0.03821656050955414, "percentage": 1.27, "elapsed_time": "0:00:08", "remaining_time": "0:11:26"}
{"current_steps": 4, "total_steps": 237, "loss": 2.0852017402648926, "lr": 1.25e-06, "epoch": 0.050955414012738856, "percentage": 1.69, "elapsed_time": "0:00:10", "remaining_time": "0:10:40"}
{"current_steps": 5, "total_steps": 237, "loss": 2.27068829536438, "lr": 1.6666666666666667e-06, "epoch": 0.06369426751592357, "percentage": 2.11, "elapsed_time": "0:00:13", "remaining_time": "0:10:15"}
{"current_steps": 6, "total_steps": 237, "loss": 2.090237617492676, "lr": 2.0833333333333334e-06, "epoch": 0.07643312101910828, "percentage": 2.53, "elapsed_time": "0:00:15", "remaining_time": "0:09:49"}
{"current_steps": 7, "total_steps": 237, "loss": 1.7877705097198486, "lr": 2.5e-06, "epoch": 0.08917197452229299, "percentage": 2.95, "elapsed_time": "0:00:17", "remaining_time": "0:09:49"}
{"current_steps": 8, "total_steps": 237, "loss": 1.7545547485351562, "lr": 2.916666666666667e-06, "epoch": 0.10191082802547771, "percentage": 3.38, "elapsed_time": "0:00:20", "remaining_time": "0:09:32"}
{"current_steps": 9, "total_steps": 237, "loss": 1.753498911857605, "lr": 3.3333333333333333e-06, "epoch": 0.11464968152866242, "percentage": 3.8, "elapsed_time": "0:00:22", "remaining_time": "0:09:19"}
{"current_steps": 10, "total_steps": 237, "loss": 1.76539945602417, "lr": 3.7500000000000005e-06, "epoch": 0.12738853503184713, "percentage": 4.22, "elapsed_time": "0:00:24", "remaining_time": "0:09:08"}
{"current_steps": 11, "total_steps": 237, "loss": 1.71544349193573, "lr": 4.166666666666667e-06, "epoch": 0.14012738853503184, "percentage": 4.64, "elapsed_time": "0:00:26", "remaining_time": "0:08:59"}
{"current_steps": 12, "total_steps": 237, "loss": 1.6188606023788452, "lr": 4.583333333333333e-06, "epoch": 0.15286624203821655, "percentage": 5.06, "elapsed_time": "0:00:28", "remaining_time": "0:08:53"}
{"current_steps": 13, "total_steps": 237, "loss": 1.7508370876312256, "lr": 5e-06, "epoch": 0.16560509554140126, "percentage": 5.49, "elapsed_time": "0:00:30", "remaining_time": "0:08:45"}
{"current_steps": 14, "total_steps": 237, "loss": 1.682621955871582, "lr": 5.416666666666667e-06, "epoch": 0.17834394904458598, "percentage": 5.91, "elapsed_time": "0:00:32", "remaining_time": "0:08:38"}
{"current_steps": 15, "total_steps": 237, "loss": 1.5453513860702515, "lr": 5.833333333333334e-06, "epoch": 0.1910828025477707, "percentage": 6.33, "elapsed_time": "0:00:34", "remaining_time": "0:08:32"}
{"current_steps": 16, "total_steps": 237, "loss": 1.5240049362182617, "lr": 6.25e-06, "epoch": 0.20382165605095542, "percentage": 6.75, "elapsed_time": "0:00:36", "remaining_time": "0:08:27"}
{"current_steps": 17, "total_steps": 237, "loss": 1.5656471252441406, "lr": 6.666666666666667e-06, "epoch": 0.21656050955414013, "percentage": 7.17, "elapsed_time": "0:00:38", "remaining_time": "0:08:21"}
{"current_steps": 18, "total_steps": 237, "loss": 1.3331950902938843, "lr": 7.083333333333335e-06, "epoch": 0.22929936305732485, "percentage": 7.59, "elapsed_time": "0:00:40", "remaining_time": "0:08:16"}
{"current_steps": 19, "total_steps": 237, "loss": 1.6144132614135742, "lr": 7.500000000000001e-06, "epoch": 0.24203821656050956, "percentage": 8.02, "elapsed_time": "0:00:42", "remaining_time": "0:08:12"}
{"current_steps": 20, "total_steps": 237, "loss": 1.5202698707580566, "lr": 7.916666666666667e-06, "epoch": 0.25477707006369427, "percentage": 8.44, "elapsed_time": "0:00:45", "remaining_time": "0:08:08"}
{"current_steps": 21, "total_steps": 237, "loss": 1.5168068408966064, "lr": 8.333333333333334e-06, "epoch": 0.267515923566879, "percentage": 8.86, "elapsed_time": "0:00:47", "remaining_time": "0:08:05"}
{"current_steps": 22, "total_steps": 237, "loss": 1.4229333400726318, "lr": 8.750000000000001e-06, "epoch": 0.2802547770700637, "percentage": 9.28, "elapsed_time": "0:00:49", "remaining_time": "0:08:02"}
{"current_steps": 23, "total_steps": 237, "loss": 1.4734992980957031, "lr": 9.166666666666666e-06, "epoch": 0.2929936305732484, "percentage": 9.7, "elapsed_time": "0:00:51", "remaining_time": "0:07:58"}
{"current_steps": 24, "total_steps": 237, "loss": 1.5587489604949951, "lr": 9.583333333333335e-06, "epoch": 0.3057324840764331, "percentage": 10.13, "elapsed_time": "0:00:53", "remaining_time": "0:07:54"}
{"current_steps": 25, "total_steps": 237, "loss": 1.3395847082138062, "lr": 1e-05, "epoch": 0.3184713375796178, "percentage": 10.55, "elapsed_time": "0:00:55", "remaining_time": "0:07:52"}
{"current_steps": 26, "total_steps": 237, "loss": 1.4213643074035645, "lr": 9.999456158087994e-06, "epoch": 0.33121019108280253, "percentage": 10.97, "elapsed_time": "0:00:57", "remaining_time": "0:07:49"}
{"current_steps": 27, "total_steps": 237, "loss": 1.5445196628570557, "lr": 9.997824750657586e-06, "epoch": 0.34394904458598724, "percentage": 11.39, "elapsed_time": "0:01:00", "remaining_time": "0:07:46"}
{"current_steps": 28, "total_steps": 237, "loss": 1.3953464031219482, "lr": 9.995106132599869e-06, "epoch": 0.35668789808917195, "percentage": 11.81, "elapsed_time": "0:01:02", "remaining_time": "0:07:43"}
{"current_steps": 29, "total_steps": 237, "loss": 1.285254955291748, "lr": 9.99130089531422e-06, "epoch": 0.36942675159235666, "percentage": 12.24, "elapsed_time": "0:01:04", "remaining_time": "0:07:42"}
{"current_steps": 30, "total_steps": 237, "loss": 1.5878655910491943, "lr": 9.98640986657965e-06, "epoch": 0.3821656050955414, "percentage": 12.66, "elapsed_time": "0:01:06", "remaining_time": "0:07:39"}
{"current_steps": 31, "total_steps": 237, "loss": 1.4214540719985962, "lr": 9.980434110374725e-06, "epoch": 0.39490445859872614, "percentage": 13.08, "elapsed_time": "0:01:08", "remaining_time": "0:07:36"}
{"current_steps": 32, "total_steps": 237, "loss": 1.5987496376037598, "lr": 9.973374926646117e-06, "epoch": 0.40764331210191085, "percentage": 13.5, "elapsed_time": "0:01:10", "remaining_time": "0:07:33"}
{"current_steps": 33, "total_steps": 237, "loss": 1.427947759628296, "lr": 9.965233851025816e-06, "epoch": 0.42038216560509556, "percentage": 13.92, "elapsed_time": "0:01:12", "remaining_time": "0:07:30"}
{"current_steps": 34, "total_steps": 237, "loss": 1.550269603729248, "lr": 9.956012654497073e-06, "epoch": 0.43312101910828027, "percentage": 14.35, "elapsed_time": "0:01:14", "remaining_time": "0:07:27"}
{"current_steps": 35, "total_steps": 237, "loss": 1.5326802730560303, "lr": 9.945713343009154e-06, "epoch": 0.445859872611465, "percentage": 14.77, "elapsed_time": "0:01:16", "remaining_time": "0:07:24"}
{"current_steps": 36, "total_steps": 237, "loss": 1.4180384874343872, "lr": 9.934338157040953e-06, "epoch": 0.4585987261146497, "percentage": 15.19, "elapsed_time": "0:01:19", "remaining_time": "0:07:21"}
{"current_steps": 37, "total_steps": 237, "loss": 1.5230302810668945, "lr": 9.921889571113629e-06, "epoch": 0.4713375796178344, "percentage": 15.61, "elapsed_time": "0:01:21", "remaining_time": "0:07:18"}
{"current_steps": 38, "total_steps": 237, "loss": 1.400693655014038, "lr": 9.90837029325229e-06, "epoch": 0.4840764331210191, "percentage": 16.03, "elapsed_time": "0:01:23", "remaining_time": "0:07:15"}
{"current_steps": 39, "total_steps": 237, "loss": 1.4020304679870605, "lr": 9.893783264396903e-06, "epoch": 0.4968152866242038, "percentage": 16.46, "elapsed_time": "0:01:25", "remaining_time": "0:07:12"}
{"current_steps": 40, "total_steps": 237, "loss": 1.4217119216918945, "lr": 9.878131657762535e-06, "epoch": 0.5095541401273885, "percentage": 16.88, "elapsed_time": "0:01:27", "remaining_time": "0:07:10"}
{"current_steps": 41, "total_steps": 237, "loss": 1.3887419700622559, "lr": 9.861418878149056e-06, "epoch": 0.5222929936305732, "percentage": 17.3, "elapsed_time": "0:01:29", "remaining_time": "0:07:08"}
{"current_steps": 42, "total_steps": 237, "loss": 1.4291396141052246, "lr": 9.843648561200476e-06, "epoch": 0.535031847133758, "percentage": 17.72, "elapsed_time": "0:01:31", "remaining_time": "0:07:05"}
{"current_steps": 43, "total_steps": 237, "loss": 1.5899100303649902, "lr": 9.82482457261405e-06, "epoch": 0.5477707006369427, "percentage": 18.14, "elapsed_time": "0:01:33", "remaining_time": "0:07:02"}
{"current_steps": 44, "total_steps": 237, "loss": 1.4736939668655396, "lr": 9.80495100729936e-06, "epoch": 0.5605095541401274, "percentage": 18.57, "elapsed_time": "0:01:35", "remaining_time": "0:07:00"}
{"current_steps": 45, "total_steps": 237, "loss": 1.4299547672271729, "lr": 9.784032188487507e-06, "epoch": 0.5732484076433121, "percentage": 18.99, "elapsed_time": "0:01:37", "remaining_time": "0:06:57"}
{"current_steps": 46, "total_steps": 237, "loss": 1.5320968627929688, "lr": 9.762072666790658e-06, "epoch": 0.5859872611464968, "percentage": 19.41, "elapsed_time": "0:01:39", "remaining_time": "0:06:54"}
{"current_steps": 47, "total_steps": 237, "loss": 1.6049351692199707, "lr": 9.73907721921212e-06, "epoch": 0.5987261146496815, "percentage": 19.83, "elapsed_time": "0:01:42", "remaining_time": "0:06:52"}
{"current_steps": 48, "total_steps": 237, "loss": 1.4209519624710083, "lr": 9.715050848107167e-06, "epoch": 0.6114649681528662, "percentage": 20.25, "elapsed_time": "0:01:44", "remaining_time": "0:06:50"}
{"current_steps": 49, "total_steps": 237, "loss": 1.3676856756210327, "lr": 9.689998780094839e-06, "epoch": 0.6242038216560509, "percentage": 20.68, "elapsed_time": "0:01:46", "remaining_time": "0:06:47"}
{"current_steps": 50, "total_steps": 237, "loss": 1.4694169759750366, "lr": 9.663926464920959e-06, "epoch": 0.6369426751592356, "percentage": 21.1, "elapsed_time": "0:01:48", "remaining_time": "0:06:45"}
{"current_steps": 51, "total_steps": 237, "loss": 1.5138176679611206, "lr": 9.636839574272623e-06, "epoch": 0.6496815286624203, "percentage": 21.52, "elapsed_time": "0:01:50", "remaining_time": "0:06:42"}
{"current_steps": 52, "total_steps": 237, "loss": 1.4585180282592773, "lr": 9.608744000544392e-06, "epoch": 0.6624203821656051, "percentage": 21.94, "elapsed_time": "0:01:52", "remaining_time": "0:06:40"}
{"current_steps": 53, "total_steps": 237, "loss": 1.2304143905639648, "lr": 9.579645855556481e-06, "epoch": 0.6751592356687898, "percentage": 22.36, "elapsed_time": "0:01:54", "remaining_time": "0:06:37"}
{"current_steps": 54, "total_steps": 237, "loss": 1.3905868530273438, "lr": 9.54955146922521e-06, "epoch": 0.6878980891719745, "percentage": 22.78, "elapsed_time": "0:01:56", "remaining_time": "0:06:35"}
{"current_steps": 55, "total_steps": 237, "loss": 1.340023159980774, "lr": 9.51846738818602e-06, "epoch": 0.7006369426751592, "percentage": 23.21, "elapsed_time": "0:01:58", "remaining_time": "0:06:33"}
{"current_steps": 56, "total_steps": 237, "loss": 1.3088650703430176, "lr": 9.48640037436934e-06, "epoch": 0.7133757961783439, "percentage": 23.63, "elapsed_time": "0:02:00", "remaining_time": "0:06:30"}
{"current_steps": 57, "total_steps": 237, "loss": 1.3765389919281006, "lr": 9.453357403529609e-06, "epoch": 0.7261146496815286, "percentage": 24.05, "elapsed_time": "0:02:02", "remaining_time": "0:06:28"}
{"current_steps": 58, "total_steps": 237, "loss": 1.4423935413360596, "lr": 9.419345663727805e-06, "epoch": 0.7388535031847133, "percentage": 24.47, "elapsed_time": "0:02:05", "remaining_time": "0:06:25"}
{"current_steps": 59, "total_steps": 237, "loss": 1.3210177421569824, "lr": 9.38437255376777e-06, "epoch": 0.7515923566878981, "percentage": 24.89, "elapsed_time": "0:02:07", "remaining_time": "0:06:23"}
{"current_steps": 60, "total_steps": 237, "loss": 1.441453456878662, "lr": 9.348445681586703e-06, "epoch": 0.7643312101910829, "percentage": 25.32, "elapsed_time": "0:02:09", "remaining_time": "0:06:21"}
{"current_steps": 61, "total_steps": 237, "loss": 1.4365830421447754, "lr": 9.31157286260014e-06, "epoch": 0.7770700636942676, "percentage": 25.74, "elapsed_time": "0:02:11", "remaining_time": "0:06:18"}
{"current_steps": 62, "total_steps": 237, "loss": 1.3535263538360596, "lr": 9.273762118001837e-06, "epoch": 0.7898089171974523, "percentage": 26.16, "elapsed_time": "0:02:13", "remaining_time": "0:06:16"}
{"current_steps": 63, "total_steps": 237, "loss": 1.3149755001068115, "lr": 9.235021673018849e-06, "epoch": 0.802547770700637, "percentage": 26.58, "elapsed_time": "0:02:15", "remaining_time": "0:06:14"}
{"current_steps": 64, "total_steps": 237, "loss": 1.3187739849090576, "lr": 9.195359955122244e-06, "epoch": 0.8152866242038217, "percentage": 27.0, "elapsed_time": "0:02:17", "remaining_time": "0:06:11"}
{"current_steps": 65, "total_steps": 237, "loss": 1.3459105491638184, "lr": 9.15478559219382e-06, "epoch": 0.8280254777070064, "percentage": 27.43, "elapsed_time": "0:02:19", "remaining_time": "0:06:09"}
{"current_steps": 66, "total_steps": 237, "loss": 1.485987901687622, "lr": 9.113307410649222e-06, "epoch": 0.8407643312101911, "percentage": 27.85, "elapsed_time": "0:02:21", "remaining_time": "0:06:06"}
{"current_steps": 67, "total_steps": 237, "loss": 1.4003167152404785, "lr": 9.070934433517872e-06, "epoch": 0.8535031847133758, "percentage": 28.27, "elapsed_time": "0:02:23", "remaining_time": "0:06:04"}
{"current_steps": 68, "total_steps": 237, "loss": 1.424062967300415, "lr": 9.027675878480131e-06, "epoch": 0.8662420382165605, "percentage": 28.69, "elapsed_time": "0:02:25", "remaining_time": "0:06:02"}
{"current_steps": 69, "total_steps": 237, "loss": 1.4042322635650635, "lr": 8.983541155862114e-06, "epoch": 0.8789808917197452, "percentage": 29.11, "elapsed_time": "0:02:27", "remaining_time": "0:05:59"}
{"current_steps": 70, "total_steps": 237, "loss": 1.3098585605621338, "lr": 8.938539866588593e-06, "epoch": 0.89171974522293, "percentage": 29.54, "elapsed_time": "0:02:29", "remaining_time": "0:05:57"}
{"current_steps": 71, "total_steps": 237, "loss": 1.3807165622711182, "lr": 8.892681800094447e-06, "epoch": 0.9044585987261147, "percentage": 29.96, "elapsed_time": "0:02:31", "remaining_time": "0:05:55"}
{"current_steps": 72, "total_steps": 237, "loss": 1.5126240253448486, "lr": 8.845976932195104e-06, "epoch": 0.9171974522292994, "percentage": 30.38, "elapsed_time": "0:02:33", "remaining_time": "0:05:52"}
{"current_steps": 73, "total_steps": 237, "loss": 1.547788143157959, "lr": 8.798435422916425e-06, "epoch": 0.9299363057324841, "percentage": 30.8, "elapsed_time": "0:02:36", "remaining_time": "0:05:50"}
{"current_steps": 74, "total_steps": 237, "loss": 1.282464623451233, "lr": 8.750067614284534e-06, "epoch": 0.9426751592356688, "percentage": 31.22, "elapsed_time": "0:02:38", "remaining_time": "0:05:48"}
{"current_steps": 75, "total_steps": 237, "loss": 1.503675103187561, "lr": 8.700884028076042e-06, "epoch": 0.9554140127388535, "percentage": 31.65, "elapsed_time": "0:02:40", "remaining_time": "0:05:46"}
{"current_steps": 76, "total_steps": 237, "loss": 1.3122708797454834, "lr": 8.650895363529172e-06, "epoch": 0.9681528662420382, "percentage": 32.07, "elapsed_time": "0:02:42", "remaining_time": "0:05:44"}
{"current_steps": 77, "total_steps": 237, "loss": 1.30131196975708, "lr": 8.600112495016289e-06, "epoch": 0.9808917197452229, "percentage": 32.49, "elapsed_time": "0:02:44", "remaining_time": "0:05:41"}
{"current_steps": 78, "total_steps": 237, "loss": 1.4969130754470825, "lr": 8.548546469678311e-06, "epoch": 0.9936305732484076, "percentage": 32.91, "elapsed_time": "0:02:46", "remaining_time": "0:05:39"}
{"current_steps": 79, "total_steps": 237, "loss": 1.4245132207870483, "lr": 8.496208505021572e-06, "epoch": 1.0, "percentage": 33.33, "elapsed_time": "0:02:47", "remaining_time": "0:05:35"}
{"current_steps": 80, "total_steps": 237, "loss": 0.8896101713180542, "lr": 8.443109986477574e-06, "epoch": 1.0127388535031847, "percentage": 33.76, "elapsed_time": "0:02:49", "remaining_time": "0:05:32"}
{"current_steps": 81, "total_steps": 237, "loss": 0.6292753219604492, "lr": 8.389262464926256e-06, "epoch": 1.0254777070063694, "percentage": 34.18, "elapsed_time": "0:02:51", "remaining_time": "0:05:30"}
{"current_steps": 82, "total_steps": 237, "loss": 0.6534790992736816, "lr": 8.334677654183254e-06, "epoch": 1.0382165605095541, "percentage": 34.6, "elapsed_time": "0:02:54", "remaining_time": "0:05:29"}
{"current_steps": 83, "total_steps": 237, "loss": 0.7352883815765381, "lr": 8.279367428451703e-06, "epoch": 1.0509554140127388, "percentage": 35.02, "elapsed_time": "0:02:56", "remaining_time": "0:05:27"}
{"current_steps": 84, "total_steps": 237, "loss": 0.6897856593132019, "lr": 8.223343819739164e-06, "epoch": 1.0636942675159236, "percentage": 35.44, "elapsed_time": "0:02:58", "remaining_time": "0:05:25"}
{"current_steps": 85, "total_steps": 237, "loss": 0.6689693927764893, "lr": 8.166619015240236e-06, "epoch": 1.0764331210191083, "percentage": 35.86, "elapsed_time": "0:03:00", "remaining_time": "0:05:23"}
{"current_steps": 86, "total_steps": 237, "loss": 0.5453209280967712, "lr": 8.109205354685367e-06, "epoch": 1.089171974522293, "percentage": 36.29, "elapsed_time": "0:03:02", "remaining_time": "0:05:20"}
{"current_steps": 87, "total_steps": 237, "loss": 0.6614304184913635, "lr": 8.051115327656538e-06, "epoch": 1.1019108280254777, "percentage": 36.71, "elapsed_time": "0:03:04", "remaining_time": "0:05:18"}
{"current_steps": 88, "total_steps": 237, "loss": 0.5656032562255859, "lr": 7.992361570870289e-06, "epoch": 1.1146496815286624, "percentage": 37.13, "elapsed_time": "0:03:06", "remaining_time": "0:05:16"}
{"current_steps": 89, "total_steps": 237, "loss": 0.5869074463844299, "lr": 7.932956865428792e-06, "epoch": 1.127388535031847, "percentage": 37.55, "elapsed_time": "0:03:09", "remaining_time": "0:05:14"}
{"current_steps": 90, "total_steps": 237, "loss": 0.5958786010742188, "lr": 7.872914134039485e-06, "epoch": 1.1401273885350318, "percentage": 37.97, "elapsed_time": "0:03:11", "remaining_time": "0:05:12"}
{"current_steps": 91, "total_steps": 237, "loss": 0.55110764503479, "lr": 7.812246438203905e-06, "epoch": 1.1528662420382165, "percentage": 38.4, "elapsed_time": "0:03:13", "remaining_time": "0:05:09"}
{"current_steps": 92, "total_steps": 237, "loss": 0.6653017997741699, "lr": 7.750966975376328e-06, "epoch": 1.1656050955414012, "percentage": 38.82, "elapsed_time": "0:03:15", "remaining_time": "0:05:07"}
{"current_steps": 93, "total_steps": 237, "loss": 0.5967283248901367, "lr": 7.689089076092851e-06, "epoch": 1.178343949044586, "percentage": 39.24, "elapsed_time": "0:03:17", "remaining_time": "0:05:05"}
{"current_steps": 94, "total_steps": 237, "loss": 0.5865520238876343, "lr": 7.626626201071494e-06, "epoch": 1.1910828025477707, "percentage": 39.66, "elapsed_time": "0:03:19", "remaining_time": "0:05:03"}
{"current_steps": 95, "total_steps": 237, "loss": 0.69908207654953, "lr": 7.563591938284012e-06, "epoch": 1.2038216560509554, "percentage": 40.08, "elapsed_time": "0:03:21", "remaining_time": "0:05:01"}
{"current_steps": 96, "total_steps": 237, "loss": 0.5799813270568848, "lr": 7.500000000000001e-06, "epoch": 1.21656050955414, "percentage": 40.51, "elapsed_time": "0:03:23", "remaining_time": "0:04:59"}
{"current_steps": 97, "total_steps": 237, "loss": 0.5793201923370361, "lr": 7.4358642198039835e-06, "epoch": 1.2292993630573248, "percentage": 40.93, "elapsed_time": "0:03:25", "remaining_time": "0:04:56"}
{"current_steps": 98, "total_steps": 237, "loss": 0.7011544704437256, "lr": 7.371198549586091e-06, "epoch": 1.2420382165605095, "percentage": 41.35, "elapsed_time": "0:03:27", "remaining_time": "0:04:54"}
{"current_steps": 99, "total_steps": 237, "loss": 0.5684264898300171, "lr": 7.306017056507018e-06, "epoch": 1.2547770700636942, "percentage": 41.77, "elapsed_time": "0:03:29", "remaining_time": "0:04:52"}
{"current_steps": 100, "total_steps": 237, "loss": 0.5350228548049927, "lr": 7.240333919937893e-06, "epoch": 1.267515923566879, "percentage": 42.19, "elapsed_time": "0:03:31", "remaining_time": "0:04:50"}
{"current_steps": 101, "total_steps": 237, "loss": 0.5655375123023987, "lr": 7.174163428375748e-06, "epoch": 1.2802547770700636, "percentage": 42.62, "elapsed_time": "0:03:33", "remaining_time": "0:04:48"}
{"current_steps": 102, "total_steps": 237, "loss": 0.4992554783821106, "lr": 7.107519976335241e-06, "epoch": 1.2929936305732483, "percentage": 43.04, "elapsed_time": "0:03:36", "remaining_time": "0:04:45"}
{"current_steps": 103, "total_steps": 237, "loss": 0.5392221212387085, "lr": 7.040418061217325e-06, "epoch": 1.305732484076433, "percentage": 43.46, "elapsed_time": "0:03:38", "remaining_time": "0:04:43"}
{"current_steps": 104, "total_steps": 237, "loss": 0.6300671696662903, "lr": 6.972872280155528e-06, "epoch": 1.3184713375796178, "percentage": 43.88, "elapsed_time": "0:03:40", "remaining_time": "0:04:41"}
{"current_steps": 105, "total_steps": 237, "loss": 0.6349946856498718, "lr": 6.9048973268405375e-06, "epoch": 1.3312101910828025, "percentage": 44.3, "elapsed_time": "0:03:42", "remaining_time": "0:04:39"}
{"current_steps": 106, "total_steps": 237, "loss": 0.6175198554992676, "lr": 6.836507988323785e-06, "epoch": 1.3439490445859872, "percentage": 44.73, "elapsed_time": "0:03:44", "remaining_time": "0:04:37"}
{"current_steps": 107, "total_steps": 237, "loss": 0.6079376935958862, "lr": 6.767719141800718e-06, "epoch": 1.356687898089172, "percentage": 45.15, "elapsed_time": "0:03:46", "remaining_time": "0:04:35"}
{"current_steps": 108, "total_steps": 237, "loss": 0.6216439008712769, "lr": 6.698545751374465e-06, "epoch": 1.3694267515923566, "percentage": 45.57, "elapsed_time": "0:03:48", "remaining_time": "0:04:32"}
{"current_steps": 109, "total_steps": 237, "loss": 0.5859916806221008, "lr": 6.629002864800589e-06, "epoch": 1.3821656050955413, "percentage": 45.99, "elapsed_time": "0:03:50", "remaining_time": "0:04:30"}
{"current_steps": 110, "total_steps": 237, "loss": 0.6229852437973022, "lr": 6.55910561021365e-06, "epoch": 1.394904458598726, "percentage": 46.41, "elapsed_time": "0:03:52", "remaining_time": "0:04:28"}
{"current_steps": 111, "total_steps": 237, "loss": 0.631940484046936, "lr": 6.488869192836279e-06, "epoch": 1.4076433121019107, "percentage": 46.84, "elapsed_time": "0:03:54", "remaining_time": "0:04:26"}
{"current_steps": 112, "total_steps": 237, "loss": 0.5421336889266968, "lr": 6.418308891671484e-06, "epoch": 1.4203821656050954, "percentage": 47.26, "elapsed_time": "0:03:56", "remaining_time": "0:04:24"}
{"current_steps": 113, "total_steps": 237, "loss": 0.5953583717346191, "lr": 6.347440056178904e-06, "epoch": 1.4331210191082802, "percentage": 47.68, "elapsed_time": "0:03:58", "remaining_time": "0:04:22"}
{"current_steps": 114, "total_steps": 237, "loss": 0.5827579498291016, "lr": 6.27627810293574e-06, "epoch": 1.4458598726114649, "percentage": 48.1, "elapsed_time": "0:04:00", "remaining_time": "0:04:20"}
{"current_steps": 115, "total_steps": 237, "loss": 0.6041445732116699, "lr": 6.204838512283073e-06, "epoch": 1.4585987261146496, "percentage": 48.52, "elapsed_time": "0:04:03", "remaining_time": "0:04:17"}
{"current_steps": 116, "total_steps": 237, "loss": 0.57387375831604, "lr": 6.133136824958334e-06, "epoch": 1.4713375796178343, "percentage": 48.95, "elapsed_time": "0:04:05", "remaining_time": "0:04:15"}
{"current_steps": 117, "total_steps": 237, "loss": 0.5562150478363037, "lr": 6.061188638714616e-06, "epoch": 1.484076433121019, "percentage": 49.37, "elapsed_time": "0:04:07", "remaining_time": "0:04:13"}
{"current_steps": 118, "total_steps": 237, "loss": 0.5856057405471802, "lr": 5.989009604927587e-06, "epoch": 1.4968152866242037, "percentage": 49.79, "elapsed_time": "0:04:09", "remaining_time": "0:04:11"}
{"current_steps": 119, "total_steps": 237, "loss": 0.6272364854812622, "lr": 5.916615425190744e-06, "epoch": 1.5095541401273884, "percentage": 50.21, "elapsed_time": "0:04:11", "remaining_time": "0:04:09"}
{"current_steps": 120, "total_steps": 237, "loss": 0.5784306526184082, "lr": 5.844021847899735e-06, "epoch": 1.5222929936305731, "percentage": 50.63, "elapsed_time": "0:04:13", "remaining_time": "0:04:07"}
{"current_steps": 121, "total_steps": 237, "loss": 0.5239511132240295, "lr": 5.771244664826512e-06, "epoch": 1.5350318471337578, "percentage": 51.05, "elapsed_time": "0:04:15", "remaining_time": "0:04:05"}
{"current_steps": 122, "total_steps": 237, "loss": 0.6610599756240845, "lr": 5.698299707684031e-06, "epoch": 1.5477707006369426, "percentage": 51.48, "elapsed_time": "0:04:17", "remaining_time": "0:04:02"}
{"current_steps": 123, "total_steps": 237, "loss": 0.6224923133850098, "lr": 5.6252028446822805e-06, "epoch": 1.5605095541401273, "percentage": 51.9, "elapsed_time": "0:04:19", "remaining_time": "0:04:00"}
{"current_steps": 124, "total_steps": 237, "loss": 0.608780026435852, "lr": 5.55196997707635e-06, "epoch": 1.573248407643312, "percentage": 52.32, "elapsed_time": "0:04:21", "remaining_time": "0:03:58"}
{"current_steps": 125, "total_steps": 237, "loss": 0.5765759944915771, "lr": 5.478617035707337e-06, "epoch": 1.5859872611464967, "percentage": 52.74, "elapsed_time": "0:04:23", "remaining_time": "0:03:56"}
{"current_steps": 126, "total_steps": 237, "loss": 0.5430376529693604, "lr": 5.4051599775368e-06, "epoch": 1.5987261146496814, "percentage": 53.16, "elapsed_time": "0:04:26", "remaining_time": "0:03:54"}
{"current_steps": 127, "total_steps": 237, "loss": 0.6367968916893005, "lr": 5.33161478217552e-06, "epoch": 1.611464968152866, "percentage": 53.59, "elapsed_time": "0:04:28", "remaining_time": "0:03:52"}
{"current_steps": 128, "total_steps": 237, "loss": 0.6323709487915039, "lr": 5.257997448407366e-06, "epoch": 1.6242038216560508, "percentage": 54.01, "elapsed_time": "0:04:30", "remaining_time": "0:03:50"}
{"current_steps": 129, "total_steps": 237, "loss": 0.5036509037017822, "lr": 5.184323990708959e-06, "epoch": 1.6369426751592355, "percentage": 54.43, "elapsed_time": "0:04:32", "remaining_time": "0:03:48"}
{"current_steps": 130, "total_steps": 237, "loss": 0.6375566124916077, "lr": 5.110610435765935e-06, "epoch": 1.6496815286624202, "percentage": 54.85, "elapsed_time": "0:04:34", "remaining_time": "0:03:46"}
{"current_steps": 131, "total_steps": 237, "loss": 0.4993886351585388, "lr": 5.0368728189865624e-06, "epoch": 1.662420382165605, "percentage": 55.27, "elapsed_time": "0:04:36", "remaining_time": "0:03:44"}
{"current_steps": 132, "total_steps": 237, "loss": 0.5915786623954773, "lr": 4.9631271810134375e-06, "epoch": 1.6751592356687897, "percentage": 55.7, "elapsed_time": "0:04:39", "remaining_time": "0:03:41"}
{"current_steps": 133, "total_steps": 237, "loss": 0.4808322489261627, "lr": 4.8893895642340665e-06, "epoch": 1.6878980891719744, "percentage": 56.12, "elapsed_time": "0:04:41", "remaining_time": "0:03:39"}
{"current_steps": 134, "total_steps": 237, "loss": 0.6569564342498779, "lr": 4.815676009291044e-06, "epoch": 1.700636942675159, "percentage": 56.54, "elapsed_time": "0:04:43", "remaining_time": "0:03:37"}
{"current_steps": 135, "total_steps": 237, "loss": 0.5696136951446533, "lr": 4.742002551592635e-06, "epoch": 1.7133757961783438, "percentage": 56.96, "elapsed_time": "0:04:45", "remaining_time": "0:03:35"}
{"current_steps": 136, "total_steps": 237, "loss": 0.5490207672119141, "lr": 4.668385217824482e-06, "epoch": 1.7261146496815285, "percentage": 57.38, "elapsed_time": "0:04:47", "remaining_time": "0:03:33"}
{"current_steps": 137, "total_steps": 237, "loss": 0.6286443471908569, "lr": 4.594840022463201e-06, "epoch": 1.7388535031847132, "percentage": 57.81, "elapsed_time": "0:04:49", "remaining_time": "0:03:31"}
{"current_steps": 138, "total_steps": 237, "loss": 0.5054807066917419, "lr": 4.5213829642926635e-06, "epoch": 1.7515923566878981, "percentage": 58.23, "elapsed_time": "0:04:51", "remaining_time": "0:03:29"}
{"current_steps": 139, "total_steps": 237, "loss": 0.6229608058929443, "lr": 4.4480300229236525e-06, "epoch": 1.7643312101910829, "percentage": 58.65, "elapsed_time": "0:04:53", "remaining_time": "0:03:27"}
{"current_steps": 140, "total_steps": 237, "loss": 0.5688886642456055, "lr": 4.374797155317721e-06, "epoch": 1.7770700636942676, "percentage": 59.07, "elapsed_time": "0:04:55", "remaining_time": "0:03:24"}
{"current_steps": 141, "total_steps": 237, "loss": 0.570158839225769, "lr": 4.30170029231597e-06, "epoch": 1.7898089171974523, "percentage": 59.49, "elapsed_time": "0:04:57", "remaining_time": "0:03:22"}
{"current_steps": 142, "total_steps": 237, "loss": 0.5360066890716553, "lr": 4.228755335173488e-06, "epoch": 1.802547770700637, "percentage": 59.92, "elapsed_time": "0:04:59", "remaining_time": "0:03:20"}
{"current_steps": 143, "total_steps": 237, "loss": 0.5978993773460388, "lr": 4.155978152100266e-06, "epoch": 1.8152866242038217, "percentage": 60.34, "elapsed_time": "0:05:02", "remaining_time": "0:03:18"}
{"current_steps": 144, "total_steps": 237, "loss": 0.654821515083313, "lr": 4.0833845748092586e-06, "epoch": 1.8280254777070064, "percentage": 60.76, "elapsed_time": "0:05:04", "remaining_time": "0:03:16"}
{"current_steps": 145, "total_steps": 237, "loss": 0.5607833862304688, "lr": 4.010990395072414e-06, "epoch": 1.8407643312101911, "percentage": 61.18, "elapsed_time": "0:05:06", "remaining_time": "0:03:14"}
{"current_steps": 146, "total_steps": 237, "loss": 0.5751765966415405, "lr": 3.938811361285386e-06, "epoch": 1.8535031847133758, "percentage": 61.6, "elapsed_time": "0:05:08", "remaining_time": "0:03:12"}
{"current_steps": 147, "total_steps": 237, "loss": 0.7263962626457214, "lr": 3.866863175041666e-06, "epoch": 1.8662420382165605, "percentage": 62.03, "elapsed_time": "0:05:10", "remaining_time": "0:03:10"}
{"current_steps": 148, "total_steps": 237, "loss": 0.65910804271698, "lr": 3.7951614877169285e-06, "epoch": 1.8789808917197452, "percentage": 62.45, "elapsed_time": "0:05:12", "remaining_time": "0:03:07"}
{"current_steps": 149, "total_steps": 237, "loss": 0.5089215636253357, "lr": 3.7237218970642624e-06, "epoch": 1.89171974522293, "percentage": 62.87, "elapsed_time": "0:05:14", "remaining_time": "0:03:05"}
{"current_steps": 150, "total_steps": 237, "loss": 0.5584551095962524, "lr": 3.6525599438210956e-06, "epoch": 1.9044585987261147, "percentage": 63.29, "elapsed_time": "0:05:16", "remaining_time": "0:03:03"}
{"current_steps": 151, "total_steps": 237, "loss": 0.6027523875236511, "lr": 3.5816911083285165e-06, "epoch": 1.9171974522292994, "percentage": 63.71, "elapsed_time": "0:05:18", "remaining_time": "0:03:01"}
{"current_steps": 152, "total_steps": 237, "loss": 0.47968733310699463, "lr": 3.511130807163724e-06, "epoch": 1.929936305732484, "percentage": 64.14, "elapsed_time": "0:05:20", "remaining_time": "0:02:59"}
{"current_steps": 153, "total_steps": 237, "loss": 0.558478832244873, "lr": 3.440894389786352e-06, "epoch": 1.9426751592356688, "percentage": 64.56, "elapsed_time": "0:05:22", "remaining_time": "0:02:57"}
{"current_steps": 154, "total_steps": 237, "loss": 0.6799242496490479, "lr": 3.370997135199413e-06, "epoch": 1.9554140127388535, "percentage": 64.98, "elapsed_time": "0:05:25", "remaining_time": "0:02:55"}
{"current_steps": 155, "total_steps": 237, "loss": 0.6166017055511475, "lr": 3.3014542486255365e-06, "epoch": 1.9681528662420382, "percentage": 65.4, "elapsed_time": "0:05:27", "remaining_time": "0:02:53"}
{"current_steps": 156, "total_steps": 237, "loss": 0.5956500172615051, "lr": 3.2322808581992825e-06, "epoch": 1.980891719745223, "percentage": 65.82, "elapsed_time": "0:05:29", "remaining_time": "0:02:50"}
{"current_steps": 157, "total_steps": 237, "loss": 0.4991570711135864, "lr": 3.1634920116762175e-06, "epoch": 1.9936305732484076, "percentage": 66.24, "elapsed_time": "0:05:31", "remaining_time": "0:02:48"}
{"current_steps": 158, "total_steps": 237, "loss": 0.39665284752845764, "lr": 3.0951026731594634e-06, "epoch": 2.0, "percentage": 66.67, "elapsed_time": "0:05:32", "remaining_time": "0:02:46"}
{"current_steps": 159, "total_steps": 237, "loss": 0.1434047520160675, "lr": 3.0271277198444737e-06, "epoch": 2.0127388535031847, "percentage": 67.09, "elapsed_time": "0:05:34", "remaining_time": "0:02:44"}
{"current_steps": 160, "total_steps": 237, "loss": 0.1640336662530899, "lr": 2.9595819387826753e-06, "epoch": 2.0254777070063694, "percentage": 67.51, "elapsed_time": "0:05:36", "remaining_time": "0:02:41"}
{"current_steps": 161, "total_steps": 237, "loss": 0.14122995734214783, "lr": 2.89248002366476e-06, "epoch": 2.038216560509554, "percentage": 67.93, "elapsed_time": "0:05:38", "remaining_time": "0:02:39"}
{"current_steps": 162, "total_steps": 237, "loss": 0.18970023095607758, "lr": 2.8258365716242543e-06, "epoch": 2.050955414012739, "percentage": 68.35, "elapsed_time": "0:05:40", "remaining_time": "0:02:37"}
{"current_steps": 163, "total_steps": 237, "loss": 0.2818552851676941, "lr": 2.7596660800621076e-06, "epoch": 2.0636942675159236, "percentage": 68.78, "elapsed_time": "0:05:42", "remaining_time": "0:02:35"}
{"current_steps": 164, "total_steps": 237, "loss": 0.17025958001613617, "lr": 2.6939829434929834e-06, "epoch": 2.0764331210191083, "percentage": 69.2, "elapsed_time": "0:05:44", "remaining_time": "0:02:33"}
{"current_steps": 165, "total_steps": 237, "loss": 0.1915724128484726, "lr": 2.6288014504139104e-06, "epoch": 2.089171974522293, "percentage": 69.62, "elapsed_time": "0:05:47", "remaining_time": "0:02:31"}
{"current_steps": 166, "total_steps": 237, "loss": 0.12392200529575348, "lr": 2.5641357801960186e-06, "epoch": 2.1019108280254777, "percentage": 70.04, "elapsed_time": "0:05:49", "remaining_time": "0:02:29"}
{"current_steps": 167, "total_steps": 237, "loss": 0.15184086561203003, "lr": 2.5000000000000015e-06, "epoch": 2.1146496815286624, "percentage": 70.46, "elapsed_time": "0:05:51", "remaining_time": "0:02:27"}
{"current_steps": 168, "total_steps": 237, "loss": 0.11359566450119019, "lr": 2.4364080617159885e-06, "epoch": 2.127388535031847, "percentage": 70.89, "elapsed_time": "0:05:53", "remaining_time": "0:02:25"}
{"current_steps": 169, "total_steps": 237, "loss": 0.16131919622421265, "lr": 2.373373798928507e-06, "epoch": 2.140127388535032, "percentage": 71.31, "elapsed_time": "0:05:55", "remaining_time": "0:02:23"}
{"current_steps": 170, "total_steps": 237, "loss": 0.17047160863876343, "lr": 2.310910923907149e-06, "epoch": 2.1528662420382165, "percentage": 71.73, "elapsed_time": "0:05:57", "remaining_time": "0:02:20"}
{"current_steps": 171, "total_steps": 237, "loss": 0.15950073301792145, "lr": 2.249033024623672e-06, "epoch": 2.1656050955414012, "percentage": 72.15, "elapsed_time": "0:05:59", "remaining_time": "0:02:18"}
{"current_steps": 172, "total_steps": 237, "loss": 0.1451823115348816, "lr": 2.187753561796097e-06, "epoch": 2.178343949044586, "percentage": 72.57, "elapsed_time": "0:06:01", "remaining_time": "0:02:16"}
{"current_steps": 173, "total_steps": 237, "loss": 0.14787515997886658, "lr": 2.127085865960516e-06, "epoch": 2.1910828025477707, "percentage": 73.0, "elapsed_time": "0:06:03", "remaining_time": "0:02:14"}
{"current_steps": 174, "total_steps": 237, "loss": 0.1409212052822113, "lr": 2.0670431345712092e-06, "epoch": 2.2038216560509554, "percentage": 73.42, "elapsed_time": "0:06:05", "remaining_time": "0:02:12"}
{"current_steps": 175, "total_steps": 237, "loss": 0.13373146951198578, "lr": 2.0076384291297134e-06, "epoch": 2.21656050955414, "percentage": 73.84, "elapsed_time": "0:06:08", "remaining_time": "0:02:10"}
{"current_steps": 176, "total_steps": 237, "loss": 0.13855525851249695, "lr": 1.9488846723434646e-06, "epoch": 2.229299363057325, "percentage": 74.26, "elapsed_time": "0:06:10", "remaining_time": "0:02:08"}
{"current_steps": 177, "total_steps": 237, "loss": 0.13551361858844757, "lr": 1.890794645314633e-06, "epoch": 2.2420382165605095, "percentage": 74.68, "elapsed_time": "0:06:12", "remaining_time": "0:02:06"}
{"current_steps": 178, "total_steps": 237, "loss": 0.15751245617866516, "lr": 1.8333809847597644e-06, "epoch": 2.254777070063694, "percentage": 75.11, "elapsed_time": "0:06:14", "remaining_time": "0:02:04"}
{"current_steps": 179, "total_steps": 237, "loss": 0.16300000250339508, "lr": 1.7766561802608374e-06, "epoch": 2.267515923566879, "percentage": 75.53, "elapsed_time": "0:06:16", "remaining_time": "0:02:02"}
{"current_steps": 180, "total_steps": 237, "loss": 0.12554647028446198, "lr": 1.7206325715483003e-06, "epoch": 2.2802547770700636, "percentage": 75.95, "elapsed_time": "0:06:18", "remaining_time": "0:01:59"}
{"current_steps": 181, "total_steps": 237, "loss": 0.11852402985095978, "lr": 1.665322345816746e-06, "epoch": 2.2929936305732483, "percentage": 76.37, "elapsed_time": "0:06:20", "remaining_time": "0:01:57"}
{"current_steps": 182, "total_steps": 237, "loss": 0.11313524842262268, "lr": 1.6107375350737437e-06, "epoch": 2.305732484076433, "percentage": 76.79, "elapsed_time": "0:06:22", "remaining_time": "0:01:55"}
{"current_steps": 183, "total_steps": 237, "loss": 0.1032472476363182, "lr": 1.556890013522428e-06, "epoch": 2.3184713375796178, "percentage": 77.22, "elapsed_time": "0:06:24", "remaining_time": "0:01:53"}
{"current_steps": 184, "total_steps": 237, "loss": 0.15453463792800903, "lr": 1.50379149497843e-06, "epoch": 2.3312101910828025, "percentage": 77.64, "elapsed_time": "0:06:27", "remaining_time": "0:01:51"}
{"current_steps": 185, "total_steps": 237, "loss": 0.10700733959674835, "lr": 1.4514535303216893e-06, "epoch": 2.343949044585987, "percentage": 78.06, "elapsed_time": "0:06:29", "remaining_time": "0:01:49"}
{"current_steps": 186, "total_steps": 237, "loss": 0.10318736732006073, "lr": 1.3998875049837141e-06, "epoch": 2.356687898089172, "percentage": 78.48, "elapsed_time": "0:06:31", "remaining_time": "0:01:47"}
{"current_steps": 187, "total_steps": 237, "loss": 0.16100738942623138, "lr": 1.3491046364708294e-06, "epoch": 2.3694267515923566, "percentage": 78.9, "elapsed_time": "0:06:33", "remaining_time": "0:01:45"}
{"current_steps": 188, "total_steps": 237, "loss": 0.13357210159301758, "lr": 1.2991159719239581e-06, "epoch": 2.3821656050955413, "percentage": 79.32, "elapsed_time": "0:06:35", "remaining_time": "0:01:43"}
{"current_steps": 189, "total_steps": 237, "loss": 0.14397501945495605, "lr": 1.249932385715467e-06, "epoch": 2.394904458598726, "percentage": 79.75, "elapsed_time": "0:06:37", "remaining_time": "0:01:40"}
{"current_steps": 190, "total_steps": 237, "loss": 0.10784805566072464, "lr": 1.2015645770835765e-06, "epoch": 2.4076433121019107, "percentage": 80.17, "elapsed_time": "0:06:39", "remaining_time": "0:01:38"}
{"current_steps": 191, "total_steps": 237, "loss": 0.12623994052410126, "lr": 1.1540230678048969e-06, "epoch": 2.4203821656050954, "percentage": 80.59, "elapsed_time": "0:06:41", "remaining_time": "0:01:36"}
{"current_steps": 192, "total_steps": 237, "loss": 0.1347275674343109, "lr": 1.1073181999055538e-06, "epoch": 2.43312101910828, "percentage": 81.01, "elapsed_time": "0:06:43", "remaining_time": "0:01:34"}
{"current_steps": 193, "total_steps": 237, "loss": 0.156468003988266, "lr": 1.0614601334114099e-06, "epoch": 2.445859872611465, "percentage": 81.43, "elapsed_time": "0:06:45", "remaining_time": "0:01:32"}
{"current_steps": 194, "total_steps": 237, "loss": 0.10251176357269287, "lr": 1.016458844137887e-06, "epoch": 2.4585987261146496, "percentage": 81.86, "elapsed_time": "0:06:47", "remaining_time": "0:01:30"}
{"current_steps": 195, "total_steps": 237, "loss": 0.10291886329650879, "lr": 9.723241215198692e-07, "epoch": 2.4713375796178343, "percentage": 82.28, "elapsed_time": "0:06:49", "remaining_time": "0:01:28"}
{"current_steps": 196, "total_steps": 237, "loss": 0.11970998346805573, "lr": 9.290655664821296e-07, "epoch": 2.484076433121019, "percentage": 82.7, "elapsed_time": "0:06:52", "remaining_time": "0:01:26"}
{"current_steps": 197, "total_steps": 237, "loss": 0.15029165148735046, "lr": 8.866925893507805e-07, "epoch": 2.4968152866242037, "percentage": 83.12, "elapsed_time": "0:06:54", "remaining_time": "0:01:24"}
{"current_steps": 198, "total_steps": 237, "loss": 0.13055002689361572, "lr": 8.45214407806182e-07, "epoch": 2.5095541401273884, "percentage": 83.54, "elapsed_time": "0:06:56", "remaining_time": "0:01:21"}
{"current_steps": 199, "total_steps": 237, "loss": 0.13153906166553497, "lr": 8.046400448777575e-07, "epoch": 2.522292993630573, "percentage": 83.97, "elapsed_time": "0:06:58", "remaining_time": "0:01:19"}
{"current_steps": 200, "total_steps": 237, "loss": 0.11118735373020172, "lr": 7.649783269811523e-07, "epoch": 2.535031847133758, "percentage": 84.39, "elapsed_time": "0:07:00", "remaining_time": "0:01:17"}
{"current_steps": 201, "total_steps": 237, "loss": 0.1302662193775177, "lr": 7.26237881998163e-07, "epoch": 2.5477707006369426, "percentage": 84.81, "elapsed_time": "0:07:02", "remaining_time": "0:01:15"}
{"current_steps": 202, "total_steps": 237, "loss": 0.12127632647752762, "lr": 6.884271373998608e-07, "epoch": 2.5605095541401273, "percentage": 85.23, "elapsed_time": "0:07:04", "remaining_time": "0:01:13"}
{"current_steps": 203, "total_steps": 237, "loss": 0.11241280287504196, "lr": 6.515543184133e-07, "epoch": 2.573248407643312, "percentage": 85.65, "elapsed_time": "0:07:06", "remaining_time": "0:01:11"}
{"current_steps": 204, "total_steps": 237, "loss": 0.14668843150138855, "lr": 6.156274462322292e-07, "epoch": 2.5859872611464967, "percentage": 86.08, "elapsed_time": "0:07:08", "remaining_time": "0:01:09"}
{"current_steps": 205, "total_steps": 237, "loss": 0.11415961384773254, "lr": 5.806543362721945e-07, "epoch": 2.5987261146496814, "percentage": 86.5, "elapsed_time": "0:07:10", "remaining_time": "0:01:07"}
{"current_steps": 206, "total_steps": 237, "loss": 0.10757691413164139, "lr": 5.466425964703914e-07, "epoch": 2.611464968152866, "percentage": 86.92, "elapsed_time": "0:07:12", "remaining_time": "0:01:05"}
{"current_steps": 207, "total_steps": 237, "loss": 0.11599895358085632, "lr": 5.135996256306619e-07, "epoch": 2.624203821656051, "percentage": 87.34, "elapsed_time": "0:07:14", "remaining_time": "0:01:03"}
{"current_steps": 208, "total_steps": 237, "loss": 0.23778586089611053, "lr": 4.815326118139813e-07, "epoch": 2.6369426751592355, "percentage": 87.76, "elapsed_time": "0:07:16", "remaining_time": "0:01:00"}
{"current_steps": 209, "total_steps": 237, "loss": 0.09149561077356339, "lr": 4.5044853077479134e-07, "epoch": 2.6496815286624202, "percentage": 88.19, "elapsed_time": "0:07:19", "remaining_time": "0:00:58"}
{"current_steps": 210, "total_steps": 237, "loss": 0.08711031079292297, "lr": 4.203541444435211e-07, "epoch": 2.662420382165605, "percentage": 88.61, "elapsed_time": "0:07:21", "remaining_time": "0:00:56"}
{"current_steps": 211, "total_steps": 237, "loss": 0.1195639818906784, "lr": 3.9125599945560866e-07, "epoch": 2.6751592356687897, "percentage": 89.03, "elapsed_time": "0:07:23", "remaining_time": "0:00:54"}
{"current_steps": 212, "total_steps": 237, "loss": 0.13059048354625702, "lr": 3.631604257273774e-07, "epoch": 2.6878980891719744, "percentage": 89.45, "elapsed_time": "0:07:25", "remaining_time": "0:00:52"}
{"current_steps": 213, "total_steps": 237, "loss": 0.14207160472869873, "lr": 3.360735350790428e-07, "epoch": 2.700636942675159, "percentage": 89.87, "elapsed_time": "0:07:27", "remaining_time": "0:00:50"}
{"current_steps": 214, "total_steps": 237, "loss": 0.13172012567520142, "lr": 3.100012199051627e-07, "epoch": 2.713375796178344, "percentage": 90.3, "elapsed_time": "0:07:29", "remaining_time": "0:00:48"}
{"current_steps": 215, "total_steps": 237, "loss": 0.13604804873466492, "lr": 2.8494915189283325e-07, "epoch": 2.7261146496815285, "percentage": 90.72, "elapsed_time": "0:07:31", "remaining_time": "0:00:46"}
{"current_steps": 216, "total_steps": 237, "loss": 0.1447272002696991, "lr": 2.6092278078788004e-07, "epoch": 2.738853503184713, "percentage": 91.14, "elapsed_time": "0:07:33", "remaining_time": "0:00:44"}
{"current_steps": 217, "total_steps": 237, "loss": 0.16000479459762573, "lr": 2.3792733320934348e-07, "epoch": 2.7515923566878984, "percentage": 91.56, "elapsed_time": "0:07:35", "remaining_time": "0:00:42"}
{"current_steps": 218, "total_steps": 237, "loss": 0.15536722540855408, "lr": 2.1596781151249524e-07, "epoch": 2.7643312101910826, "percentage": 91.98, "elapsed_time": "0:07:37", "remaining_time": "0:00:39"}
{"current_steps": 219, "total_steps": 237, "loss": 0.11038240790367126, "lr": 1.9504899270064105e-07, "epoch": 2.777070063694268, "percentage": 92.41, "elapsed_time": "0:07:39", "remaining_time": "0:00:37"}
{"current_steps": 220, "total_steps": 237, "loss": 0.1188005730509758, "lr": 1.7517542738595071e-07, "epoch": 2.789808917197452, "percentage": 92.83, "elapsed_time": "0:07:42", "remaining_time": "0:00:35"}
{"current_steps": 221, "total_steps": 237, "loss": 0.12404206395149231, "lr": 1.5635143879952575e-07, "epoch": 2.802547770700637, "percentage": 93.25, "elapsed_time": "0:07:44", "remaining_time": "0:00:33"}
{"current_steps": 222, "total_steps": 237, "loss": 0.1163729578256607, "lr": 1.3858112185094418e-07, "epoch": 2.8152866242038215, "percentage": 93.67, "elapsed_time": "0:07:46", "remaining_time": "0:00:31"}
{"current_steps": 223, "total_steps": 237, "loss": 0.12924784421920776, "lr": 1.2186834223746612e-07, "epoch": 2.8280254777070066, "percentage": 94.09, "elapsed_time": "0:07:48", "remaining_time": "0:00:29"}
{"current_steps": 224, "total_steps": 237, "loss": 0.12632641196250916, "lr": 1.0621673560309798e-07, "epoch": 2.840764331210191, "percentage": 94.51, "elapsed_time": "0:07:50", "remaining_time": "0:00:27"}
{"current_steps": 225, "total_steps": 237, "loss": 0.10880691558122635, "lr": 9.162970674771177e-08, "epoch": 2.853503184713376, "percentage": 94.94, "elapsed_time": "0:07:52", "remaining_time": "0:00:25"}
{"current_steps": 226, "total_steps": 237, "loss": 0.10457868129014969, "lr": 7.81104288863721e-08, "epoch": 2.8662420382165603, "percentage": 95.36, "elapsed_time": "0:07:54", "remaining_time": "0:00:23"}
{"current_steps": 227, "total_steps": 237, "loss": 0.13866788148880005, "lr": 6.566184295904777e-08, "epoch": 2.8789808917197455, "percentage": 95.78, "elapsed_time": "0:07:56", "remaining_time": "0:00:21"}
{"current_steps": 228, "total_steps": 237, "loss": 0.14570261538028717, "lr": 5.4286656990847897e-08, "epoch": 2.8917197452229297, "percentage": 96.2, "elapsed_time": "0:07:58", "remaining_time": "0:00:18"}
{"current_steps": 229, "total_steps": 237, "loss": 0.1143546923995018, "lr": 4.398734550292716e-08, "epoch": 2.904458598726115, "percentage": 96.62, "elapsed_time": "0:08:00", "remaining_time": "0:00:16"}
{"current_steps": 230, "total_steps": 237, "loss": 0.10592009872198105, "lr": 3.476614897418573e-08, "epoch": 2.917197452229299, "percentage": 97.05, "elapsed_time": "0:08:03", "remaining_time": "0:00:14"}
{"current_steps": 231, "total_steps": 237, "loss": 0.18025150895118713, "lr": 2.6625073353884756e-08, "epoch": 2.9299363057324843, "percentage": 97.47, "elapsed_time": "0:08:05", "remaining_time": "0:00:12"}
{"current_steps": 232, "total_steps": 237, "loss": 0.11485205590724945, "lr": 1.9565889625275945e-08, "epoch": 2.9426751592356686, "percentage": 97.89, "elapsed_time": "0:08:07", "remaining_time": "0:00:10"}
{"current_steps": 233, "total_steps": 237, "loss": 0.11141972988843918, "lr": 1.3590133420350315e-08, "epoch": 2.9554140127388537, "percentage": 98.31, "elapsed_time": "0:08:09", "remaining_time": "0:00:08"}
{"current_steps": 234, "total_steps": 237, "loss": 0.1508815437555313, "lr": 8.699104685779835e-09, "epoch": 2.968152866242038, "percentage": 98.73, "elapsed_time": "0:08:12", "remaining_time": "0:00:06"}
{"current_steps": 235, "total_steps": 237, "loss": 0.1304333209991455, "lr": 4.89386740013198e-09, "epoch": 2.980891719745223, "percentage": 99.16, "elapsed_time": "0:08:14", "remaining_time": "0:00:04"}
{"current_steps": 236, "total_steps": 237, "loss": 0.13910380005836487, "lr": 2.1752493424148647e-09, "epoch": 2.9936305732484074, "percentage": 99.58, "elapsed_time": "0:08:16", "remaining_time": "0:00:02"}
{"current_steps": 237, "total_steps": 237, "loss": 0.06192271411418915, "lr": 5.438419120062933e-10, "epoch": 3.0, "percentage": 100.0, "elapsed_time": "0:08:17", "remaining_time": "0:00:00"}
{"current_steps": 237, "total_steps": 237, "epoch": 3.0, "percentage": 100.0, "elapsed_time": "0:09:32", "remaining_time": "0:00:00"}

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e9e2dc0ef4467b9d2ba29e40cf501643c2786ca084962a378449f6b64bbf796
size 6968