Initialize project; model provided by the ModelHub XC community

Model: DCAgent/FourDatasetMixQwen3_8B
Source: Original Platform
ModelHub XC
2026-05-13 04:30:37 +08:00
commit 2a5b291815
22 changed files with 159959 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
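The attribute lines above route matching files through Git LFS. As a rough illustration, the sketch below checks a few file names against a subset of those patterns. It assumes that slash-free patterns are compared against the file's basename and uses Python's `fnmatch` as an approximation of Git's own matcher; the helper name `is_lfs_tracked` is hypothetical, and the `saved_model/**/*` rule is left out because it needs path-aware matching.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes above.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.pt", "*tfevents*", "tokenizer.json"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: match the basename against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


for candidate in ["model-00001-of-00004.safetensors", "tokenizer.json", "config.json"]:
    print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git itself resolves.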

61
README.md Normal file

@@ -0,0 +1,61 @@
---
library_name: transformers
license: other
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: g1_weighted_31600_cap10_8b__Qwen3-8B
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# g1_weighted_31600_cap10_8b__Qwen3-8B
This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the otagents_10k dataset.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: AdamW, fused PyTorch implementation (`OptimizerNames.ADAMW_TORCH_FUSED`), with betas=(0.9, 0.98), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
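
The totals in the list above follow from the per-device values, and the scheduler settings describe a warmup followed by cosine decay. The sketch below reproduces that arithmetic. The schedule function is an assumption: it uses the common "linear warmup, then half-cosine decay to zero" shape, and `total_steps` is a hypothetical argument because the step count is not listed here.

```python
import math

LEARNING_RATE = 4e-05
WARMUP_RATIO = 0.1

# Effective batch sizes: per-device batch x devices x accumulation steps.
total_train_batch_size = 1 * 4 * 4  # 16
total_eval_batch_size = 8 * 4       # 32


def warmup_steps(total_steps: int) -> int:
    """Number of warmup steps implied by the warmup ratio (rounded up)."""
    return math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step` under linear warmup plus cosine decay (assumed shape)."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        return LEARNING_RATE * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
print(lr_at(warmup_steps(1000), 1000))  # peak rate at the end of warmup
```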
### Training results
### Framework versions
- Transformers 4.57.6
- Pytorch 2.9.1+cu128
- Datasets 3.2.0
- Tokenizers 0.22.2

28
added_tokens.json Normal file

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}
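
A quick sanity check on the mapping above: the sketch below copies the IDs into a Python dict and confirms that they form one contiguous block that fits below the `vocab_size` of 151936 given in `config.json`. The variable names are illustrative only.

```python
ADDED_TOKENS = {
    "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
    "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655,
    "<|object_ref_end|>": 151647, "<|object_ref_start|>": 151646,
    "<|quad_end|>": 151651, "<|quad_start|>": 151650, "<|repo_name|>": 151663,
    "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}
VOCAB_SIZE = 151936  # from config.json

ids = sorted(ADDED_TOKENS.values())
lowest, highest = ids[0], ids[-1]

# Contiguous means every integer between the lowest and highest ID is used once.
is_contiguous = ids == list(range(lowest, highest + 1))
fits_vocab = highest < VOCAB_SIZE

print(len(ids), lowest, highest, is_contiguous, fits_vocab)
```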

16
all_results.json Normal file

@@ -0,0 +1,16 @@
{
"achieved_tflops_per_gpu": 4.132335436646405,
"achieved_tflops_per_gpu_theoretical": 111.80085057316455,
"epoch": 5.0,
"loss_nan_ranks": 0,
"loss_rank_avg": 0.01754429191350937,
"mfu_percent": 1.324466486104617,
"mfu_percent_theoretical": 35.833605952937354,
"total_flos": 1.586502852660953e+18,
"train_loss": 0.19733358554124833,
"train_runtime": 95981.0062,
"train_samples_per_second": 0.52,
"train_steps_per_second": 0.033,
"valid_targets_mean": 6467.2,
"valid_targets_min": 3371
}
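
Some of the figures above can be cross-checked against each other. The sketch below converts the runtime to hours and derives the peak throughput implied by each TFLOPS/MFU pair, assuming MFU is defined as achieved throughput divided by hardware peak. Both pairs imply the same peak of about 312 TFLOPS, a figure commonly quoted for NVIDIA A100 dense BF16; the hardware itself is not stated in the source, so treat that reading as an assumption.

```python
results = {
    "achieved_tflops_per_gpu": 4.132335436646405,
    "achieved_tflops_per_gpu_theoretical": 111.80085057316455,
    "mfu_percent": 1.324466486104617,
    "mfu_percent_theoretical": 35.833605952937354,
    "train_runtime": 95981.0062,
    "train_samples_per_second": 0.52,
}

# Wall-clock training time in hours.
runtime_hours = results["train_runtime"] / 3600

# Peak TFLOPS implied by each pair, assuming MFU = achieved / peak.
peak_from_measured = results["achieved_tflops_per_gpu"] / (results["mfu_percent"] / 100)
peak_from_theoretical = results["achieved_tflops_per_gpu_theoretical"] / (
    results["mfu_percent_theoretical"] / 100
)

# Rough sample count; the throughput figure is rounded, so this is approximate.
approx_samples = results["train_samples_per_second"] * results["train_runtime"]

print(f"runtime: {runtime_hours:.2f} h")
print(f"implied peak: {peak_from_measured:.1f} / {peak_from_theoretical:.1f} TFLOPS")
print(f"approx. samples processed: {approx_samples:.0f}")
```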

89
chat_template.jinja Normal file

@@ -0,0 +1,89 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
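
Two pieces of the template above are easy to misread: how it finds the last genuine user query, and how it separates reasoning from the final answer in an assistant message. The sketch below mirrors that logic in plain Python under the assumption that Jinja's string methods behave like Python's here. The function names are hypothetical and this is not a replacement for rendering the template itself.

```python
def last_query_index(messages: list[dict]) -> int:
    """Index of the last user message that is not a wrapped tool response."""
    for index in range(len(messages) - 1, -1, -1):
        message = messages[index]
        content = message.get("content")
        if (
            message["role"] == "user"
            and isinstance(content, str)
            and not (
                content.startswith("<tool_response>")
                and content.endswith("</tool_response>")
            )
        ):
            return index
    # Same default as the template's namespace initialisation.
    return len(messages) - 1


def split_reasoning(content: str) -> tuple[str, str]:
    """Split assistant text into (reasoning, answer) around </think>."""
    reasoning = ""
    if "</think>" in content:
        reasoning = (
            content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        )
        content = content.split("</think>")[-1].lstrip("\n")
    return reasoning, content


history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>\nadd\n</think>\n\n4"},
    {"role": "user", "content": "<tool_response>\nok\n</tool_response>"},
]
print(last_query_index(history))
print(split_reasoning(history[1]["content"]))
```

The template only re-emits a `<think>` block for assistant turns after that index, so reasoning from earlier turns is dropped when a conversation is re-rendered.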

68
config.json Normal file

@@ -0,0 +1,68 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"dtype": "bfloat16",
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 12288,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 40960,
"max_window_layers": 36,
"model_type": "qwen3",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"pad_token_id": 151643,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": false,
"use_sliding_window": false,
"vocab_size": 151936
}
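
The configuration above is enough to estimate the parameter count. The sketch below assumes the usual layout for this architecture: bias-free linear projections (consistent with `attention_bias: false`), per-head `q_norm` and `k_norm` weights of size `head_dim`, two RMSNorm weights per layer, a final norm, and untied input and output embeddings. With two bytes per `bfloat16` parameter, the result equals the `total_size` of 16381470720 reported in the safetensors index metadata.

```python
hidden_size = 4096
intermediate_size = 12288
num_attention_heads = 32
num_key_value_heads = 8
head_dim = 128
num_hidden_layers = 36
vocab_size = 151936

q_dim = num_attention_heads * head_dim   # 4096
kv_dim = num_key_value_heads * head_dim  # 1024, shared by groups of 4 query heads

attention = (
    hidden_size * q_dim         # q_proj
    + 2 * hidden_size * kv_dim  # k_proj and v_proj
    + q_dim * hidden_size       # o_proj
)
mlp = 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj
norms = 2 * hidden_size + 2 * head_dim     # two layer norms plus q_norm and k_norm

per_layer = attention + mlp + norms
embeddings = 2 * vocab_size * hidden_size  # embed_tokens and lm_head (untied)
final_norm = hidden_size                   # assumed final RMSNorm weight

total_params = num_hidden_layers * per_layer + embeddings + final_norm
total_bytes = total_params * 2             # bfloat16 uses 2 bytes per parameter

print(total_params, total_bytes)
```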

12
generation_config.json Normal file

@@ -0,0 +1,12 @@
{
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.6,
"top_k": 20,
"top_p": 0.95,
"transformers_version": "4.57.6"
}
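
To make the sampling settings above concrete, the sketch below applies temperature scaling, top-k truncation, and top-p (nucleus) truncation to a toy list of logits. It is a simplified pure-Python illustration of the general technique, not the code path the `transformers` library runs; `filter_probs` is a hypothetical helper.

```python
import math


def filter_probs(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    scaled = [x / temperature for x in logits]

    # Top-k: keep the k highest-scoring token indices, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the survivors (shifted by the max for numerical stability).
    peak = scaled[order[0]]
    exps = {i: math.exp(scaled[i] - peak) for i in order}
    norm = sum(exps.values())
    probs = {i: e / norm for i, e in exps.items()}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise so the kept probabilities sum to 1.
    kept_norm = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_norm for i in kept}


distribution = filter_probs([2.0, 1.0, 0.0, -5.0])
print(distribution)
```

A temperature below 1 sharpens the distribution before truncation, so fewer tokens are needed to reach the top-p threshold.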

151388
merges.txt Normal file

File diff suppressed because it is too large.


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1856cf5f32cdd0b0bfcc35af8856bb497eb9d745568075f9fb865d924c9b369f
size 4902257696


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6d0350e819451b98b9c45957010e49f58500c868fbe6882cd6040c04e9d1cf5
size 4915960368


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d47a9cd2b520c4f9831518cdabc8e69ab623e12f7600abe3fde4753647a24d91
size 4983068496


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0123cda50119c73e05117b4b7eb108f021273f1fc37a703f35a85e368b917160
size 1580230264
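
Each three-line block above is a Git LFS pointer: a spec version, a SHA-256 object ID, and the size in bytes of the real file. The sketch below parses the four pointers and sums their sizes. It assumes they correspond to the four weight shards, whose file names are not shown here, and compares the sum with the `total_size` of 16381470720 from the index metadata; the small surplus is plausibly per-file header overhead, which is an interpretation rather than something the source states.

```python
POINTERS = [
    """version https://git-lfs.github.com/spec/v1
oid sha256:1856cf5f32cdd0b0bfcc35af8856bb497eb9d745568075f9fb865d924c9b369f
size 4902257696""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:a6d0350e819451b98b9c45957010e49f58500c868fbe6882cd6040c04e9d1cf5
size 4915960368""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:d47a9cd2b520c4f9831518cdabc8e69ab623e12f7600abe3fde4753647a24d91
size 4983068496""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:0123cda50119c73e05117b4b7eb108f021273f1fc37a703f35a85e368b917160
size 1580230264""",
]

INDEX_TOTAL_SIZE = 16381470720  # "total_size" from the index metadata


def parse_pointer(text: str) -> tuple[str, int]:
    """Return (oid, size_in_bytes) from a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return fields["oid"], int(fields["size"])


sizes = [parse_pointer(pointer)[1] for pointer in POINTERS]
on_disk_total = sum(sizes)
overhead = on_disk_total - INDEX_TOTAL_SIZE

print(on_disk_total, overhead)
```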


@@ -0,0 +1,407 @@
{
"metadata": {
"total_parameters": 308224,
"total_size": 16381470720
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
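
The entries above show that a single decoder layer can be split across two shard files: layer 35 has tensors in both `model-00003-of-00004.safetensors` and `model-00004-of-00004.safetensors`, and layer 9 straddles shards 1 and 2. The sketch below shows one way to list such layers. It assumes the mapping sits under the conventional `weight_map` key of a sharded safetensors index; that key is not visible in this part of the diff. The helper names `shards_by_layer` and `split_layers` are hypothetical, and the dictionary is a hand-copied excerpt rather than the full file.

```python
import re
from collections import defaultdict

# Hand-copied excerpt of the mapping above (assumed to live under "weight_map").
WEIGHT_MAP_EXCERPT = {
    "model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
}

# Matches tensor names such as "model.layers.35.self_attn.k_proj.weight".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_by_layer(weight_map):
    """Map each layer index to the sorted list of shard files holding its tensors."""
    shards = defaultdict(set)
    for tensor_name, shard_file in weight_map.items():
        match = LAYER_RE.match(tensor_name)
        if match:  # skip non-layer tensors such as model.norm.weight
            shards[int(match.group(1))].add(shard_file)
    return {layer: sorted(files) for layer, files in shards.items()}


def split_layers(weight_map):
    """Return the layer indices whose tensors span more than one shard file."""
    by_layer = shards_by_layer(weight_map)
    return sorted(layer for layer, files in by_layer.items() if len(files) > 1)


if __name__ == "__main__":
    print(split_layers(WEIGHT_MAP_EXCERPT))
```

To run this against the whole index, the same functions could be given the `weight_map` object parsed with `json.load`.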

31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

240
tokenizer_config.json Normal file

@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 32768,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
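
In the configuration above, `eos_token` and `pad_token` are given as strings, while their numeric IDs appear only as keys of `added_tokens_decoder`. Several added tokens (`<tool_call>`, `<think>` and others) are also marked `"special": false`. The sketch below resolves token strings to IDs and lists the non-special added tokens. It works on a small hand-copied excerpt, not the full file, and the helper names `token_ids` and `non_special_tokens` are hypothetical.

```python
import json

# Hand-copied excerpt of tokenizer_config.json; most fields are omitted.
TOKENIZER_CONFIG_EXCERPT = json.loads("""
{
  "added_tokens_decoder": {
    "151643": {"content": "<|endoftext|>", "special": true},
    "151645": {"content": "<|im_end|>", "special": true},
    "151657": {"content": "<tool_call>", "special": false},
    "151667": {"content": "<think>", "special": false}
  },
  "eos_token": "<|im_end|>",
  "pad_token": "<|endoftext|>"
}
""")


def token_ids(config):
    """Invert added_tokens_decoder into a {token string: integer id} mapping."""
    return {
        entry["content"]: int(token_id)
        for token_id, entry in config["added_tokens_decoder"].items()
    }


def non_special_tokens(config):
    """Added tokens flagged "special": false, ordered by token id."""
    decoder = config["added_tokens_decoder"]
    return [
        decoder[token_id]["content"]
        for token_id in sorted(decoder, key=int)
        if not decoder[token_id]["special"]
    ]


if __name__ == "__main__":
    ids = token_ids(TOKENIZER_CONFIG_EXCERPT)
    print("eos id:", ids[TOKENIZER_CONFIG_EXCERPT["eos_token"]])
    print("pad id:", ids[TOKENIZER_CONFIG_EXCERPT["pad_token"]])
    print("non-special:", non_special_tokens(TOKENIZER_CONFIG_EXCERPT))
```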

16
train_results.json Normal file

@@ -0,0 +1,16 @@
{
"achieved_tflops_per_gpu": 4.132335436646405,
"achieved_tflops_per_gpu_theoretical": 111.80085057316455,
"epoch": 5.0,
"loss_nan_ranks": 0,
"loss_rank_avg": 0.01754429191350937,
"mfu_percent": 1.324466486104617,
"mfu_percent_theoretical": 35.833605952937354,
"total_flos": 1.586502852660953e+18,
"train_loss": 0.19733358554124833,
"train_runtime": 95981.0062,
"train_samples_per_second": 0.52,
"train_steps_per_second": 0.033,
"valid_targets_mean": 6467.2,
"valid_targets_min": 3371
}
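
The throughput fields above can be cross-checked against each other. Assuming `mfu_percent` follows the usual definition of model FLOPs utilization (achieved TFLOPS divided by peak hardware TFLOPS, times 100), both pairs of fields imply the same peak of about 312 TFLOPS per GPU. That figure matches the commonly quoted dense BF16 peak of an NVIDIA A100, although the source does not name the hardware. The sketch below is only an arithmetic check under that assumption; `implied_peak_tflops` is a hypothetical helper, and `TOTAL_STEPS` is taken from the `total_steps` field of `trainer_log.jsonl`.

```python
# Values copied from train_results.json above.
TRAIN_RESULTS = {
    "achieved_tflops_per_gpu": 4.132335436646405,
    "achieved_tflops_per_gpu_theoretical": 111.80085057316455,
    "mfu_percent": 1.324466486104617,
    "mfu_percent_theoretical": 35.833605952937354,
    "train_runtime": 95981.0062,
    "train_steps_per_second": 0.033,
}
TOTAL_STEPS = 3125  # "total_steps" as logged in trainer_log.jsonl


def implied_peak_tflops(achieved_tflops, mfu_percent):
    """Peak TFLOPS implied by MFU = achieved / peak * 100 (assumed definition)."""
    return achieved_tflops / (mfu_percent / 100.0)


if __name__ == "__main__":
    measured = implied_peak_tflops(
        TRAIN_RESULTS["achieved_tflops_per_gpu"],
        TRAIN_RESULTS["mfu_percent"],
    )
    theoretical = implied_peak_tflops(
        TRAIN_RESULTS["achieved_tflops_per_gpu_theoretical"],
        TRAIN_RESULTS["mfu_percent_theoretical"],
    )
    steps_per_second = TOTAL_STEPS / TRAIN_RESULTS["train_runtime"]
    print(f"implied peak (measured fields):    {measured:.3f} TFLOPS")
    print(f"implied peak (theoretical fields): {theoretical:.3f} TFLOPS")
    print(f"steps per second from runtime:     {steps_per_second:.4f}")
```

The recomputed steps per second rounds to the reported `train_steps_per_second` of 0.033, so the runtime and the step count are consistent with each other.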

626
trainer_log.jsonl Normal file

@@ -0,0 +1,626 @@
{"current_steps": 5, "total_steps": 3125, "loss": 1.1917, "lr": 5.111821086261981e-07, "epoch": 0.008009611533840609, "percentage": 0.16, "elapsed_time": "0:01:41", "remaining_time": "17:30:55"}
{"current_steps": 10, "total_steps": 3125, "loss": 1.0982, "lr": 1.1501597444089457e-06, "epoch": 0.016019223067681217, "percentage": 0.32, "elapsed_time": "0:03:57", "remaining_time": "20:34:40"}
{"current_steps": 15, "total_steps": 3125, "loss": 1.0183, "lr": 1.7891373801916933e-06, "epoch": 0.024028834601521828, "percentage": 0.48, "elapsed_time": "0:06:05", "remaining_time": "21:03:25"}
{"current_steps": 20, "total_steps": 3125, "loss": 0.8124, "lr": 2.428115015974441e-06, "epoch": 0.032038446135362435, "percentage": 0.64, "elapsed_time": "0:08:14", "remaining_time": "21:18:36"}
{"current_steps": 25, "total_steps": 3125, "loss": 0.7519, "lr": 3.0670926517571885e-06, "epoch": 0.040048057669203045, "percentage": 0.8, "elapsed_time": "0:10:12", "remaining_time": "21:05:34"}
{"current_steps": 30, "total_steps": 3125, "loss": 0.6987, "lr": 3.7060702875399364e-06, "epoch": 0.048057669203043656, "percentage": 0.96, "elapsed_time": "0:12:18", "remaining_time": "21:10:08"}
{"current_steps": 35, "total_steps": 3125, "loss": 0.642, "lr": 4.345047923322684e-06, "epoch": 0.05606728073688426, "percentage": 1.12, "elapsed_time": "0:14:13", "remaining_time": "20:55:31"}
{"current_steps": 40, "total_steps": 3125, "loss": 0.6346, "lr": 4.984025559105431e-06, "epoch": 0.06407689227072487, "percentage": 1.28, "elapsed_time": "0:15:46", "remaining_time": "20:17:01"}
{"current_steps": 45, "total_steps": 3125, "loss": 0.5798, "lr": 5.623003194888179e-06, "epoch": 0.07208650380456548, "percentage": 1.44, "elapsed_time": "0:17:43", "remaining_time": "20:12:58"}
{"current_steps": 50, "total_steps": 3125, "loss": 0.5628, "lr": 6.261980830670928e-06, "epoch": 0.08009611533840609, "percentage": 1.6, "elapsed_time": "0:20:13", "remaining_time": "20:44:18"}
{"current_steps": 55, "total_steps": 3125, "loss": 0.5212, "lr": 6.900958466453675e-06, "epoch": 0.0881057268722467, "percentage": 1.76, "elapsed_time": "0:22:32", "remaining_time": "20:58:18"}
{"current_steps": 60, "total_steps": 3125, "loss": 0.5565, "lr": 7.5399361022364225e-06, "epoch": 0.09611533840608731, "percentage": 1.92, "elapsed_time": "0:24:21", "remaining_time": "20:44:32"}
{"current_steps": 65, "total_steps": 3125, "loss": 0.5051, "lr": 8.17891373801917e-06, "epoch": 0.10412494993992791, "percentage": 2.08, "elapsed_time": "0:26:30", "remaining_time": "20:47:38"}
{"current_steps": 70, "total_steps": 3125, "loss": 0.508, "lr": 8.817891373801917e-06, "epoch": 0.11213456147376852, "percentage": 2.24, "elapsed_time": "0:28:48", "remaining_time": "20:57:09"}
{"current_steps": 75, "total_steps": 3125, "loss": 0.5028, "lr": 9.456869009584665e-06, "epoch": 0.12014417300760913, "percentage": 2.4, "elapsed_time": "0:31:08", "remaining_time": "21:06:18"}
{"current_steps": 80, "total_steps": 3125, "loss": 0.4603, "lr": 1.0095846645367413e-05, "epoch": 0.12815378454144974, "percentage": 2.56, "elapsed_time": "0:33:43", "remaining_time": "21:23:44"}
{"current_steps": 85, "total_steps": 3125, "loss": 0.501, "lr": 1.073482428115016e-05, "epoch": 0.13616339607529035, "percentage": 2.72, "elapsed_time": "0:35:41", "remaining_time": "21:16:13"}
{"current_steps": 90, "total_steps": 3125, "loss": 0.4883, "lr": 1.1373801916932907e-05, "epoch": 0.14417300760913096, "percentage": 2.88, "elapsed_time": "0:38:09", "remaining_time": "21:26:30"}
{"current_steps": 95, "total_steps": 3125, "loss": 0.4764, "lr": 1.2012779552715656e-05, "epoch": 0.15218261914297157, "percentage": 3.04, "elapsed_time": "0:39:50", "remaining_time": "21:10:34"}
{"current_steps": 100, "total_steps": 3125, "loss": 0.4679, "lr": 1.2651757188498404e-05, "epoch": 0.16019223067681218, "percentage": 3.2, "elapsed_time": "0:41:39", "remaining_time": "21:00:22"}
{"current_steps": 105, "total_steps": 3125, "loss": 0.4604, "lr": 1.329073482428115e-05, "epoch": 0.1682018422106528, "percentage": 3.36, "elapsed_time": "0:44:08", "remaining_time": "21:09:30"}
{"current_steps": 110, "total_steps": 3125, "loss": 0.4609, "lr": 1.39297124600639e-05, "epoch": 0.1762114537444934, "percentage": 3.52, "elapsed_time": "0:46:12", "remaining_time": "21:06:23"}
{"current_steps": 115, "total_steps": 3125, "loss": 0.451, "lr": 1.4568690095846648e-05, "epoch": 0.184221065278334, "percentage": 3.68, "elapsed_time": "0:48:05", "remaining_time": "20:58:53"}
{"current_steps": 120, "total_steps": 3125, "loss": 0.4778, "lr": 1.5207667731629394e-05, "epoch": 0.19223067681217462, "percentage": 3.84, "elapsed_time": "0:50:00", "remaining_time": "20:52:19"}
{"current_steps": 125, "total_steps": 3125, "loss": 0.4563, "lr": 1.584664536741214e-05, "epoch": 0.2002402883460152, "percentage": 4.0, "elapsed_time": "0:52:37", "remaining_time": "21:02:56"}
{"current_steps": 130, "total_steps": 3125, "loss": 0.4297, "lr": 1.648562300319489e-05, "epoch": 0.20824989987985582, "percentage": 4.16, "elapsed_time": "0:55:32", "remaining_time": "21:19:27"}
{"current_steps": 135, "total_steps": 3125, "loss": 0.4041, "lr": 1.712460063897764e-05, "epoch": 0.21625951141369643, "percentage": 4.32, "elapsed_time": "0:58:40", "remaining_time": "21:39:23"}
{"current_steps": 140, "total_steps": 3125, "loss": 0.4128, "lr": 1.7763578274760385e-05, "epoch": 0.22426912294753704, "percentage": 4.48, "elapsed_time": "1:01:45", "remaining_time": "21:56:54"}
{"current_steps": 145, "total_steps": 3125, "loss": 0.4104, "lr": 1.840255591054313e-05, "epoch": 0.23227873448137765, "percentage": 4.64, "elapsed_time": "1:03:53", "remaining_time": "21:53:14"}
{"current_steps": 150, "total_steps": 3125, "loss": 0.4081, "lr": 1.904153354632588e-05, "epoch": 0.24028834601521826, "percentage": 4.8, "elapsed_time": "1:07:01", "remaining_time": "22:09:23"}
{"current_steps": 155, "total_steps": 3125, "loss": 0.3917, "lr": 1.9680511182108627e-05, "epoch": 0.24829795754905887, "percentage": 4.96, "elapsed_time": "1:10:01", "remaining_time": "22:21:51"}
{"current_steps": 160, "total_steps": 3125, "loss": 0.2926, "lr": 2.0319488817891376e-05, "epoch": 0.2563075690828995, "percentage": 5.12, "elapsed_time": "1:12:45", "remaining_time": "22:28:12"}
{"current_steps": 165, "total_steps": 3125, "loss": 0.2895, "lr": 2.0958466453674126e-05, "epoch": 0.2643171806167401, "percentage": 5.28, "elapsed_time": "1:15:54", "remaining_time": "22:41:44"}
{"current_steps": 170, "total_steps": 3125, "loss": 0.2567, "lr": 2.1597444089456872e-05, "epoch": 0.2723267921505807, "percentage": 5.44, "elapsed_time": "1:19:08", "remaining_time": "22:55:45"}
{"current_steps": 175, "total_steps": 3125, "loss": 0.2474, "lr": 2.2236421725239618e-05, "epoch": 0.2803364036844213, "percentage": 5.6, "elapsed_time": "1:22:18", "remaining_time": "23:07:28"}
{"current_steps": 180, "total_steps": 3125, "loss": 0.2582, "lr": 2.2875399361022364e-05, "epoch": 0.2883460152182619, "percentage": 5.76, "elapsed_time": "1:25:18", "remaining_time": "23:15:39"}
{"current_steps": 185, "total_steps": 3125, "loss": 0.2494, "lr": 2.3514376996805114e-05, "epoch": 0.29635562675210253, "percentage": 5.92, "elapsed_time": "1:28:41", "remaining_time": "23:29:30"}
{"current_steps": 190, "total_steps": 3125, "loss": 0.2554, "lr": 2.415335463258786e-05, "epoch": 0.30436523828594314, "percentage": 6.08, "elapsed_time": "1:31:57", "remaining_time": "23:40:24"}
{"current_steps": 195, "total_steps": 3125, "loss": 0.2389, "lr": 2.4792332268370606e-05, "epoch": 0.31237484981978375, "percentage": 6.24, "elapsed_time": "1:35:03", "remaining_time": "23:48:25"}
{"current_steps": 200, "total_steps": 3125, "loss": 0.2458, "lr": 2.543130990415336e-05, "epoch": 0.32038446135362436, "percentage": 6.4, "elapsed_time": "1:38:12", "remaining_time": "23:56:22"}
{"current_steps": 205, "total_steps": 3125, "loss": 0.2587, "lr": 2.6070287539936105e-05, "epoch": 0.32839407288746497, "percentage": 6.56, "elapsed_time": "1:41:00", "remaining_time": "23:58:43"}
{"current_steps": 210, "total_steps": 3125, "loss": 0.2561, "lr": 2.670926517571885e-05, "epoch": 0.3364036844213056, "percentage": 6.72, "elapsed_time": "1:44:00", "remaining_time": "1 day, 0:03:37"}
{"current_steps": 215, "total_steps": 3125, "loss": 0.2419, "lr": 2.73482428115016e-05, "epoch": 0.3444132959551462, "percentage": 6.88, "elapsed_time": "1:47:12", "remaining_time": "1 day, 0:10:56"}
{"current_steps": 220, "total_steps": 3125, "loss": 0.2429, "lr": 2.7987220447284347e-05, "epoch": 0.3524229074889868, "percentage": 7.04, "elapsed_time": "1:50:01", "remaining_time": "1 day, 0:12:45"}
{"current_steps": 225, "total_steps": 3125, "loss": 0.2419, "lr": 2.8626198083067093e-05, "epoch": 0.3604325190228274, "percentage": 7.2, "elapsed_time": "1:52:57", "remaining_time": "1 day, 0:15:51"}
{"current_steps": 230, "total_steps": 3125, "loss": 0.2373, "lr": 2.9265175718849843e-05, "epoch": 0.368442130556668, "percentage": 7.36, "elapsed_time": "1:55:58", "remaining_time": "1 day, 0:19:51"}
{"current_steps": 235, "total_steps": 3125, "loss": 0.2345, "lr": 2.9904153354632592e-05, "epoch": 0.37645174209050863, "percentage": 7.52, "elapsed_time": "1:58:52", "remaining_time": "1 day, 0:21:52"}
{"current_steps": 240, "total_steps": 3125, "loss": 0.2295, "lr": 3.054313099041534e-05, "epoch": 0.38446135362434924, "percentage": 7.68, "elapsed_time": "2:01:49", "remaining_time": "1 day, 0:24:26"}
{"current_steps": 245, "total_steps": 3125, "loss": 0.2399, "lr": 3.1182108626198084e-05, "epoch": 0.39247096515818986, "percentage": 7.84, "elapsed_time": "2:04:53", "remaining_time": "1 day, 0:28:02"}
{"current_steps": 250, "total_steps": 3125, "loss": 0.2384, "lr": 3.1821086261980834e-05, "epoch": 0.4004805766920304, "percentage": 8.0, "elapsed_time": "2:08:01", "remaining_time": "1 day, 0:32:15"}
{"current_steps": 255, "total_steps": 3125, "loss": 0.2262, "lr": 3.246006389776358e-05, "epoch": 0.408490188225871, "percentage": 8.16, "elapsed_time": "2:11:03", "remaining_time": "1 day, 0:35:05"}
{"current_steps": 260, "total_steps": 3125, "loss": 0.2481, "lr": 3.3099041533546326e-05, "epoch": 0.41649979975971163, "percentage": 8.32, "elapsed_time": "2:13:56", "remaining_time": "1 day, 0:36:00"}
{"current_steps": 265, "total_steps": 3125, "loss": 0.2543, "lr": 3.3738019169329076e-05, "epoch": 0.42450941129355224, "percentage": 8.48, "elapsed_time": "2:16:55", "remaining_time": "1 day, 0:37:45"}
{"current_steps": 270, "total_steps": 3125, "loss": 0.2313, "lr": 3.4376996805111825e-05, "epoch": 0.43251902282739285, "percentage": 8.64, "elapsed_time": "2:19:54", "remaining_time": "1 day, 0:39:19"}
{"current_steps": 275, "total_steps": 3125, "loss": 0.2356, "lr": 3.5015974440894575e-05, "epoch": 0.44052863436123346, "percentage": 8.8, "elapsed_time": "2:22:57", "remaining_time": "1 day, 0:41:35"}
{"current_steps": 280, "total_steps": 3125, "loss": 0.2515, "lr": 3.565495207667732e-05, "epoch": 0.4485382458950741, "percentage": 8.96, "elapsed_time": "2:25:42", "remaining_time": "1 day, 0:40:28"}
{"current_steps": 285, "total_steps": 3125, "loss": 0.2316, "lr": 3.629392971246007e-05, "epoch": 0.4565478574289147, "percentage": 9.12, "elapsed_time": "2:28:52", "remaining_time": "1 day, 0:43:27"}
{"current_steps": 290, "total_steps": 3125, "loss": 0.2303, "lr": 3.6932907348242816e-05, "epoch": 0.4645574689627553, "percentage": 9.28, "elapsed_time": "2:31:59", "remaining_time": "1 day, 0:45:54"}
{"current_steps": 295, "total_steps": 3125, "loss": 0.2343, "lr": 3.757188498402556e-05, "epoch": 0.4725670804965959, "percentage": 9.44, "elapsed_time": "2:34:52", "remaining_time": "1 day, 0:45:42"}
{"current_steps": 300, "total_steps": 3125, "loss": 0.2402, "lr": 3.821086261980831e-05, "epoch": 0.4805766920304365, "percentage": 9.6, "elapsed_time": "2:38:00", "remaining_time": "1 day, 0:47:57"}
{"current_steps": 305, "total_steps": 3125, "loss": 0.2139, "lr": 3.884984025559106e-05, "epoch": 0.4885863035642771, "percentage": 9.76, "elapsed_time": "2:41:12", "remaining_time": "1 day, 0:50:35"}
{"current_steps": 310, "total_steps": 3125, "loss": 0.237, "lr": 3.94888178913738e-05, "epoch": 0.49659591509811774, "percentage": 9.92, "elapsed_time": "2:43:39", "remaining_time": "1 day, 0:46:11"}
{"current_steps": 315, "total_steps": 3125, "loss": 0.4689, "lr": 3.9999987518434296e-05, "epoch": 0.5046055266319583, "percentage": 10.08, "elapsed_time": "2:45:08", "remaining_time": "1 day, 0:33:13"}
{"current_steps": 320, "total_steps": 3125, "loss": 0.5348, "lr": 3.999955066527015e-05, "epoch": 0.512615138165799, "percentage": 10.24, "elapsed_time": "2:46:14", "remaining_time": "1 day, 0:17:16"}
{"current_steps": 325, "total_steps": 3125, "loss": 0.5388, "lr": 3.999848974939926e-05, "epoch": 0.5206247496996396, "percentage": 10.4, "elapsed_time": "2:47:24", "remaining_time": "1 day, 0:02:21"}
{"current_steps": 330, "total_steps": 3125, "loss": 0.5552, "lr": 3.999680480392626e-05, "epoch": 0.5286343612334802, "percentage": 10.56, "elapsed_time": "2:48:28", "remaining_time": "23:46:59"}
{"current_steps": 335, "total_steps": 3125, "loss": 0.5724, "lr": 3.999449588142792e-05, "epoch": 0.5366439727673208, "percentage": 10.72, "elapsed_time": "2:49:49", "remaining_time": "23:34:23"}
{"current_steps": 340, "total_steps": 3125, "loss": 0.5475, "lr": 3.9991563053951476e-05, "epoch": 0.5446535843011614, "percentage": 10.88, "elapsed_time": "2:51:03", "remaining_time": "23:21:12"}
{"current_steps": 345, "total_steps": 3125, "loss": 0.5417, "lr": 3.99880064130124e-05, "epoch": 0.552663195835002, "percentage": 11.04, "elapsed_time": "2:52:08", "remaining_time": "23:07:05"}
{"current_steps": 350, "total_steps": 3125, "loss": 0.5749, "lr": 3.9983826069591535e-05, "epoch": 0.5606728073688426, "percentage": 11.2, "elapsed_time": "2:53:12", "remaining_time": "22:53:19"}
{"current_steps": 355, "total_steps": 3125, "loss": 0.5677, "lr": 3.997902215413163e-05, "epoch": 0.5686824189026832, "percentage": 11.36, "elapsed_time": "2:54:24", "remaining_time": "22:40:54"}
{"current_steps": 360, "total_steps": 3125, "loss": 0.534, "lr": 3.997359481653327e-05, "epoch": 0.5766920304365238, "percentage": 11.52, "elapsed_time": "2:55:34", "remaining_time": "22:28:33"}
{"current_steps": 365, "total_steps": 3125, "loss": 0.5213, "lr": 3.996754422615023e-05, "epoch": 0.5847016419703644, "percentage": 11.68, "elapsed_time": "2:56:54", "remaining_time": "22:17:45"}
{"current_steps": 370, "total_steps": 3125, "loss": 0.5252, "lr": 3.996087057178411e-05, "epoch": 0.5927112535042051, "percentage": 11.84, "elapsed_time": "2:58:10", "remaining_time": "22:06:41"}
{"current_steps": 375, "total_steps": 3125, "loss": 0.5467, "lr": 3.995357406167856e-05, "epoch": 0.6007208650380457, "percentage": 12.0, "elapsed_time": "2:59:18", "remaining_time": "21:54:58"}
{"current_steps": 380, "total_steps": 3125, "loss": 0.5521, "lr": 3.994565492351267e-05, "epoch": 0.6087304765718863, "percentage": 12.16, "elapsed_time": "3:00:15", "remaining_time": "21:42:11"}
{"current_steps": 385, "total_steps": 3125, "loss": 0.5714, "lr": 3.993711340439394e-05, "epoch": 0.6167400881057269, "percentage": 12.32, "elapsed_time": "3:01:09", "remaining_time": "21:29:18"}
{"current_steps": 390, "total_steps": 3125, "loss": 0.5509, "lr": 3.9927949770850535e-05, "epoch": 0.6247496996395675, "percentage": 12.48, "elapsed_time": "3:01:52", "remaining_time": "21:15:27"}
{"current_steps": 395, "total_steps": 3125, "loss": 0.5728, "lr": 3.991816430882297e-05, "epoch": 0.6327593111734081, "percentage": 12.64, "elapsed_time": "3:02:41", "remaining_time": "21:02:42"}
{"current_steps": 400, "total_steps": 3125, "loss": 0.568, "lr": 3.9907757323655206e-05, "epoch": 0.6407689227072487, "percentage": 12.8, "elapsed_time": "3:03:26", "remaining_time": "20:49:40"}
{"current_steps": 405, "total_steps": 3125, "loss": 0.5683, "lr": 3.98967291400851e-05, "epoch": 0.6487785342410893, "percentage": 12.96, "elapsed_time": "3:04:13", "remaining_time": "20:37:15"}
{"current_steps": 410, "total_steps": 3125, "loss": 0.5594, "lr": 3.98850801022343e-05, "epoch": 0.6567881457749299, "percentage": 13.12, "elapsed_time": "3:05:09", "remaining_time": "20:26:07"}
{"current_steps": 415, "total_steps": 3125, "loss": 0.557, "lr": 3.987281057359746e-05, "epoch": 0.6647977573087706, "percentage": 13.28, "elapsed_time": "3:05:58", "remaining_time": "20:14:29"}
{"current_steps": 420, "total_steps": 3125, "loss": 0.5517, "lr": 3.985992093703096e-05, "epoch": 0.6728073688426112, "percentage": 13.44, "elapsed_time": "3:06:52", "remaining_time": "20:03:32"}
{"current_steps": 425, "total_steps": 3125, "loss": 0.5803, "lr": 3.98464115947409e-05, "epoch": 0.6808169803764518, "percentage": 13.6, "elapsed_time": "3:07:36", "remaining_time": "19:51:51"}
{"current_steps": 430, "total_steps": 3125, "loss": 0.5195, "lr": 3.9832282968270595e-05, "epoch": 0.6888265919102924, "percentage": 13.76, "elapsed_time": "3:08:26", "remaining_time": "19:41:05"}
{"current_steps": 435, "total_steps": 3125, "loss": 0.544, "lr": 3.9817535498487385e-05, "epoch": 0.696836203444133, "percentage": 13.92, "elapsed_time": "3:09:16", "remaining_time": "19:30:29"}
{"current_steps": 440, "total_steps": 3125, "loss": 0.5663, "lr": 3.980216964556892e-05, "epoch": 0.7048458149779736, "percentage": 14.08, "elapsed_time": "3:09:58", "remaining_time": "19:19:18"}
{"current_steps": 445, "total_steps": 3125, "loss": 0.5732, "lr": 3.978618588898873e-05, "epoch": 0.7128554265118142, "percentage": 14.24, "elapsed_time": "3:10:37", "remaining_time": "19:08:02"}
{"current_steps": 450, "total_steps": 3125, "loss": 0.5465, "lr": 3.976958472750137e-05, "epoch": 0.7208650380456548, "percentage": 14.4, "elapsed_time": "3:11:25", "remaining_time": "18:57:55"}
{"current_steps": 455, "total_steps": 3125, "loss": 0.5309, "lr": 3.9752366679126754e-05, "epoch": 0.7288746495794954, "percentage": 14.56, "elapsed_time": "3:12:16", "remaining_time": "18:48:18"}
{"current_steps": 460, "total_steps": 3125, "loss": 0.5663, "lr": 3.973453228113405e-05, "epoch": 0.736884261113336, "percentage": 14.72, "elapsed_time": "3:13:08", "remaining_time": "18:38:55"}
{"current_steps": 465, "total_steps": 3125, "loss": 0.5489, "lr": 3.971608209002489e-05, "epoch": 0.7448938726471767, "percentage": 14.88, "elapsed_time": "3:13:53", "remaining_time": "18:29:07"}
{"current_steps": 470, "total_steps": 3125, "loss": 0.4561, "lr": 3.969701668151603e-05, "epoch": 0.7529034841810173, "percentage": 15.04, "elapsed_time": "3:15:38", "remaining_time": "18:25:11"}
{"current_steps": 475, "total_steps": 3125, "loss": 0.3063, "lr": 3.9677336650521336e-05, "epoch": 0.7609130957148579, "percentage": 15.2, "elapsed_time": "3:19:09", "remaining_time": "18:31:07"}
{"current_steps": 480, "total_steps": 3125, "loss": 0.3397, "lr": 3.9657042611133294e-05, "epoch": 0.7689227072486985, "percentage": 15.36, "elapsed_time": "3:23:16", "remaining_time": "18:40:07"}
{"current_steps": 485, "total_steps": 3125, "loss": 0.3538, "lr": 3.963613519660379e-05, "epoch": 0.7769323187825391, "percentage": 15.52, "elapsed_time": "3:27:01", "remaining_time": "18:46:51"}
{"current_steps": 490, "total_steps": 3125, "loss": 0.3542, "lr": 3.961461505932435e-05, "epoch": 0.7849419303163797, "percentage": 15.68, "elapsed_time": "3:31:17", "remaining_time": "18:56:13"}
{"current_steps": 495, "total_steps": 3125, "loss": 0.33, "lr": 3.959248287080583e-05, "epoch": 0.7929515418502202, "percentage": 15.84, "elapsed_time": "3:34:52", "remaining_time": "19:01:39"}
{"current_steps": 500, "total_steps": 3125, "loss": 0.3767, "lr": 3.9569739321657416e-05, "epoch": 0.8009611533840608, "percentage": 16.0, "elapsed_time": "3:38:58", "remaining_time": "19:09:35"}
{"current_steps": 505, "total_steps": 3125, "loss": 0.3302, "lr": 3.9546385121565095e-05, "epoch": 0.8089707649179014, "percentage": 16.16, "elapsed_time": "3:41:50", "remaining_time": "19:10:55"}
{"current_steps": 510, "total_steps": 3125, "loss": 0.3849, "lr": 3.952242099926951e-05, "epoch": 0.816980376451742, "percentage": 16.32, "elapsed_time": "3:45:47", "remaining_time": "19:17:45"}
{"current_steps": 515, "total_steps": 3125, "loss": 0.372, "lr": 3.9497847702543196e-05, "epoch": 0.8249899879855827, "percentage": 16.48, "elapsed_time": "3:50:03", "remaining_time": "19:25:57"}
{"current_steps": 520, "total_steps": 3125, "loss": 0.3619, "lr": 3.94726659981673e-05, "epoch": 0.8329995995194233, "percentage": 16.64, "elapsed_time": "3:54:10", "remaining_time": "19:33:08"}
{"current_steps": 525, "total_steps": 3125, "loss": 0.3448, "lr": 3.94468766719076e-05, "epoch": 0.8410092110532639, "percentage": 16.8, "elapsed_time": "3:57:33", "remaining_time": "19:36:28"}
{"current_steps": 530, "total_steps": 3125, "loss": 0.3547, "lr": 3.942048052849001e-05, "epoch": 0.8490188225871045, "percentage": 16.96, "elapsed_time": "4:01:38", "remaining_time": "19:43:06"}
{"current_steps": 535, "total_steps": 3125, "loss": 0.3307, "lr": 3.939347839157548e-05, "epoch": 0.8570284341209451, "percentage": 17.12, "elapsed_time": "4:05:19", "remaining_time": "19:47:40"}
{"current_steps": 540, "total_steps": 3125, "loss": 0.3225, "lr": 3.9365871103734264e-05, "epoch": 0.8650380456547857, "percentage": 17.28, "elapsed_time": "4:09:42", "remaining_time": "19:55:23"}
{"current_steps": 545, "total_steps": 3125, "loss": 0.2924, "lr": 3.933765952641965e-05, "epoch": 0.8730476571886263, "percentage": 17.44, "elapsed_time": "4:13:37", "remaining_time": "20:00:37"}
{"current_steps": 550, "total_steps": 3125, "loss": 0.3117, "lr": 3.930884453994109e-05, "epoch": 0.8810572687224669, "percentage": 17.6, "elapsed_time": "4:17:41", "remaining_time": "20:06:28"}
{"current_steps": 555, "total_steps": 3125, "loss": 0.3388, "lr": 3.9279427043436706e-05, "epoch": 0.8890668802563075, "percentage": 17.76, "elapsed_time": "4:22:33", "remaining_time": "20:15:50"}
{"current_steps": 560, "total_steps": 3125, "loss": 0.3752, "lr": 3.924940795484525e-05, "epoch": 0.8970764917901481, "percentage": 17.92, "elapsed_time": "4:26:38", "remaining_time": "20:21:17"}
{"current_steps": 565, "total_steps": 3125, "loss": 0.3344, "lr": 3.9218788210877436e-05, "epoch": 0.9050861033239888, "percentage": 18.08, "elapsed_time": "4:30:39", "remaining_time": "20:26:21"}
{"current_steps": 570, "total_steps": 3125, "loss": 0.3462, "lr": 3.918756876698676e-05, "epoch": 0.9130957148578294, "percentage": 18.24, "elapsed_time": "4:34:12", "remaining_time": "20:29:06"}
{"current_steps": 575, "total_steps": 3125, "loss": 0.3186, "lr": 3.9155750597339634e-05, "epoch": 0.92110532639167, "percentage": 18.4, "elapsed_time": "4:37:56", "remaining_time": "20:32:34"}
{"current_steps": 580, "total_steps": 3125, "loss": 0.4225, "lr": 3.912333469478502e-05, "epoch": 0.9291149379255106, "percentage": 18.56, "elapsed_time": "4:42:21", "remaining_time": "20:38:56"}
{"current_steps": 585, "total_steps": 3125, "loss": 0.3832, "lr": 3.909032207082344e-05, "epoch": 0.9371245494593512, "percentage": 18.72, "elapsed_time": "4:46:15", "remaining_time": "20:42:55"}
{"current_steps": 590, "total_steps": 3125, "loss": 0.3439, "lr": 3.90567137555754e-05, "epoch": 0.9451341609931918, "percentage": 18.88, "elapsed_time": "4:50:54", "remaining_time": "20:49:54"}
{"current_steps": 595, "total_steps": 3125, "loss": 0.2978, "lr": 3.9022510797749286e-05, "epoch": 0.9531437725270324, "percentage": 19.04, "elapsed_time": "4:55:36", "remaining_time": "20:56:58"}
{"current_steps": 600, "total_steps": 3125, "loss": 0.2719, "lr": 3.898771426460859e-05, "epoch": 0.961153384060873, "percentage": 19.2, "elapsed_time": "4:59:09", "remaining_time": "20:58:57"}
{"current_steps": 605, "total_steps": 3125, "loss": 0.2591, "lr": 3.8952325241938635e-05, "epoch": 0.9691629955947136, "percentage": 19.36, "elapsed_time": "5:02:50", "remaining_time": "21:01:23"}
{"current_steps": 610, "total_steps": 3125, "loss": 0.3369, "lr": 3.8916344834012695e-05, "epoch": 0.9771726071285542, "percentage": 19.52, "elapsed_time": "5:07:35", "remaining_time": "21:08:11"}
{"current_steps": 615, "total_steps": 3125, "loss": 0.3219, "lr": 3.887977416355754e-05, "epoch": 0.9851822186623949, "percentage": 19.68, "elapsed_time": "5:12:03", "remaining_time": "21:13:34"}
{"current_steps": 620, "total_steps": 3125, "loss": 0.2945, "lr": 3.884261437171838e-05, "epoch": 0.9931918301962355, "percentage": 19.84, "elapsed_time": "5:16:01", "remaining_time": "21:16:51"}
{"current_steps": 625, "total_steps": 3125, "loss": 0.2984, "lr": 3.8804866618023284e-05, "epoch": 1.0, "percentage": 20.0, "elapsed_time": "5:19:35", "remaining_time": "21:18:23"}
{"current_steps": 630, "total_steps": 3125, "loss": 0.5045, "lr": 3.876653208034698e-05, "epoch": 1.0080096115338406, "percentage": 20.16, "elapsed_time": "5:21:11", "remaining_time": "21:11:59"}
{"current_steps": 635, "total_steps": 3125, "loss": 0.4707, "lr": 3.8727611954874114e-05, "epoch": 1.0160192230676812, "percentage": 20.32, "elapsed_time": "5:23:22", "remaining_time": "21:08:03"}
{"current_steps": 640, "total_steps": 3125, "loss": 0.4665, "lr": 3.8688107456061904e-05, "epoch": 1.0240288346015218, "percentage": 20.48, "elapsed_time": "5:25:25", "remaining_time": "21:03:35"}
{"current_steps": 645, "total_steps": 3125, "loss": 0.4398, "lr": 3.864801981660227e-05, "epoch": 1.0320384461353624, "percentage": 20.64, "elapsed_time": "5:27:30", "remaining_time": "20:59:15"}
{"current_steps": 650, "total_steps": 3125, "loss": 0.4399, "lr": 3.860735028738337e-05, "epoch": 1.040048057669203, "percentage": 20.8, "elapsed_time": "5:29:26", "remaining_time": "20:54:23"}
{"current_steps": 655, "total_steps": 3125, "loss": 0.4441, "lr": 3.856610013745051e-05, "epoch": 1.0480576692030437, "percentage": 20.96, "elapsed_time": "5:31:29", "remaining_time": "20:50:04"}
{"current_steps": 660, "total_steps": 3125, "loss": 0.4323, "lr": 3.852427065396665e-05, "epoch": 1.0560672807368843, "percentage": 21.12, "elapsed_time": "5:33:21", "remaining_time": "20:45:02"}
{"current_steps": 665, "total_steps": 3125, "loss": 0.4469, "lr": 3.848186314217213e-05, "epoch": 1.0640768922707249, "percentage": 21.28, "elapsed_time": "5:34:52", "remaining_time": "20:38:48"}
{"current_steps": 670, "total_steps": 3125, "loss": 0.4252, "lr": 3.843887892534402e-05, "epoch": 1.0720865038045655, "percentage": 21.44, "elapsed_time": "5:36:47", "remaining_time": "20:34:03"}
{"current_steps": 675, "total_steps": 3125, "loss": 0.4291, "lr": 3.8395319344754776e-05, "epoch": 1.080096115338406, "percentage": 21.6, "elapsed_time": "5:39:16", "remaining_time": "20:31:27"}
{"current_steps": 680, "total_steps": 3125, "loss": 0.3991, "lr": 3.8351185759630435e-05, "epoch": 1.0881057268722467, "percentage": 21.76, "elapsed_time": "5:41:34", "remaining_time": "20:28:08"}
{"current_steps": 685, "total_steps": 3125, "loss": 0.4347, "lr": 3.830647954710816e-05, "epoch": 1.0961153384060873, "percentage": 21.92, "elapsed_time": "5:43:20", "remaining_time": "20:22:59"}
{"current_steps": 690, "total_steps": 3125, "loss": 0.3976, "lr": 3.826120210219331e-05, "epoch": 1.104124949939928, "percentage": 22.08, "elapsed_time": "5:45:26", "remaining_time": "20:19:05"}
{"current_steps": 695, "total_steps": 3125, "loss": 0.4006, "lr": 3.8215354837715836e-05, "epoch": 1.1121345614737685, "percentage": 22.24, "elapsed_time": "5:47:42", "remaining_time": "20:15:42"}
{"current_steps": 700, "total_steps": 3125, "loss": 0.4012, "lr": 3.816893918428631e-05, "epoch": 1.1201441730076092, "percentage": 22.4, "elapsed_time": "5:50:00", "remaining_time": "20:12:30"}
{"current_steps": 705, "total_steps": 3125, "loss": 0.3668, "lr": 3.8121956590251153e-05, "epoch": 1.1281537845414498, "percentage": 22.56, "elapsed_time": "5:52:33", "remaining_time": "20:10:11"}
{"current_steps": 710, "total_steps": 3125, "loss": 0.4024, "lr": 3.8074408521647576e-05, "epoch": 1.1361633960752904, "percentage": 22.72, "elapsed_time": "5:54:29", "remaining_time": "20:05:46"}
{"current_steps": 715, "total_steps": 3125, "loss": 0.3939, "lr": 3.802629646215771e-05, "epoch": 1.144173007609131, "percentage": 22.88, "elapsed_time": "5:56:56", "remaining_time": "20:03:06"}
{"current_steps": 720, "total_steps": 3125, "loss": 0.3789, "lr": 3.79776219130624e-05, "epoch": 1.1521826191429716, "percentage": 23.04, "elapsed_time": "5:58:36", "remaining_time": "19:57:51"}
{"current_steps": 725, "total_steps": 3125, "loss": 0.4055, "lr": 3.792838639319431e-05, "epoch": 1.1601922306768122, "percentage": 23.2, "elapsed_time": "6:00:25", "remaining_time": "19:53:07"}
{"current_steps": 730, "total_steps": 3125, "loss": 0.3711, "lr": 3.787859143889054e-05, "epoch": 1.1682018422106528, "percentage": 23.36, "elapsed_time": "6:02:53", "remaining_time": "19:50:34"}
{"current_steps": 735, "total_steps": 3125, "loss": 0.3745, "lr": 3.782823860394469e-05, "epoch": 1.1762114537444934, "percentage": 23.52, "elapsed_time": "6:04:56", "remaining_time": "19:46:42"}
{"current_steps": 740, "total_steps": 3125, "loss": 0.3604, "lr": 3.777732945955841e-05, "epoch": 1.184221065278334, "percentage": 23.68, "elapsed_time": "6:06:50", "remaining_time": "19:42:18"}
{"current_steps": 745, "total_steps": 3125, "loss": 0.3821, "lr": 3.772586559429229e-05, "epoch": 1.1922306768121747, "percentage": 23.84, "elapsed_time": "6:08:43", "remaining_time": "19:37:57"}
{"current_steps": 750, "total_steps": 3125, "loss": 0.3573, "lr": 3.767384861401636e-05, "epoch": 1.2002402883460153, "percentage": 24.0, "elapsed_time": "6:11:19", "remaining_time": "19:35:52"}
{"current_steps": 755, "total_steps": 3125, "loss": 0.3413, "lr": 3.762128014185998e-05, "epoch": 1.2082498998798559, "percentage": 24.16, "elapsed_time": "6:15:33", "remaining_time": "19:38:54"}
{"current_steps": 760, "total_steps": 3125, "loss": 0.3206, "lr": 3.7568161818161135e-05, "epoch": 1.2162595114136965, "percentage": 24.32, "elapsed_time": "6:18:41", "remaining_time": "19:38:24"}
{"current_steps": 765, "total_steps": 3125, "loss": 0.3266, "lr": 3.751449530041532e-05, "epoch": 1.224269122947537, "percentage": 24.48, "elapsed_time": "6:21:45", "remaining_time": "19:37:42"}
{"current_steps": 770, "total_steps": 3125, "loss": 0.3229, "lr": 3.7460282263223764e-05, "epoch": 1.2322787344813777, "percentage": 24.64, "elapsed_time": "6:23:52", "remaining_time": "19:34:03"}
{"current_steps": 775, "total_steps": 3125, "loss": 0.325, "lr": 3.740552439824122e-05, "epoch": 1.2402883460152183, "percentage": 24.8, "elapsed_time": "6:26:59", "remaining_time": "19:33:28"}
{"current_steps": 780, "total_steps": 3125, "loss": 0.3143, "lr": 3.735022341412314e-05, "epoch": 1.248297957549059, "percentage": 24.96, "elapsed_time": "6:29:59", "remaining_time": "19:32:28"}
{"current_steps": 785, "total_steps": 3125, "loss": 0.2145, "lr": 3.7294381036472386e-05, "epoch": 1.2563075690828995, "percentage": 25.12, "elapsed_time": "6:32:42", "remaining_time": "19:30:36"}
{"current_steps": 790, "total_steps": 3125, "loss": 0.2093, "lr": 3.723799900778538e-05, "epoch": 1.2643171806167401, "percentage": 25.28, "elapsed_time": "6:35:50", "remaining_time": "19:29:58"}
{"current_steps": 795, "total_steps": 3125, "loss": 0.1842, "lr": 3.7181079087397705e-05, "epoch": 1.2723267921505808, "percentage": 25.44, "elapsed_time": "6:39:04", "remaining_time": "19:29:35"}
{"current_steps": 800, "total_steps": 3125, "loss": 0.1788, "lr": 3.712362305142926e-05, "epoch": 1.2803364036844214, "percentage": 25.6, "elapsed_time": "6:42:12", "remaining_time": "19:28:55"}
{"current_steps": 805, "total_steps": 3125, "loss": 0.1847, "lr": 3.706563269272878e-05, "epoch": 1.288346015218262, "percentage": 25.76, "elapsed_time": "6:45:11", "remaining_time": "19:27:46"}
{"current_steps": 810, "total_steps": 3125, "loss": 0.1803, "lr": 3.700710982081794e-05, "epoch": 1.2963556267521026, "percentage": 25.92, "elapsed_time": "6:48:34", "remaining_time": "19:27:41"}
{"current_steps": 815, "total_steps": 3125, "loss": 0.1843, "lr": 3.694805626183486e-05, "epoch": 1.3043652382859432, "percentage": 26.08, "elapsed_time": "6:51:49", "remaining_time": "19:27:14"}
{"current_steps": 820, "total_steps": 3125, "loss": 0.1707, "lr": 3.688847385847711e-05, "epoch": 1.3123748498197838, "percentage": 26.24, "elapsed_time": "6:54:55", "remaining_time": "19:26:20"}
{"current_steps": 825, "total_steps": 3125, "loss": 0.1741, "lr": 3.682836446994428e-05, "epoch": 1.3203844613536244, "percentage": 26.4, "elapsed_time": "6:58:03", "remaining_time": "19:25:29"}
{"current_steps": 830, "total_steps": 3125, "loss": 0.1846, "lr": 3.676772997187989e-05, "epoch": 1.328394072887465, "percentage": 26.56, "elapsed_time": "7:00:50", "remaining_time": "19:23:39"}
{"current_steps": 835, "total_steps": 3125, "loss": 0.1788, "lr": 3.670657225631289e-05, "epoch": 1.3364036844213056, "percentage": 26.72, "elapsed_time": "7:03:49", "remaining_time": "19:22:20"}
{"current_steps": 840, "total_steps": 3125, "loss": 0.1668, "lr": 3.6644893231598635e-05, "epoch": 1.3444132959551462, "percentage": 26.88, "elapsed_time": "7:07:01", "remaining_time": "19:21:35"}
{"current_steps": 845, "total_steps": 3125, "loss": 0.168, "lr": 3.658269482235932e-05, "epoch": 1.3524229074889869, "percentage": 27.04, "elapsed_time": "7:09:49", "remaining_time": "19:19:45"}
{"current_steps": 850, "total_steps": 3125, "loss": 0.1654, "lr": 3.651997896942394e-05, "epoch": 1.3604325190228275, "percentage": 27.2, "elapsed_time": "7:12:44", "remaining_time": "19:18:13"}
{"current_steps": 855, "total_steps": 3125, "loss": 0.1611, "lr": 3.645674762976769e-05, "epoch": 1.368442130556668, "percentage": 27.36, "elapsed_time": "7:15:45", "remaining_time": "19:16:56"}
{"current_steps": 860, "total_steps": 3125, "loss": 0.1696, "lr": 3.639300277645096e-05, "epoch": 1.3764517420905087, "percentage": 27.52, "elapsed_time": "7:18:39", "remaining_time": "19:15:18"}
{"current_steps": 865, "total_steps": 3125, "loss": 0.1566, "lr": 3.6328746398557715e-05, "epoch": 1.3844613536243493, "percentage": 27.68, "elapsed_time": "7:21:37", "remaining_time": "19:13:49"}
{"current_steps": 870, "total_steps": 3125, "loss": 0.1624, "lr": 3.6263980501133466e-05, "epoch": 1.39247096515819, "percentage": 27.84, "elapsed_time": "7:24:40", "remaining_time": "19:12:35"}
{"current_steps": 875, "total_steps": 3125, "loss": 0.1575, "lr": 3.619870710512268e-05, "epoch": 1.4004805766920305, "percentage": 28.0, "elapsed_time": "7:27:48", "remaining_time": "19:11:30"}
{"current_steps": 880, "total_steps": 3125, "loss": 0.1494, "lr": 3.6132928247305713e-05, "epoch": 1.408490188225871, "percentage": 28.16, "elapsed_time": "7:30:51", "remaining_time": "19:10:10"}
{"current_steps": 885, "total_steps": 3125, "loss": 0.167, "lr": 3.60666459802353e-05, "epoch": 1.4164997997597117, "percentage": 28.32, "elapsed_time": "7:33:44", "remaining_time": "19:08:26"}
{"current_steps": 890, "total_steps": 3125, "loss": 0.1689, "lr": 3.599986237217245e-05, "epoch": 1.4245094112935521, "percentage": 28.48, "elapsed_time": "7:36:42", "remaining_time": "19:06:54"}
{"current_steps": 895, "total_steps": 3125, "loss": 0.1502, "lr": 3.593257950702194e-05, "epoch": 1.432519022827393, "percentage": 28.64, "elapsed_time": "7:39:40", "remaining_time": "19:05:20"}
{"current_steps": 900, "total_steps": 3125, "loss": 0.1506, "lr": 3.586479948426728e-05, "epoch": 1.4405286343612334, "percentage": 28.8, "elapsed_time": "7:42:44", "remaining_time": "19:03:59"}
{"current_steps": 905, "total_steps": 3125, "loss": 0.1561, "lr": 3.579652441890523e-05, "epoch": 1.4485382458950742, "percentage": 28.96, "elapsed_time": "7:45:28", "remaining_time": "19:01:50"}
{"current_steps": 910, "total_steps": 3125, "loss": 0.1433, "lr": 3.572775644137974e-05, "epoch": 1.4565478574289146, "percentage": 29.12, "elapsed_time": "7:48:38", "remaining_time": "19:00:42"}
{"current_steps": 915, "total_steps": 3125, "loss": 0.1422, "lr": 3.5658497697515534e-05, "epoch": 1.4645574689627554, "percentage": 29.28, "elapsed_time": "7:51:45", "remaining_time": "18:59:27"}
{"current_steps": 920, "total_steps": 3125, "loss": 0.1438, "lr": 3.558875034845113e-05, "epoch": 1.4725670804965958, "percentage": 29.44, "elapsed_time": "7:54:38", "remaining_time": "18:57:35"}
{"current_steps": 925, "total_steps": 3125, "loss": 0.1555, "lr": 3.551851657057139e-05, "epoch": 1.4805766920304366, "percentage": 29.6, "elapsed_time": "7:57:47", "remaining_time": "18:56:20"}
{"current_steps": 930, "total_steps": 3125, "loss": 0.1335, "lr": 3.544779855543963e-05, "epoch": 1.488586303564277, "percentage": 29.76, "elapsed_time": "8:00:58", "remaining_time": "18:55:13"}
{"current_steps": 935, "total_steps": 3125, "loss": 0.1448, "lr": 3.5376598509729226e-05, "epoch": 1.4965959150981178, "percentage": 29.92, "elapsed_time": "8:03:25", "remaining_time": "18:52:19"}
{"current_steps": 940, "total_steps": 3125, "loss": 0.2606, "lr": 3.5304918655154754e-05, "epoch": 1.5046055266319582, "percentage": 30.08, "elapsed_time": "8:04:54", "remaining_time": "18:47:09"}
{"current_steps": 945, "total_steps": 3125, "loss": 0.2995, "lr": 3.523276122840266e-05, "epoch": 1.512615138165799, "percentage": 30.24, "elapsed_time": "8:06:00", "remaining_time": "18:41:09"}
{"current_steps": 950, "total_steps": 3125, "loss": 0.3033, "lr": 3.516012848106149e-05, "epoch": 1.5206247496996395, "percentage": 30.4, "elapsed_time": "8:07:10", "remaining_time": "18:35:22"}
{"current_steps": 955, "total_steps": 3125, "loss": 0.3023, "lr": 3.5087022679551614e-05, "epoch": 1.5286343612334803, "percentage": 30.56, "elapsed_time": "8:08:14", "remaining_time": "18:29:23"}
{"current_steps": 960, "total_steps": 3125, "loss": 0.3305, "lr": 3.5013446105054486e-05, "epoch": 1.5366439727673207, "percentage": 30.72, "elapsed_time": "8:09:34", "remaining_time": "18:24:06"}
{"current_steps": 965, "total_steps": 3125, "loss": 0.2999, "lr": 3.493940105344152e-05, "epoch": 1.5446535843011615, "percentage": 30.88, "elapsed_time": "8:10:49", "remaining_time": "18:18:37"}
{"current_steps": 970, "total_steps": 3125, "loss": 0.3051, "lr": 3.4864889835202366e-05, "epoch": 1.552663195835002, "percentage": 31.04, "elapsed_time": "8:11:53", "remaining_time": "18:12:49"}
{"current_steps": 975, "total_steps": 3125, "loss": 0.334, "lr": 3.4789914775372905e-05, "epoch": 1.5606728073688427, "percentage": 31.2, "elapsed_time": "8:12:58", "remaining_time": "18:07:03"}
{"current_steps": 980, "total_steps": 3125, "loss": 0.3472, "lr": 3.471447821346264e-05, "epoch": 1.5686824189026831, "percentage": 31.36, "elapsed_time": "8:14:10", "remaining_time": "18:01:38"}
{"current_steps": 985, "total_steps": 3125, "loss": 0.3006, "lr": 3.463858250338168e-05, "epoch": 1.576692030436524, "percentage": 31.52, "elapsed_time": "8:15:20", "remaining_time": "17:56:10"}
{"current_steps": 990, "total_steps": 3125, "loss": 0.2937, "lr": 3.4562230013367374e-05, "epoch": 1.5847016419703643, "percentage": 31.68, "elapsed_time": "8:16:40", "remaining_time": "17:51:07"}
{"current_steps": 995, "total_steps": 3125, "loss": 0.3095, "lr": 3.448542312591032e-05, "epoch": 1.5927112535042052, "percentage": 31.84, "elapsed_time": "8:17:56", "remaining_time": "17:45:56"}
{"current_steps": 1000, "total_steps": 3125, "loss": 0.3112, "lr": 3.440816423768007e-05, "epoch": 1.6007208650380456, "percentage": 32.0, "elapsed_time": "8:19:04", "remaining_time": "17:40:32"}
{"current_steps": 1005, "total_steps": 3125, "loss": 0.303, "lr": 3.433045575945031e-05, "epoch": 1.6087304765718864, "percentage": 32.16, "elapsed_time": "8:20:02", "remaining_time": "17:34:48"}
{"current_steps": 1010, "total_steps": 3125, "loss": 0.3246, "lr": 3.42523001160237e-05, "epoch": 1.6167400881057268, "percentage": 32.32, "elapsed_time": "8:20:56", "remaining_time": "17:28:59"}
{"current_steps": 1015, "total_steps": 3125, "loss": 0.2997, "lr": 3.417369974615615e-05, "epoch": 1.6247496996395676, "percentage": 32.48, "elapsed_time": "8:21:38", "remaining_time": "17:22:50"}
{"current_steps": 1020, "total_steps": 3125, "loss": 0.328, "lr": 3.409465710248074e-05, "epoch": 1.632759311173408, "percentage": 32.64, "elapsed_time": "8:22:28", "remaining_time": "17:16:57"}
{"current_steps": 1025, "total_steps": 3125, "loss": 0.2954, "lr": 3.401517465143119e-05, "epoch": 1.6407689227072488, "percentage": 32.8, "elapsed_time": "8:23:12", "remaining_time": "17:10:57"}
{"current_steps": 1030, "total_steps": 3125, "loss": 0.2999, "lr": 3.393525487316489e-05, "epoch": 1.6487785342410892, "percentage": 32.96, "elapsed_time": "8:23:59", "remaining_time": "17:05:06"}
{"current_steps": 1035, "total_steps": 3125, "loss": 0.308, "lr": 3.385490026148554e-05, "epoch": 1.65678814577493, "percentage": 33.12, "elapsed_time": "8:24:55", "remaining_time": "16:59:37"}
{"current_steps": 1040, "total_steps": 3125, "loss": 0.308, "lr": 3.377411332376529e-05, "epoch": 1.6647977573087704, "percentage": 33.28, "elapsed_time": "8:25:45", "remaining_time": "16:53:56"}
{"current_steps": 1045, "total_steps": 3125, "loss": 0.2913, "lr": 3.369289658086651e-05, "epoch": 1.6728073688426113, "percentage": 33.44, "elapsed_time": "8:26:38", "remaining_time": "16:48:26"}
{"current_steps": 1050, "total_steps": 3125, "loss": 0.3129, "lr": 3.3611252567063184e-05, "epoch": 1.6808169803764517, "percentage": 33.6, "elapsed_time": "8:27:23", "remaining_time": "16:42:41"}
{"current_steps": 1055, "total_steps": 3125, "loss": 0.2778, "lr": 3.352918382996174e-05, "epoch": 1.6888265919102925, "percentage": 33.76, "elapsed_time": "8:28:13", "remaining_time": "16:37:11"}
{"current_steps": 1060, "total_steps": 3125, "loss": 0.2922, "lr": 3.344669293042163e-05, "epoch": 1.6968362034441329, "percentage": 33.92, "elapsed_time": "8:29:03", "remaining_time": "16:31:42"}
{"current_steps": 1065, "total_steps": 3125, "loss": 0.3065, "lr": 3.336378244247539e-05, "epoch": 1.7048458149779737, "percentage": 34.08, "elapsed_time": "8:29:45", "remaining_time": "16:26:00"}
{"current_steps": 1070, "total_steps": 3125, "loss": 0.3, "lr": 3.3280454953248326e-05, "epoch": 1.712855426511814, "percentage": 34.24, "elapsed_time": "8:30:24", "remaining_time": "16:20:15"}
{"current_steps": 1075, "total_steps": 3125, "loss": 0.2957, "lr": 3.3196713062877765e-05, "epoch": 1.720865038045655, "percentage": 34.4, "elapsed_time": "8:31:12", "remaining_time": "16:14:51"}
{"current_steps": 1080, "total_steps": 3125, "loss": 0.2941, "lr": 3.311255938443196e-05, "epoch": 1.7288746495794953, "percentage": 34.56, "elapsed_time": "8:32:03", "remaining_time": "16:09:35"}
{"current_steps": 1085, "total_steps": 3125, "loss": 0.3067, "lr": 3.3027996543828524e-05, "epoch": 1.7368842611133362, "percentage": 34.72, "elapsed_time": "8:32:55", "remaining_time": "16:04:22"}
{"current_steps": 1090, "total_steps": 3125, "loss": 0.2908, "lr": 3.2943027179752494e-05, "epoch": 1.7448938726471765, "percentage": 34.88, "elapsed_time": "8:33:40", "remaining_time": "15:59:00"}
{"current_steps": 1095, "total_steps": 3125, "loss": 0.2581, "lr": 3.285765394357401e-05, "epoch": 1.7529034841810174, "percentage": 35.04, "elapsed_time": "8:35:25", "remaining_time": "15:55:32"}
{"current_steps": 1100, "total_steps": 3125, "loss": 0.1996, "lr": 3.277187949926556e-05, "epoch": 1.7609130957148578, "percentage": 35.2, "elapsed_time": "8:38:56", "remaining_time": "15:55:19"}
{"current_steps": 1105, "total_steps": 3125, "loss": 0.238, "lr": 3.268570652331888e-05, "epoch": 1.7689227072486986, "percentage": 35.36, "elapsed_time": "8:43:02", "remaining_time": "15:56:08"}
{"current_steps": 1110, "total_steps": 3125, "loss": 0.2424, "lr": 3.2599137704661405e-05, "epoch": 1.776932318782539, "percentage": 35.52, "elapsed_time": "8:46:46", "remaining_time": "15:56:15"}
{"current_steps": 1115, "total_steps": 3125, "loss": 0.246, "lr": 3.251217574457239e-05, "epoch": 1.7849419303163798, "percentage": 35.68, "elapsed_time": "8:51:02", "remaining_time": "15:57:17"}
{"current_steps": 1120, "total_steps": 3125, "loss": 0.2217, "lr": 3.242482335659861e-05, "epoch": 1.7929515418502202, "percentage": 35.84, "elapsed_time": "8:54:36", "remaining_time": "15:57:03"}
{"current_steps": 1125, "total_steps": 3125, "loss": 0.2677, "lr": 3.2337083266469687e-05, "epoch": 1.8009611533840608, "percentage": 36.0, "elapsed_time": "8:58:42", "remaining_time": "15:57:42"}
{"current_steps": 1130, "total_steps": 3125, "loss": 0.2201, "lr": 3.224895821201304e-05, "epoch": 1.8089707649179014, "percentage": 36.16, "elapsed_time": "9:01:34", "remaining_time": "15:56:08"}
{"current_steps": 1135, "total_steps": 3125, "loss": 0.2863, "lr": 3.2160450943068446e-05, "epoch": 1.816980376451742, "percentage": 36.32, "elapsed_time": "9:05:32", "remaining_time": "15:56:29"}
{"current_steps": 1140, "total_steps": 3125, "loss": 0.2559, "lr": 3.207156422140225e-05, "epoch": 1.8249899879855827, "percentage": 36.48, "elapsed_time": "9:09:48", "remaining_time": "15:57:20"}
{"current_steps": 1145, "total_steps": 3125, "loss": 0.2566, "lr": 3.198230082062115e-05, "epoch": 1.8329995995194233, "percentage": 36.64, "elapsed_time": "9:13:54", "remaining_time": "15:57:51"}
{"current_steps": 1150, "total_steps": 3125, "loss": 0.2404, "lr": 3.189266352608574e-05, "epoch": 1.8410092110532639, "percentage": 36.8, "elapsed_time": "9:17:17", "remaining_time": "15:57:04"}
{"current_steps": 1155, "total_steps": 3125, "loss": 0.2564, "lr": 3.180265513482345e-05, "epoch": 1.8490188225871045, "percentage": 36.96, "elapsed_time": "9:21:21", "remaining_time": "15:57:28"}
{"current_steps": 1160, "total_steps": 3125, "loss": 0.2279, "lr": 3.171227845544143e-05, "epoch": 1.857028434120945, "percentage": 37.12, "elapsed_time": "9:25:03", "remaining_time": "15:57:10"}
{"current_steps": 1165, "total_steps": 3125, "loss": 0.2238, "lr": 3.162153630803877e-05, "epoch": 1.8650380456547857, "percentage": 37.28, "elapsed_time": "9:29:25", "remaining_time": "15:58:00"}
{"current_steps": 1170, "total_steps": 3125, "loss": 0.1995, "lr": 3.153043152411861e-05, "epoch": 1.8730476571886263, "percentage": 37.44, "elapsed_time": "9:33:19", "remaining_time": "15:58:00"}
{"current_steps": 1175, "total_steps": 3125, "loss": 0.2228, "lr": 3.14389669464997e-05, "epoch": 1.881057268722467, "percentage": 37.6, "elapsed_time": "9:37:24", "remaining_time": "15:58:14"}
{"current_steps": 1180, "total_steps": 3125, "loss": 0.2518, "lr": 3.134714542922777e-05, "epoch": 1.8890668802563075, "percentage": 37.76, "elapsed_time": "9:42:15", "remaining_time": "15:59:44"}
{"current_steps": 1185, "total_steps": 3125, "loss": 0.277, "lr": 3.1254969837486425e-05, "epoch": 1.8970764917901481, "percentage": 37.92, "elapsed_time": "9:46:19", "remaining_time": "15:59:53"}
{"current_steps": 1190, "total_steps": 3125, "loss": 0.2366, "lr": 3.116244304750774e-05, "epoch": 1.9050861033239888, "percentage": 38.08, "elapsed_time": "9:50:20", "remaining_time": "15:59:55"}
{"current_steps": 1195, "total_steps": 3125, "loss": 0.2472, "lr": 3.106956794648254e-05, "epoch": 1.9130957148578294, "percentage": 38.24, "elapsed_time": "9:53:52", "remaining_time": "15:59:09"}
{"current_steps": 1200, "total_steps": 3125, "loss": 0.2115, "lr": 3.097634743247026e-05, "epoch": 1.92110532639167, "percentage": 38.4, "elapsed_time": "9:57:36", "remaining_time": "15:58:39"}
{"current_steps": 1205, "total_steps": 3125, "loss": 0.3195, "lr": 3.08827844143086e-05, "epoch": 1.9291149379255106, "percentage": 38.56, "elapsed_time": "10:02:01", "remaining_time": "15:59:14"}
{"current_steps": 1210, "total_steps": 3125, "loss": 0.2811, "lr": 3.078888181152264e-05, "epoch": 1.9371245494593512, "percentage": 38.72, "elapsed_time": "10:05:55", "remaining_time": "15:58:57"}
{"current_steps": 1215, "total_steps": 3125, "loss": 0.2502, "lr": 3.0694642554233855e-05, "epoch": 1.9451341609931918, "percentage": 38.88, "elapsed_time": "10:10:33", "remaining_time": "15:59:48"}
{"current_steps": 1220, "total_steps": 3125, "loss": 0.2042, "lr": 3.0600069583068594e-05, "epoch": 1.9531437725270324, "percentage": 39.04, "elapsed_time": "10:15:15", "remaining_time": "16:00:42"}
{"current_steps": 1225, "total_steps": 3125, "loss": 0.1792, "lr": 3.0505165849066394e-05, "epoch": 1.961153384060873, "percentage": 39.2, "elapsed_time": "10:18:47", "remaining_time": "15:59:46"}
{"current_steps": 1230, "total_steps": 3125, "loss": 0.1679, "lr": 3.040993431358782e-05, "epoch": 1.9691629955947136, "percentage": 39.36, "elapsed_time": "10:22:28", "remaining_time": "15:59:00"}
{"current_steps": 1235, "total_steps": 3125, "loss": 0.2538, "lr": 3.031437794822215e-05, "epoch": 1.9771726071285542, "percentage": 39.52, "elapsed_time": "10:27:13", "remaining_time": "15:59:52"}
{"current_steps": 1240, "total_steps": 3125, "loss": 0.2317, "lr": 3.021849973469455e-05, "epoch": 1.9851822186623949, "percentage": 39.68, "elapsed_time": "10:31:40", "remaining_time": "16:00:14"}
{"current_steps": 1245, "total_steps": 3125, "loss": 0.2017, "lr": 3.012230266477313e-05, "epoch": 1.9931918301962355, "percentage": 39.84, "elapsed_time": "10:35:38", "remaining_time": "15:59:51"}
{"current_steps": 1250, "total_steps": 3125, "loss": 0.2049, "lr": 3.0025789740175502e-05, "epoch": 2.0, "percentage": 40.0, "elapsed_time": "10:39:12", "remaining_time": "15:58:49"}
{"current_steps": 1255, "total_steps": 3125, "loss": 0.3146, "lr": 2.9928963972475186e-05, "epoch": 2.0080096115338404, "percentage": 40.16, "elapsed_time": "10:40:48", "remaining_time": "15:54:49"}
{"current_steps": 1260, "total_steps": 3125, "loss": 0.2874, "lr": 2.9831828383007585e-05, "epoch": 2.016019223067681, "percentage": 40.32, "elapsed_time": "10:42:59", "remaining_time": "15:51:44"}
{"current_steps": 1265, "total_steps": 3125, "loss": 0.2772, "lr": 2.9734386002775754e-05, "epoch": 2.0240288346015216, "percentage": 40.48, "elapsed_time": "10:45:02", "remaining_time": "15:48:26"}
{"current_steps": 1270, "total_steps": 3125, "loss": 0.2701, "lr": 2.963663987235577e-05, "epoch": 2.0320384461353624, "percentage": 40.64, "elapsed_time": "10:47:07", "remaining_time": "15:45:12"}
{"current_steps": 1275, "total_steps": 3125, "loss": 0.2763, "lr": 2.95385930418019e-05, "epoch": 2.040048057669203, "percentage": 40.8, "elapsed_time": "10:49:02", "remaining_time": "15:41:44"}
{"current_steps": 1280, "total_steps": 3125, "loss": 0.2632, "lr": 2.9440248570551406e-05, "epoch": 2.0480576692030437, "percentage": 40.96, "elapsed_time": "10:51:05", "remaining_time": "15:38:29"}
{"current_steps": 1285, "total_steps": 3125, "loss": 0.2468, "lr": 2.934160952732907e-05, "epoch": 2.056067280736884, "percentage": 41.12, "elapsed_time": "10:52:57", "remaining_time": "15:34:58"}
{"current_steps": 1290, "total_steps": 3125, "loss": 0.2634, "lr": 2.9242678990051462e-05, "epoch": 2.064076892270725, "percentage": 41.28, "elapsed_time": "10:54:28", "remaining_time": "15:30:59"}
{"current_steps": 1295, "total_steps": 3125, "loss": 0.2604, "lr": 2.9143460045730886e-05, "epoch": 2.0720865038045653, "percentage": 41.44, "elapsed_time": "10:56:23", "remaining_time": "15:27:33"}
{"current_steps": 1300, "total_steps": 3125, "loss": 0.253, "lr": 2.9043955790379035e-05, "epoch": 2.080096115338406, "percentage": 41.6, "elapsed_time": "10:58:52", "remaining_time": "15:24:57"}
{"current_steps": 1305, "total_steps": 3125, "loss": 0.3053, "lr": 2.8944169328910427e-05, "epoch": 2.0881057268722465, "percentage": 41.76, "elapsed_time": "11:01:09", "remaining_time": "15:22:04"}
{"current_steps": 1310, "total_steps": 3125, "loss": 0.2623, "lr": 2.884410377504547e-05, "epoch": 2.0961153384060873, "percentage": 41.92, "elapsed_time": "11:02:55", "remaining_time": "15:18:29"}
{"current_steps": 1315, "total_steps": 3125, "loss": 0.2343, "lr": 2.8743762251213333e-05, "epoch": 2.1041249499399277, "percentage": 42.08, "elapsed_time": "11:05:02", "remaining_time": "15:15:22"}
{"current_steps": 1320, "total_steps": 3125, "loss": 0.2301, "lr": 2.8643147888454507e-05, "epoch": 2.1121345614737685, "percentage": 42.24, "elapsed_time": "11:07:17", "remaining_time": "15:12:27"}
{"current_steps": 1325, "total_steps": 3125, "loss": 0.238, "lr": 2.854226382632312e-05, "epoch": 2.120144173007609, "percentage": 42.4, "elapsed_time": "11:09:35", "remaining_time": "15:09:37"}
{"current_steps": 1330, "total_steps": 3125, "loss": 0.2239, "lr": 2.844111321278893e-05, "epoch": 2.1281537845414498, "percentage": 42.56, "elapsed_time": "11:12:08", "remaining_time": "15:07:07"}
{"current_steps": 1335, "total_steps": 3125, "loss": 0.2295, "lr": 2.833969920413913e-05, "epoch": 2.13616339607529, "percentage": 42.72, "elapsed_time": "11:14:04", "remaining_time": "15:03:48"}
{"current_steps": 1340, "total_steps": 3125, "loss": 0.2308, "lr": 2.8238024964879857e-05, "epoch": 2.144173007609131, "percentage": 42.88, "elapsed_time": "11:16:30", "remaining_time": "15:01:10"}
{"current_steps": 1345, "total_steps": 3125, "loss": 0.23, "lr": 2.8136093667637438e-05, "epoch": 2.1521826191429714, "percentage": 43.04, "elapsed_time": "11:18:11", "remaining_time": "14:57:31"}
{"current_steps": 1350, "total_steps": 3125, "loss": 0.242, "lr": 2.8033908493059394e-05, "epoch": 2.160192230676812, "percentage": 43.2, "elapsed_time": "11:19:59", "remaining_time": "14:54:03"}
{"current_steps": 1355, "total_steps": 3125, "loss": 0.2286, "lr": 2.793147262971519e-05, "epoch": 2.1682018422106526, "percentage": 43.36, "elapsed_time": "11:22:27", "remaining_time": "14:51:28"}
{"current_steps": 1360, "total_steps": 3125, "loss": 0.2176, "lr": 2.7828789273996748e-05, "epoch": 2.1762114537444934, "percentage": 43.52, "elapsed_time": "11:24:31", "remaining_time": "14:48:21"}
{"current_steps": 1365, "total_steps": 3125, "loss": 0.2098, "lr": 2.7725861630018703e-05, "epoch": 2.184221065278334, "percentage": 43.68, "elapsed_time": "11:26:24", "remaining_time": "14:45:01"}
{"current_steps": 1370, "total_steps": 3125, "loss": 0.2104, "lr": 2.7622692909518423e-05, "epoch": 2.1922306768121747, "percentage": 43.84, "elapsed_time": "11:28:17", "remaining_time": "14:41:43"}
{"current_steps": 1375, "total_steps": 3125, "loss": 0.2155, "lr": 2.7519286331755766e-05, "epoch": 2.200240288346015, "percentage": 44.0, "elapsed_time": "11:30:53", "remaining_time": "14:39:18"}
{"current_steps": 1380, "total_steps": 3125, "loss": 0.207, "lr": 2.7415645123412672e-05, "epoch": 2.208249899879856, "percentage": 44.16, "elapsed_time": "11:33:47", "remaining_time": "14:37:17"}
{"current_steps": 1385, "total_steps": 3125, "loss": 0.1921, "lr": 2.731177251849246e-05, "epoch": 2.2162595114136963, "percentage": 44.32, "elapsed_time": "11:36:54", "remaining_time": "14:35:32"}
{"current_steps": 1390, "total_steps": 3125, "loss": 0.1998, "lr": 2.7207671758218884e-05, "epoch": 2.224269122947537, "percentage": 44.48, "elapsed_time": "11:39:58", "remaining_time": "14:33:42"}
{"current_steps": 1395, "total_steps": 3125, "loss": 0.1904, "lr": 2.710334609093504e-05, "epoch": 2.2322787344813775, "percentage": 44.64, "elapsed_time": "11:42:05", "remaining_time": "14:30:42"}
{"current_steps": 1400, "total_steps": 3125, "loss": 0.1966, "lr": 2.699879877200198e-05, "epoch": 2.2402883460152183, "percentage": 44.8, "elapsed_time": "11:45:12", "remaining_time": "14:28:55"}
{"current_steps": 1405, "total_steps": 3125, "loss": 0.1886, "lr": 2.6894033063697143e-05, "epoch": 2.2482979575490587, "percentage": 44.96, "elapsed_time": "11:48:11", "remaining_time": "14:26:58"}
{"current_steps": 1410, "total_steps": 3125, "loss": 0.1275, "lr": 2.6789052235112554e-05, "epoch": 2.2563075690828995, "percentage": 45.12, "elapsed_time": "11:50:54", "remaining_time": "14:24:41"}
{"current_steps": 1415, "total_steps": 3125, "loss": 0.1217, "lr": 2.66838595620528e-05, "epoch": 2.26431718061674, "percentage": 45.28, "elapsed_time": "11:54:02", "remaining_time": "14:22:53"}
{"current_steps": 1420, "total_steps": 3125, "loss": 0.1088, "lr": 2.6578458326932842e-05, "epoch": 2.2723267921505808, "percentage": 45.44, "elapsed_time": "11:57:15", "remaining_time": "14:21:13"}
{"current_steps": 1425, "total_steps": 3125, "loss": 0.1045, "lr": 2.6472851818675583e-05, "epoch": 2.280336403684421, "percentage": 45.6, "elapsed_time": "12:00:24", "remaining_time": "14:19:25"}
{"current_steps": 1430, "total_steps": 3125, "loss": 0.1065, "lr": 2.6367043332609223e-05, "epoch": 2.288346015218262, "percentage": 45.76, "elapsed_time": "12:03:22", "remaining_time": "14:17:26"}
{"current_steps": 1435, "total_steps": 3125, "loss": 0.1055, "lr": 2.6261036170364448e-05, "epoch": 2.2963556267521024, "percentage": 45.92, "elapsed_time": "12:06:45", "remaining_time": "14:15:54"}
{"current_steps": 1440, "total_steps": 3125, "loss": 0.112, "lr": 2.6154833639771415e-05, "epoch": 2.304365238285943, "percentage": 46.08, "elapsed_time": "12:10:00", "remaining_time": "14:14:12"}
{"current_steps": 1445, "total_steps": 3125, "loss": 0.0982, "lr": 2.6048439054756492e-05, "epoch": 2.3123748498197836, "percentage": 46.24, "elapsed_time": "12:13:05", "remaining_time": "14:12:19"}
{"current_steps": 1450, "total_steps": 3125, "loss": 0.098, "lr": 2.594185573523892e-05, "epoch": 2.3203844613536244, "percentage": 46.4, "elapsed_time": "12:16:13", "remaining_time": "14:10:28"}
{"current_steps": 1455, "total_steps": 3125, "loss": 0.1112, "lr": 2.583508700702716e-05, "epoch": 2.328394072887465, "percentage": 46.56, "elapsed_time": "12:19:00", "remaining_time": "14:08:12"}
{"current_steps": 1460, "total_steps": 3125, "loss": 0.1026, "lr": 2.572813620171513e-05, "epoch": 2.3364036844213056, "percentage": 46.72, "elapsed_time": "12:21:59", "remaining_time": "14:06:10"}
{"current_steps": 1465, "total_steps": 3125, "loss": 0.0946, "lr": 2.5621006656578267e-05, "epoch": 2.344413295955146, "percentage": 46.88, "elapsed_time": "12:25:10", "remaining_time": "14:04:22"}
{"current_steps": 1470, "total_steps": 3125, "loss": 0.1012, "lr": 2.5513701714469373e-05, "epoch": 2.352422907488987, "percentage": 47.04, "elapsed_time": "12:27:59", "remaining_time": "14:02:07"}
{"current_steps": 1475, "total_steps": 3125, "loss": 0.0945, "lr": 2.540622472371429e-05, "epoch": 2.3604325190228272, "percentage": 47.2, "elapsed_time": "12:30:54", "remaining_time": "13:59:59"}
{"current_steps": 1480, "total_steps": 3125, "loss": 0.0966, "lr": 2.5298579038007478e-05, "epoch": 2.368442130556668, "percentage": 47.36, "elapsed_time": "12:33:55", "remaining_time": "13:57:58"}
{"current_steps": 1485, "total_steps": 3125, "loss": 0.1086, "lr": 2.519076801630727e-05, "epoch": 2.3764517420905085, "percentage": 47.52, "elapsed_time": "12:36:48", "remaining_time": "13:55:48"}
{"current_steps": 1490, "total_steps": 3125, "loss": 0.09, "lr": 2.508279502273117e-05, "epoch": 2.3844613536243493, "percentage": 47.68, "elapsed_time": "12:39:45", "remaining_time": "13:53:41"}
{"current_steps": 1495, "total_steps": 3125, "loss": 0.0901, "lr": 2.4974663426450798e-05, "epoch": 2.3924709651581897, "percentage": 47.84, "elapsed_time": "12:42:49", "remaining_time": "13:51:42"}
{"current_steps": 1500, "total_steps": 3125, "loss": 0.0908, "lr": 2.4866376601586798e-05, "epoch": 2.4004805766920305, "percentage": 48.0, "elapsed_time": "12:45:56", "remaining_time": "13:49:46"}
{"current_steps": 1505, "total_steps": 3125, "loss": 0.0874, "lr": 2.475793792710352e-05, "epoch": 2.408490188225871, "percentage": 48.16, "elapsed_time": "12:50:33", "remaining_time": "13:49:26"}
{"current_steps": 1510, "total_steps": 3125, "loss": 0.1039, "lr": 2.4649350786703637e-05, "epoch": 2.4164997997597117, "percentage": 48.32, "elapsed_time": "12:53:26", "remaining_time": "13:47:13"}
{"current_steps": 1515, "total_steps": 3125, "loss": 0.1018, "lr": 2.45406185687225e-05, "epoch": 2.424509411293552, "percentage": 48.48, "elapsed_time": "12:56:24", "remaining_time": "13:45:05"}
{"current_steps": 1520, "total_steps": 3125, "loss": 0.0866, "lr": 2.443174466602246e-05, "epoch": 2.432519022827393, "percentage": 48.64, "elapsed_time": "12:59:22", "remaining_time": "13:42:57"}
{"current_steps": 1525, "total_steps": 3125, "loss": 0.0871, "lr": 2.4322732475886953e-05, "epoch": 2.4405286343612334, "percentage": 48.8, "elapsed_time": "13:02:25", "remaining_time": "13:40:54"}
{"current_steps": 1530, "total_steps": 3125, "loss": 0.0961, "lr": 2.4213585399914528e-05, "epoch": 2.448538245895074, "percentage": 48.96, "elapsed_time": "13:05:09", "remaining_time": "13:38:31"}
{"current_steps": 1535, "total_steps": 3125, "loss": 0.0816, "lr": 2.4104306843912687e-05, "epoch": 2.4565478574289146, "percentage": 49.12, "elapsed_time": "13:08:19", "remaining_time": "13:36:34"}
{"current_steps": 1540, "total_steps": 3125, "loss": 0.0813, "lr": 2.3994900217791615e-05, "epoch": 2.4645574689627554, "percentage": 49.28, "elapsed_time": "13:11:26", "remaining_time": "13:34:34"}
{"current_steps": 1545, "total_steps": 3125, "loss": 0.0816, "lr": 2.3885368935457762e-05, "epoch": 2.472567080496596, "percentage": 49.44, "elapsed_time": "13:14:18", "remaining_time": "13:32:18"}
{"current_steps": 1550, "total_steps": 3125, "loss": 0.0911, "lr": 2.3775716414707355e-05, "epoch": 2.4805766920304366, "percentage": 49.6, "elapsed_time": "13:17:26", "remaining_time": "13:30:18"}
{"current_steps": 1555, "total_steps": 3125, "loss": 0.0784, "lr": 2.36659460771197e-05, "epoch": 2.488586303564277, "percentage": 49.76, "elapsed_time": "13:20:39", "remaining_time": "13:28:22"}
{"current_steps": 1560, "total_steps": 3125, "loss": 0.0834, "lr": 2.3556061347950455e-05, "epoch": 2.496595915098118, "percentage": 49.92, "elapsed_time": "13:23:06", "remaining_time": "13:25:40"}
{"current_steps": 1565, "total_steps": 3125, "loss": 0.136, "lr": 2.3446065656024734e-05, "epoch": 2.5046055266319582, "percentage": 50.08, "elapsed_time": "13:24:34", "remaining_time": "13:22:00"}
{"current_steps": 1570, "total_steps": 3125, "loss": 0.1563, "lr": 2.33359624336301e-05, "epoch": 2.512615138165799, "percentage": 50.24, "elapsed_time": "13:25:40", "remaining_time": "13:17:58"}
{"current_steps": 1575, "total_steps": 3125, "loss": 0.1701, "lr": 2.3225755116409497e-05, "epoch": 2.5206247496996395, "percentage": 50.4, "elapsed_time": "13:26:49", "remaining_time": "13:14:01"}
{"current_steps": 1580, "total_steps": 3125, "loss": 0.1572, "lr": 2.311544714325403e-05, "epoch": 2.5286343612334803, "percentage": 50.56, "elapsed_time": "13:27:53", "remaining_time": "13:10:00"}
{"current_steps": 1585, "total_steps": 3125, "loss": 0.1777, "lr": 2.300504195619563e-05, "epoch": 2.5366439727673207, "percentage": 50.72, "elapsed_time": "13:29:14", "remaining_time": "13:06:16"}
{"current_steps": 1590, "total_steps": 3125, "loss": 0.156, "lr": 2.2894543000299697e-05, "epoch": 2.5446535843011615, "percentage": 50.88, "elapsed_time": "13:30:28", "remaining_time": "13:02:26"}
{"current_steps": 1595, "total_steps": 3125, "loss": 0.158, "lr": 2.2783953723557572e-05, "epoch": 2.552663195835002, "percentage": 51.04, "elapsed_time": "13:31:33", "remaining_time": "12:58:29"}
{"current_steps": 1600, "total_steps": 3125, "loss": 0.1877, "lr": 2.2673277576778946e-05, "epoch": 2.5606728073688427, "percentage": 51.2, "elapsed_time": "13:32:37", "remaining_time": "12:54:32"}
{"current_steps": 1605, "total_steps": 3125, "loss": 0.2165, "lr": 2.2562518013484208e-05, "epoch": 2.568682418902683, "percentage": 51.36, "elapsed_time": "13:33:49", "remaining_time": "12:50:43"}
{"current_steps": 1610, "total_steps": 3125, "loss": 0.1526, "lr": 2.245167848979664e-05, "epoch": 2.576692030436524, "percentage": 51.52, "elapsed_time": "13:35:00", "remaining_time": "12:46:54"}
{"current_steps": 1615, "total_steps": 3125, "loss": 0.1496, "lr": 2.23407624643346e-05, "epoch": 2.5847016419703643, "percentage": 51.68, "elapsed_time": "13:36:20", "remaining_time": "12:43:15"}
{"current_steps": 1620, "total_steps": 3125, "loss": 0.1706, "lr": 2.2229773398103606e-05, "epoch": 2.592711253504205, "percentage": 51.84, "elapsed_time": "13:37:35", "remaining_time": "12:39:33"}
{"current_steps": 1625, "total_steps": 3125, "loss": 0.1582, "lr": 2.2118714754388323e-05, "epoch": 2.6007208650380456, "percentage": 52.0, "elapsed_time": "13:38:44", "remaining_time": "12:35:45"}
{"current_steps": 1630, "total_steps": 3125, "loss": 0.153, "lr": 2.200758999864449e-05, "epoch": 2.6087304765718864, "percentage": 52.16, "elapsed_time": "13:39:41", "remaining_time": "12:31:48"}
{"current_steps": 1635, "total_steps": 3125, "loss": 0.1672, "lr": 2.1896402598390818e-05, "epoch": 2.616740088105727, "percentage": 52.32, "elapsed_time": "13:40:35", "remaining_time": "12:27:48"}
{"current_steps": 1640, "total_steps": 3125, "loss": 0.1467, "lr": 2.178515602310074e-05, "epoch": 2.6247496996395676, "percentage": 52.48, "elapsed_time": "13:41:18", "remaining_time": "12:23:40"}
{"current_steps": 1645, "total_steps": 3125, "loss": 0.1743, "lr": 2.1673853744094193e-05, "epoch": 2.632759311173408, "percentage": 52.64, "elapsed_time": "13:42:07", "remaining_time": "12:19:39"}
{"current_steps": 1650, "total_steps": 3125, "loss": 0.138, "lr": 2.1562499234429283e-05, "epoch": 2.640768922707249, "percentage": 52.8, "elapsed_time": "13:42:52", "remaining_time": "12:15:35"}
{"current_steps": 1655, "total_steps": 3125, "loss": 0.1423, "lr": 2.1451095968793908e-05, "epoch": 2.648778534241089, "percentage": 52.96, "elapsed_time": "13:43:39", "remaining_time": "12:11:35"}
{"current_steps": 1660, "total_steps": 3125, "loss": 0.1479, "lr": 2.1339647423397337e-05, "epoch": 2.65678814577493, "percentage": 53.12, "elapsed_time": "13:44:35", "remaining_time": "12:07:43"}
{"current_steps": 1665, "total_steps": 3125, "loss": 0.1541, "lr": 2.122815707586176e-05, "epoch": 2.6647977573087704, "percentage": 53.28, "elapsed_time": "13:45:24", "remaining_time": "12:03:47"}
{"current_steps": 1670, "total_steps": 3125, "loss": 0.1384, "lr": 2.111662840511373e-05, "epoch": 2.6728073688426113, "percentage": 53.44, "elapsed_time": "13:46:18", "remaining_time": "11:59:55"}
{"current_steps": 1675, "total_steps": 3125, "loss": 0.1493, "lr": 2.1005064891275638e-05, "epoch": 2.6808169803764517, "percentage": 53.6, "elapsed_time": "13:47:02", "remaining_time": "11:55:57"}
{"current_steps": 1680, "total_steps": 3125, "loss": 0.1291, "lr": 2.0893470015557126e-05, "epoch": 2.6888265919102925, "percentage": 53.76, "elapsed_time": "13:47:53", "remaining_time": "11:52:05"}
{"current_steps": 1685, "total_steps": 3125, "loss": 0.1418, "lr": 2.078184726014643e-05, "epoch": 2.696836203444133, "percentage": 53.92, "elapsed_time": "13:48:43", "remaining_time": "11:48:13"}
{"current_steps": 1690, "total_steps": 3125, "loss": 0.1437, "lr": 2.0670200108101754e-05, "epoch": 2.7048458149779737, "percentage": 54.08, "elapsed_time": "13:49:25", "remaining_time": "11:44:16"}
{"current_steps": 1695, "total_steps": 3125, "loss": 0.1383, "lr": 2.0558532043242557e-05, "epoch": 2.712855426511814, "percentage": 54.24, "elapsed_time": "13:50:04", "remaining_time": "11:40:17"}
{"current_steps": 1700, "total_steps": 3125, "loss": 0.1405, "lr": 2.0446846550040863e-05, "epoch": 2.720865038045655, "percentage": 54.4, "elapsed_time": "13:50:52", "remaining_time": "11:36:28"}
{"current_steps": 1705, "total_steps": 3125, "loss": 0.1437, "lr": 2.033514711351253e-05, "epoch": 2.7288746495794953, "percentage": 54.56, "elapsed_time": "13:51:44", "remaining_time": "11:32:42"}
{"current_steps": 1710, "total_steps": 3125, "loss": 0.15, "lr": 2.022343721910851e-05, "epoch": 2.736884261113336, "percentage": 54.72, "elapsed_time": "13:52:35", "remaining_time": "11:28:57"}
{"current_steps": 1715, "total_steps": 3125, "loss": 0.1389, "lr": 2.0111720352606054e-05, "epoch": 2.7448938726471765, "percentage": 54.88, "elapsed_time": "13:53:21", "remaining_time": "11:25:08"}
{"current_steps": 1720, "total_steps": 3125, "loss": 0.1359, "lr": 2e-05, "epoch": 2.7529034841810174, "percentage": 55.04, "elapsed_time": "13:55:06", "remaining_time": "11:22:10"}
{"current_steps": 1725, "total_steps": 3125, "loss": 0.1306, "lr": 1.988827964739395e-05, "epoch": 2.7609130957148578, "percentage": 55.2, "elapsed_time": "13:58:36", "remaining_time": "11:20:36"}
{"current_steps": 1730, "total_steps": 3125, "loss": 0.1681, "lr": 1.9776562780891494e-05, "epoch": 2.7689227072486986, "percentage": 55.36, "elapsed_time": "14:02:42", "remaining_time": "11:19:31"}
{"current_steps": 1735, "total_steps": 3125, "loss": 0.1681, "lr": 1.966485288648747e-05, "epoch": 2.776932318782539, "percentage": 55.52, "elapsed_time": "14:06:26", "remaining_time": "11:18:07"}
{"current_steps": 1740, "total_steps": 3125, "loss": 0.1707, "lr": 1.9553153449959144e-05, "epoch": 2.78494193031638, "percentage": 55.68, "elapsed_time": "14:10:42", "remaining_time": "11:17:08"}
{"current_steps": 1745, "total_steps": 3125, "loss": 0.1409, "lr": 1.9441467956757453e-05, "epoch": 2.79295154185022, "percentage": 55.84, "elapsed_time": "14:14:17", "remaining_time": "11:15:35"}
{"current_steps": 1750, "total_steps": 3125, "loss": 0.1857, "lr": 1.9329799891898256e-05, "epoch": 2.800961153384061, "percentage": 56.0, "elapsed_time": "14:18:23", "remaining_time": "11:14:26"}
{"current_steps": 1755, "total_steps": 3125, "loss": 0.1434, "lr": 1.9218152739853576e-05, "epoch": 2.8089707649179014, "percentage": 56.16, "elapsed_time": "14:21:15", "remaining_time": "11:12:19"}
{"current_steps": 1760, "total_steps": 3125, "loss": 0.2049, "lr": 1.9106529984442884e-05, "epoch": 2.816980376451742, "percentage": 56.32, "elapsed_time": "14:25:12", "remaining_time": "11:11:01"}
{"current_steps": 1765, "total_steps": 3125, "loss": 0.1728, "lr": 1.8994935108724366e-05, "epoch": 2.8249899879855827, "percentage": 56.48, "elapsed_time": "14:29:29", "remaining_time": "11:09:58"}
{"current_steps": 1770, "total_steps": 3125, "loss": 0.1722, "lr": 1.8883371594886276e-05, "epoch": 2.8329995995194235, "percentage": 56.64, "elapsed_time": "14:33:35", "remaining_time": "11:08:45"}
{"current_steps": 1775, "total_steps": 3125, "loss": 0.1614, "lr": 1.877184292413824e-05, "epoch": 2.841009211053264, "percentage": 56.8, "elapsed_time": "14:36:57", "remaining_time": "11:06:58"}
{"current_steps": 1780, "total_steps": 3125, "loss": 0.185, "lr": 1.8660352576602663e-05, "epoch": 2.8490188225871043, "percentage": 56.96, "elapsed_time": "14:41:01", "remaining_time": "11:05:43"}
{"current_steps": 1785, "total_steps": 3125, "loss": 0.1518, "lr": 1.8548904031206102e-05, "epoch": 2.857028434120945, "percentage": 57.12, "elapsed_time": "14:44:43", "remaining_time": "11:04:09"}
{"current_steps": 1790, "total_steps": 3125, "loss": 0.1509, "lr": 1.843750076557072e-05, "epoch": 2.865038045654786, "percentage": 57.28, "elapsed_time": "14:49:06", "remaining_time": "11:03:06"}
{"current_steps": 1795, "total_steps": 3125, "loss": 0.1229, "lr": 1.832614625590581e-05, "epoch": 2.8730476571886263, "percentage": 57.44, "elapsed_time": "14:53:00", "remaining_time": "11:01:40"}
{"current_steps": 1800, "total_steps": 3125, "loss": 0.1438, "lr": 1.8214843976899264e-05, "epoch": 2.8810572687224667, "percentage": 57.6, "elapsed_time": "14:57:04", "remaining_time": "11:00:20"}
{"current_steps": 1805, "total_steps": 3125, "loss": 0.1784, "lr": 1.810359740160919e-05, "epoch": 2.8890668802563075, "percentage": 57.76, "elapsed_time": "15:01:56", "remaining_time": "10:59:35"}
{"current_steps": 1810, "total_steps": 3125, "loss": 0.1995, "lr": 1.7992410001355515e-05, "epoch": 2.8970764917901484, "percentage": 57.92, "elapsed_time": "15:06:00", "remaining_time": "10:58:13"}
{"current_steps": 1815, "total_steps": 3125, "loss": 0.1641, "lr": 1.788128524561168e-05, "epoch": 2.9050861033239888, "percentage": 58.08, "elapsed_time": "15:10:01", "remaining_time": "10:56:49"}
{"current_steps": 1820, "total_steps": 3125, "loss": 0.1619, "lr": 1.7770226601896397e-05, "epoch": 2.913095714857829, "percentage": 58.24, "elapsed_time": "15:13:34", "remaining_time": "10:55:03"}
{"current_steps": 1825, "total_steps": 3125, "loss": 0.1295, "lr": 1.7659237535665404e-05, "epoch": 2.92110532639167, "percentage": 58.4, "elapsed_time": "15:17:17", "remaining_time": "10:53:25"}
{"current_steps": 1830, "total_steps": 3125, "loss": 0.2358, "lr": 1.754832151020337e-05, "epoch": 2.929114937925511, "percentage": 58.56, "elapsed_time": "15:21:42", "remaining_time": "10:52:15"}
{"current_steps": 1835, "total_steps": 3125, "loss": 0.2001, "lr": 1.74374819865158e-05, "epoch": 2.937124549459351, "percentage": 58.72, "elapsed_time": "15:25:37", "remaining_time": "10:50:42"}
{"current_steps": 1840, "total_steps": 3125, "loss": 0.1729, "lr": 1.7326722423221057e-05, "epoch": 2.9451341609931916, "percentage": 58.88, "elapsed_time": "15:30:14", "remaining_time": "10:49:39"}
{"current_steps": 1845, "total_steps": 3125, "loss": 0.1291, "lr": 1.7216046276442438e-05, "epoch": 2.9531437725270324, "percentage": 59.04, "elapsed_time": "15:34:57", "remaining_time": "10:48:38"}
{"current_steps": 1850, "total_steps": 3125, "loss": 0.1061, "lr": 1.7105456999700306e-05, "epoch": 2.9611533840608733, "percentage": 59.2, "elapsed_time": "15:38:29", "remaining_time": "10:46:48"}
{"current_steps": 1855, "total_steps": 3125, "loss": 0.1002, "lr": 1.6994958043804374e-05, "epoch": 2.9691629955947136, "percentage": 59.36, "elapsed_time": "15:42:09", "remaining_time": "10:45:02"}
{"current_steps": 1860, "total_steps": 3125, "loss": 0.1746, "lr": 1.6884552856745972e-05, "epoch": 2.977172607128554, "percentage": 59.52, "elapsed_time": "15:46:54", "remaining_time": "10:44:00"}
{"current_steps": 1865, "total_steps": 3125, "loss": 0.1567, "lr": 1.6774244883590503e-05, "epoch": 2.985182218662395, "percentage": 59.68, "elapsed_time": "15:51:22", "remaining_time": "10:42:44"}
{"current_steps": 1870, "total_steps": 3125, "loss": 0.1289, "lr": 1.6664037566369905e-05, "epoch": 2.9931918301962357, "percentage": 59.84, "elapsed_time": "15:55:20", "remaining_time": "10:41:09"}
{"current_steps": 1875, "total_steps": 3125, "loss": 0.1319, "lr": 1.6553934343975273e-05, "epoch": 3.0, "percentage": 60.0, "elapsed_time": "15:58:54", "remaining_time": "10:39:16"}
{"current_steps": 1880, "total_steps": 3125, "loss": 0.1797, "lr": 1.644393865204955e-05, "epoch": 3.0080096115338404, "percentage": 60.16, "elapsed_time": "16:00:30", "remaining_time": "10:36:04"}
{"current_steps": 1885, "total_steps": 3125, "loss": 0.1628, "lr": 1.6334053922880304e-05, "epoch": 3.016019223067681, "percentage": 60.32, "elapsed_time": "16:02:41", "remaining_time": "10:33:17"}
{"current_steps": 1890, "total_steps": 3125, "loss": 0.1521, "lr": 1.622428358529265e-05, "epoch": 3.0240288346015216, "percentage": 60.48, "elapsed_time": "16:04:44", "remaining_time": "10:30:24"}
{"current_steps": 1895, "total_steps": 3125, "loss": 0.1471, "lr": 1.611463106454224e-05, "epoch": 3.0320384461353624, "percentage": 60.64, "elapsed_time": "16:06:49", "remaining_time": "10:27:32"}
{"current_steps": 1900, "total_steps": 3125, "loss": 0.1433, "lr": 1.6005099782208392e-05, "epoch": 3.040048057669203, "percentage": 60.8, "elapsed_time": "16:08:44", "remaining_time": "10:24:34"}
{"current_steps": 1905, "total_steps": 3125, "loss": 0.1375, "lr": 1.5895693156087317e-05, "epoch": 3.0480576692030437, "percentage": 60.96, "elapsed_time": "16:10:47", "remaining_time": "10:21:43"}
{"current_steps": 1910, "total_steps": 3125, "loss": 0.1241, "lr": 1.578641460008548e-05, "epoch": 3.056067280736884, "percentage": 61.12, "elapsed_time": "16:12:39", "remaining_time": "10:18:43"}
{"current_steps": 1915, "total_steps": 3125, "loss": 0.1343, "lr": 1.5677267524113054e-05, "epoch": 3.064076892270725, "percentage": 61.28, "elapsed_time": "16:14:10", "remaining_time": "10:15:32"}
{"current_steps": 1920, "total_steps": 3125, "loss": 0.1336, "lr": 1.5568255333977547e-05, "epoch": 3.0720865038045653, "percentage": 61.44, "elapsed_time": "16:16:05", "remaining_time": "10:12:35"}
{"current_steps": 1925, "total_steps": 3125, "loss": 0.1343, "lr": 1.5459381431277506e-05, "epoch": 3.080096115338406, "percentage": 61.6, "elapsed_time": "16:18:34", "remaining_time": "10:10:01"}
{"current_steps": 1930, "total_steps": 3125, "loss": 0.214, "lr": 1.5350649213296373e-05, "epoch": 3.0881057268722465, "percentage": 61.76, "elapsed_time": "16:20:51", "remaining_time": "10:07:19"}
{"current_steps": 1935, "total_steps": 3125, "loss": 0.1317, "lr": 1.5242062072896483e-05, "epoch": 3.0961153384060873, "percentage": 61.92, "elapsed_time": "16:22:37", "remaining_time": "10:04:18"}
{"current_steps": 1940, "total_steps": 3125, "loss": 0.1188, "lr": 1.5133623398413209e-05, "epoch": 3.1041249499399277, "percentage": 62.08, "elapsed_time": "16:24:44", "remaining_time": "10:01:30"}
{"current_steps": 1945, "total_steps": 3125, "loss": 0.1142, "lr": 1.50253365735492e-05, "epoch": 3.1121345614737685, "percentage": 62.24, "elapsed_time": "16:26:59", "remaining_time": "9:58:47"}
{"current_steps": 1950, "total_steps": 3125, "loss": 0.1183, "lr": 1.4917204977268833e-05, "epoch": 3.120144173007609, "percentage": 62.4, "elapsed_time": "16:29:16", "remaining_time": "9:56:06"}
{"current_steps": 1955, "total_steps": 3125, "loss": 0.1109, "lr": 1.4809231983692733e-05, "epoch": 3.1281537845414498, "percentage": 62.56, "elapsed_time": "16:31:49", "remaining_time": "9:53:34"}
{"current_steps": 1960, "total_steps": 3125, "loss": 0.1139, "lr": 1.4701420961992533e-05, "epoch": 3.13616339607529, "percentage": 62.72, "elapsed_time": "16:33:46", "remaining_time": "9:50:41"}
{"current_steps": 1965, "total_steps": 3125, "loss": 0.1142, "lr": 1.459377527628571e-05, "epoch": 3.144173007609131, "percentage": 62.88, "elapsed_time": "16:36:12", "remaining_time": "9:48:05"}
{"current_steps": 1970, "total_steps": 3125, "loss": 0.1143, "lr": 1.4486298285530634e-05, "epoch": 3.1521826191429714, "percentage": 63.04, "elapsed_time": "16:37:52", "remaining_time": "9:45:03"}
{"current_steps": 1975, "total_steps": 3125, "loss": 0.1202, "lr": 1.4378993343421736e-05, "epoch": 3.160192230676812, "percentage": 63.2, "elapsed_time": "16:39:41", "remaining_time": "9:42:05"}
{"current_steps": 1980, "total_steps": 3125, "loss": 0.1098, "lr": 1.4271863798284877e-05, "epoch": 3.1682018422106526, "percentage": 63.36, "elapsed_time": "16:42:08", "remaining_time": "9:39:31"}
{"current_steps": 1985, "total_steps": 3125, "loss": 0.1106, "lr": 1.4164912992972846e-05, "epoch": 3.1762114537444934, "percentage": 63.52, "elapsed_time": "16:44:12", "remaining_time": "9:36:43"}
{"current_steps": 1990, "total_steps": 3125, "loss": 0.1032, "lr": 1.4058144264761087e-05, "epoch": 3.184221065278334, "percentage": 63.68, "elapsed_time": "16:46:05", "remaining_time": "9:33:49"}
{"current_steps": 1995, "total_steps": 3125, "loss": 0.1003, "lr": 1.3951560945243517e-05, "epoch": 3.1922306768121747, "percentage": 63.84, "elapsed_time": "16:47:59", "remaining_time": "9:30:56"}
{"current_steps": 2000, "total_steps": 3125, "loss": 0.1219, "lr": 1.3845166360228597e-05, "epoch": 3.200240288346015, "percentage": 64.0, "elapsed_time": "16:50:34", "remaining_time": "9:28:27"}
{"current_steps": 2005, "total_steps": 3125, "loss": 0.1179, "lr": 1.3738963829635559e-05, "epoch": 3.208249899879856, "percentage": 64.16, "elapsed_time": "16:53:28", "remaining_time": "9:26:07"}
{"current_steps": 2010, "total_steps": 3125, "loss": 0.1057, "lr": 1.3632956667390784e-05, "epoch": 3.2162595114136963, "percentage": 64.32, "elapsed_time": "16:56:35", "remaining_time": "9:23:55"}
{"current_steps": 2015, "total_steps": 3125, "loss": 0.1064, "lr": 1.3527148181324425e-05, "epoch": 3.224269122947537, "percentage": 64.48, "elapsed_time": "16:59:39", "remaining_time": "9:21:42"}
{"current_steps": 2020, "total_steps": 3125, "loss": 0.0949, "lr": 1.3421541673067168e-05, "epoch": 3.2322787344813775, "percentage": 64.64, "elapsed_time": "17:01:46", "remaining_time": "9:18:56"}
{"current_steps": 2025, "total_steps": 3125, "loss": 0.1003, "lr": 1.3316140437947207e-05, "epoch": 3.2402883460152183, "percentage": 64.8, "elapsed_time": "17:04:53", "remaining_time": "9:16:44"}
{"current_steps": 2030, "total_steps": 3125, "loss": 0.0951, "lr": 1.321094776488745e-05, "epoch": 3.2482979575490587, "percentage": 64.96, "elapsed_time": "17:07:53", "remaining_time": "9:14:27"}
{"current_steps": 2035, "total_steps": 3125, "loss": 0.0727, "lr": 1.3105966936302856e-05, "epoch": 3.2563075690828995, "percentage": 65.12, "elapsed_time": "17:10:35", "remaining_time": "9:12:00"}
{"current_steps": 2040, "total_steps": 3125, "loss": 0.0679, "lr": 1.3001201227998023e-05, "epoch": 3.26431718061674, "percentage": 65.28, "elapsed_time": "17:13:43", "remaining_time": "9:09:47"}
{"current_steps": 2045, "total_steps": 3125, "loss": 0.0627, "lr": 1.2896653909064964e-05, "epoch": 3.2723267921505808, "percentage": 65.44, "elapsed_time": "17:16:56", "remaining_time": "9:07:37"}
{"current_steps": 2050, "total_steps": 3125, "loss": 0.0584, "lr": 1.2792328241781124e-05, "epoch": 3.280336403684421, "percentage": 65.6, "elapsed_time": "17:20:05", "remaining_time": "9:05:24"}
{"current_steps": 2055, "total_steps": 3125, "loss": 0.0595, "lr": 1.2688227481507546e-05, "epoch": 3.288346015218262, "percentage": 65.76, "elapsed_time": "17:23:03", "remaining_time": "9:03:06"}
{"current_steps": 2060, "total_steps": 3125, "loss": 0.0607, "lr": 1.258435487658733e-05, "epoch": 3.2963556267521024, "percentage": 65.92, "elapsed_time": "17:26:26", "remaining_time": "9:00:59"}
{"current_steps": 2065, "total_steps": 3125, "loss": 0.0618, "lr": 1.2480713668244243e-05, "epoch": 3.304365238285943, "percentage": 66.08, "elapsed_time": "17:29:40", "remaining_time": "8:58:49"}
{"current_steps": 2070, "total_steps": 3125, "loss": 0.0527, "lr": 1.2377307090481586e-05, "epoch": 3.3123748498197836, "percentage": 66.24, "elapsed_time": "17:32:46", "remaining_time": "8:56:33"}
{"current_steps": 2075, "total_steps": 3125, "loss": 0.0516, "lr": 1.2274138369981298e-05, "epoch": 3.3203844613536244, "percentage": 66.4, "elapsed_time": "17:35:54", "remaining_time": "8:54:18"}
{"current_steps": 2080, "total_steps": 3125, "loss": 0.0598, "lr": 1.2171210726003256e-05, "epoch": 3.328394072887465, "percentage": 66.56, "elapsed_time": "17:38:41", "remaining_time": "8:51:53"}
{"current_steps": 2085, "total_steps": 3125, "loss": 0.0544, "lr": 1.2068527370284815e-05, "epoch": 3.3364036844213056, "percentage": 66.72, "elapsed_time": "17:41:39", "remaining_time": "8:49:33"}
{"current_steps": 2090, "total_steps": 3125, "loss": 0.0499, "lr": 1.1966091506940616e-05, "epoch": 3.344413295955146, "percentage": 66.88, "elapsed_time": "17:44:51", "remaining_time": "8:47:19"}
{"current_steps": 2095, "total_steps": 3125, "loss": 0.0538, "lr": 1.1863906332362569e-05, "epoch": 3.352422907488987, "percentage": 67.04, "elapsed_time": "17:47:39", "remaining_time": "8:44:54"}
{"current_steps": 2100, "total_steps": 3125, "loss": 0.0496, "lr": 1.176197503512015e-05, "epoch": 3.3604325190228272, "percentage": 67.2, "elapsed_time": "17:50:34", "remaining_time": "8:42:32"}
{"current_steps": 2105, "total_steps": 3125, "loss": 0.0537, "lr": 1.1660300795860877e-05, "epoch": 3.368442130556668, "percentage": 67.36, "elapsed_time": "17:53:35", "remaining_time": "8:40:13"}
{"current_steps": 2110, "total_steps": 3125, "loss": 0.0572, "lr": 1.1558886787211071e-05, "epoch": 3.3764517420905085, "percentage": 67.52, "elapsed_time": "17:56:28", "remaining_time": "8:37:49"}
{"current_steps": 2115, "total_steps": 3125, "loss": 0.0459, "lr": 1.1457736173676883e-05, "epoch": 3.3844613536243493, "percentage": 67.68, "elapsed_time": "17:59:25", "remaining_time": "8:35:28"}
{"current_steps": 2120, "total_steps": 3125, "loss": 0.0442, "lr": 1.1356852111545493e-05, "epoch": 3.3924709651581897, "percentage": 67.84, "elapsed_time": "18:02:29", "remaining_time": "8:33:09"}
{"current_steps": 2125, "total_steps": 3125, "loss": 0.046, "lr": 1.1256237748786675e-05, "epoch": 3.4004805766920305, "percentage": 68.0, "elapsed_time": "18:05:37", "remaining_time": "8:30:52"}
{"current_steps": 2130, "total_steps": 3125, "loss": 0.0421, "lr": 1.1155896224954543e-05, "epoch": 3.408490188225871, "percentage": 68.16, "elapsed_time": "18:08:39", "remaining_time": "8:28:33"}
{"current_steps": 2135, "total_steps": 3125, "loss": 0.051, "lr": 1.1055830671089578e-05, "epoch": 3.4164997997597117, "percentage": 68.32, "elapsed_time": "18:11:32", "remaining_time": "8:26:08"}
{"current_steps": 2140, "total_steps": 3125, "loss": 0.0509, "lr": 1.0956044209620966e-05, "epoch": 3.424509411293552, "percentage": 68.48, "elapsed_time": "18:14:30", "remaining_time": "8:23:46"}
{"current_steps": 2145, "total_steps": 3125, "loss": 0.0421, "lr": 1.0856539954269121e-05, "epoch": 3.432519022827393, "percentage": 68.64, "elapsed_time": "18:17:28", "remaining_time": "8:21:24"}
{"current_steps": 2150, "total_steps": 3125, "loss": 0.0417, "lr": 1.0757321009948543e-05, "epoch": 3.4405286343612334, "percentage": 68.8, "elapsed_time": "18:20:31", "remaining_time": "8:19:04"}
{"current_steps": 2155, "total_steps": 3125, "loss": 0.0455, "lr": 1.0658390472670938e-05, "epoch": 3.448538245895074, "percentage": 68.96, "elapsed_time": "18:23:16", "remaining_time": "8:16:36"}
{"current_steps": 2160, "total_steps": 3125, "loss": 0.038, "lr": 1.0559751429448597e-05, "epoch": 3.4565478574289146, "percentage": 69.12, "elapsed_time": "18:26:26", "remaining_time": "8:14:18"}
{"current_steps": 2165, "total_steps": 3125, "loss": 0.0391, "lr": 1.0461406958198101e-05, "epoch": 3.4645574689627554, "percentage": 69.28, "elapsed_time": "18:29:33", "remaining_time": "8:11:59"}
{"current_steps": 2170, "total_steps": 3125, "loss": 0.0387, "lr": 1.0363360127644235e-05, "epoch": 3.472567080496596, "percentage": 69.44, "elapsed_time": "18:32:25", "remaining_time": "8:09:34"}
{"current_steps": 2175, "total_steps": 3125, "loss": 0.043, "lr": 1.0265613997224255e-05, "epoch": 3.4805766920304366, "percentage": 69.6, "elapsed_time": "18:35:33", "remaining_time": "8:07:15"}
{"current_steps": 2180, "total_steps": 3125, "loss": 0.035, "lr": 1.0168171616992422e-05, "epoch": 3.488586303564277, "percentage": 69.76, "elapsed_time": "18:38:45", "remaining_time": "8:04:58"}
{"current_steps": 2185, "total_steps": 3125, "loss": 0.0377, "lr": 1.007103602752483e-05, "epoch": 3.496595915098118, "percentage": 69.92, "elapsed_time": "18:41:12", "remaining_time": "8:02:21"}
{"current_steps": 2190, "total_steps": 3125, "loss": 0.0625, "lr": 9.974210259824505e-06, "epoch": 3.5046055266319582, "percentage": 70.08, "elapsed_time": "18:42:41", "remaining_time": "7:59:19"}
{"current_steps": 2195, "total_steps": 3125, "loss": 0.0655, "lr": 9.877697335226872e-06, "epoch": 3.512615138165799, "percentage": 70.24, "elapsed_time": "18:43:46", "remaining_time": "7:56:08"}
{"current_steps": 2200, "total_steps": 3125, "loss": 0.0706, "lr": 9.781500265305448e-06, "epoch": 3.5206247496996395, "percentage": 70.4, "elapsed_time": "18:44:56", "remaining_time": "7:52:59"}
{"current_steps": 2205, "total_steps": 3125, "loss": 0.0654, "lr": 9.685622051777856e-06, "epoch": 3.5286343612334803, "percentage": 70.56, "elapsed_time": "18:46:00", "remaining_time": "7:49:48"}
{"current_steps": 2210, "total_steps": 3125, "loss": 0.0764, "lr": 9.590065686412182e-06, "epoch": 3.5366439727673207, "percentage": 70.72, "elapsed_time": "18:47:21", "remaining_time": "7:46:45"}
{"current_steps": 2215, "total_steps": 3125, "loss": 0.0636, "lr": 9.494834150933616e-06, "epoch": 3.5446535843011615, "percentage": 70.88, "elapsed_time": "18:48:35", "remaining_time": "7:43:39"}
{"current_steps": 2220, "total_steps": 3125, "loss": 0.0607, "lr": 9.399930416931404e-06, "epoch": 3.552663195835002, "percentage": 71.04, "elapsed_time": "18:49:39", "remaining_time": "7:40:30"}
{"current_steps": 2225, "total_steps": 3125, "loss": 0.0728, "lr": 9.30535744576615e-06, "epoch": 3.5606728073688427, "percentage": 71.2, "elapsed_time": "18:50:44", "remaining_time": "7:37:22"}
{"current_steps": 2230, "total_steps": 3125, "loss": 0.091, "lr": 9.211118188477362e-06, "epoch": 3.568682418902683, "percentage": 71.36, "elapsed_time": "18:51:56", "remaining_time": "7:34:17"}
{"current_steps": 2235, "total_steps": 3125, "loss": 0.0572, "lr": 9.117215585691408e-06, "epoch": 3.576692030436524, "percentage": 71.52, "elapsed_time": "18:53:06", "remaining_time": "7:31:12"}
{"current_steps": 2240, "total_steps": 3125, "loss": 0.0551, "lr": 9.023652567529744e-06, "epoch": 3.5847016419703643, "percentage": 71.68, "elapsed_time": "18:54:26", "remaining_time": "7:28:12"}
{"current_steps": 2245, "total_steps": 3125, "loss": 0.0651, "lr": 8.930432053517465e-06, "epoch": 3.592711253504205, "percentage": 71.84, "elapsed_time": "18:55:42", "remaining_time": "7:25:10"}
{"current_steps": 2250, "total_steps": 3125, "loss": 0.0565, "lr": 8.837556952492264e-06, "epoch": 3.6007208650380456, "percentage": 72.0, "elapsed_time": "18:56:50", "remaining_time": "7:22:06"}
{"current_steps": 2255, "total_steps": 3125, "loss": 0.0568, "lr": 8.745030162513582e-06, "epoch": 3.6087304765718864, "percentage": 72.16, "elapsed_time": "18:59:22", "remaining_time": "7:19:34"}
{"current_steps": 2260, "total_steps": 3125, "loss": 0.062, "lr": 8.652854570772236e-06, "epoch": 3.616740088105727, "percentage": 72.32, "elapsed_time": "19:00:15", "remaining_time": "7:16:25"}
{"current_steps": 2265, "total_steps": 3125, "loss": 0.0532, "lr": 8.561033053500312e-06, "epoch": 3.6247496996395676, "percentage": 72.48, "elapsed_time": "19:00:58", "remaining_time": "7:13:13"}
{"current_steps": 2270, "total_steps": 3125, "loss": 0.0653, "lr": 8.46956847588141e-06, "epoch": 3.632759311173408, "percentage": 72.64, "elapsed_time": "19:01:48", "remaining_time": "7:10:03"}
{"current_steps": 2275, "total_steps": 3125, "loss": 0.0457, "lr": 8.378463691961237e-06, "epoch": 3.640768922707249, "percentage": 72.8, "elapsed_time": "19:02:32", "remaining_time": "7:06:53"}
{"current_steps": 2280, "total_steps": 3125, "loss": 0.0484, "lr": 8.287721544558574e-06, "epoch": 3.648778534241089, "percentage": 72.96, "elapsed_time": "19:03:19", "remaining_time": "7:03:44"}
{"current_steps": 2285, "total_steps": 3125, "loss": 0.052, "lr": 8.197344865176548e-06, "epoch": 3.65678814577493, "percentage": 73.12, "elapsed_time": "19:04:16", "remaining_time": "7:00:38"}
{"current_steps": 2290, "total_steps": 3125, "loss": 0.0511, "lr": 8.10733647391427e-06, "epoch": 3.6647977573087704, "percentage": 73.28, "elapsed_time": "19:05:05", "remaining_time": "6:57:31"}
{"current_steps": 2295, "total_steps": 3125, "loss": 0.045, "lr": 8.017699179378849e-06, "epoch": 3.6728073688426113, "percentage": 73.44, "elapsed_time": "19:05:58", "remaining_time": "6:54:27"}
{"current_steps": 2300, "total_steps": 3125, "loss": 0.0472, "lr": 7.928435778597763e-06, "epoch": 3.6808169803764517, "percentage": 73.6, "elapsed_time": "19:06:43", "remaining_time": "6:51:19"}
{"current_steps": 2305, "total_steps": 3125, "loss": 0.0419, "lr": 7.839549056931557e-06, "epoch": 3.6888265919102925, "percentage": 73.76, "elapsed_time": "19:07:33", "remaining_time": "6:48:14"}
{"current_steps": 2310, "total_steps": 3125, "loss": 0.0453, "lr": 7.751041787986965e-06, "epoch": 3.696836203444133, "percentage": 73.92, "elapsed_time": "19:08:23", "remaining_time": "6:45:10"}
{"current_steps": 2315, "total_steps": 3125, "loss": 0.0463, "lr": 7.662916733530317e-06, "epoch": 3.7048458149779737, "percentage": 74.08, "elapsed_time": "19:09:05", "remaining_time": "6:42:03"}
{"current_steps": 2320, "total_steps": 3125, "loss": 0.0466, "lr": 7.575176643401394e-06, "epoch": 3.712855426511814, "percentage": 74.24, "elapsed_time": "19:09:44", "remaining_time": "6:38:56"}
{"current_steps": 2325, "total_steps": 3125, "loss": 0.0473, "lr": 7.487824255427616e-06, "epoch": 3.720865038045655, "percentage": 74.4, "elapsed_time": "19:10:32", "remaining_time": "6:35:53"}
{"current_steps": 2330, "total_steps": 3125, "loss": 0.0492, "lr": 7.400862295338595e-06, "epoch": 3.7288746495794953, "percentage": 74.56, "elapsed_time": "19:11:23", "remaining_time": "6:32:51"}
{"current_steps": 2335, "total_steps": 3125, "loss": 0.0479, "lr": 7.314293476681122e-06, "epoch": 3.736884261113336, "percentage": 74.72, "elapsed_time": "19:12:15", "remaining_time": "6:29:50"}
{"current_steps": 2340, "total_steps": 3125, "loss": 0.042, "lr": 7.228120500734443e-06, "epoch": 3.7448938726471765, "percentage": 74.88, "elapsed_time": "19:13:00", "remaining_time": "6:26:48"}
{"current_steps": 2345, "total_steps": 3125, "loss": 0.0601, "lr": 7.1423460564259995e-06, "epoch": 3.7529034841810174, "percentage": 75.04, "elapsed_time": "19:14:46", "remaining_time": "6:24:06"}
{"current_steps": 2350, "total_steps": 3125, "loss": 0.0784, "lr": 7.056972820247516e-06, "epoch": 3.7609130957148578, "percentage": 75.2, "elapsed_time": "19:18:16", "remaining_time": "6:21:59"}
{"current_steps": 2355, "total_steps": 3125, "loss": 0.1051, "lr": 6.97200345617149e-06, "epoch": 3.7689227072486986, "percentage": 75.36, "elapsed_time": "19:22:22", "remaining_time": "6:20:03"}
{"current_steps": 2360, "total_steps": 3125, "loss": 0.1056, "lr": 6.887440615568044e-06, "epoch": 3.776932318782539, "percentage": 75.52, "elapsed_time": "19:26:06", "remaining_time": "6:17:59"}
{"current_steps": 2365, "total_steps": 3125, "loss": 0.1105, "lr": 6.803286937122233e-06, "epoch": 3.78494193031638, "percentage": 75.68, "elapsed_time": "19:30:22", "remaining_time": "6:16:06"}
{"current_steps": 2370, "total_steps": 3125, "loss": 0.0831, "lr": 6.719545046751674e-06, "epoch": 3.79295154185022, "percentage": 75.84, "elapsed_time": "19:33:56", "remaining_time": "6:13:58"}
{"current_steps": 2375, "total_steps": 3125, "loss": 0.1128, "lr": 6.636217557524605e-06, "epoch": 3.800961153384061, "percentage": 76.0, "elapsed_time": "19:38:02", "remaining_time": "6:12:00"}
{"current_steps": 2380, "total_steps": 3125, "loss": 0.0812, "lr": 6.55330706957837e-06, "epoch": 3.8089707649179014, "percentage": 76.16, "elapsed_time": "19:40:54", "remaining_time": "6:09:39"}
{"current_steps": 2385, "total_steps": 3125, "loss": 0.1338, "lr": 6.4708161700382655e-06, "epoch": 3.816980376451742, "percentage": 76.32, "elapsed_time": "19:44:51", "remaining_time": "6:07:37"}
{"current_steps": 2390, "total_steps": 3125, "loss": 0.1047, "lr": 6.388747432936819e-06, "epoch": 3.8249899879855827, "percentage": 76.48, "elapsed_time": "19:49:07", "remaining_time": "6:05:41"}
{"current_steps": 2395, "total_steps": 3125, "loss": 0.1051, "lr": 6.3071034191334915e-06, "epoch": 3.8329995995194235, "percentage": 76.64, "elapsed_time": "19:53:13", "remaining_time": "6:03:41"}
{"current_steps": 2400, "total_steps": 3125, "loss": 0.0906, "lr": 6.22588667623472e-06, "epoch": 3.841009211053264, "percentage": 76.8, "elapsed_time": "19:56:36", "remaining_time": "6:01:28"}
{"current_steps": 2405, "total_steps": 3125, "loss": 0.1146, "lr": 6.145099738514466e-06, "epoch": 3.8490188225871043, "percentage": 76.96, "elapsed_time": "20:00:41", "remaining_time": "5:59:27"}
{"current_steps": 2410, "total_steps": 3125, "loss": 0.0875, "lr": 6.064745126835112e-06, "epoch": 3.857028434120945, "percentage": 77.12, "elapsed_time": "20:04:22", "remaining_time": "5:57:18"}
{"current_steps": 2415, "total_steps": 3125, "loss": 0.089, "lr": 5.984825348568812e-06, "epoch": 3.865038045654786, "percentage": 77.28, "elapsed_time": "20:08:45", "remaining_time": "5:55:22"}
{"current_steps": 2420, "total_steps": 3125, "loss": 0.0692, "lr": 5.905342897519262e-06, "epoch": 3.8730476571886263, "percentage": 77.44, "elapsed_time": "20:12:39", "remaining_time": "5:53:16"}
{"current_steps": 2425, "total_steps": 3125, "loss": 0.0818, "lr": 5.826300253843851e-06, "epoch": 3.8810572687224667, "percentage": 77.6, "elapsed_time": "20:16:43", "remaining_time": "5:51:13"}
{"current_steps": 2430, "total_steps": 3125, "loss": 0.111, "lr": 5.7476998839763035e-06, "epoch": 3.8890668802563075, "percentage": 77.76, "elapsed_time": "20:21:35", "remaining_time": "5:49:23"}
{"current_steps": 2435, "total_steps": 3125, "loss": 0.1484, "lr": 5.669544240549698e-06, "epoch": 3.8970764917901484, "percentage": 77.92, "elapsed_time": "20:25:39", "remaining_time": "5:47:18"}
{"current_steps": 2440, "total_steps": 3125, "loss": 0.1001, "lr": 5.591835762319946e-06, "epoch": 3.9050861033239888, "percentage": 78.08, "elapsed_time": "20:29:40", "remaining_time": "5:45:12"}
{"current_steps": 2445, "total_steps": 3125, "loss": 0.0929, "lr": 5.514576874089683e-06, "epoch": 3.913095714857829, "percentage": 78.24, "elapsed_time": "20:33:12", "remaining_time": "5:42:58"}
{"current_steps": 2450, "total_steps": 3125, "loss": 0.0703, "lr": 5.437769986632622e-06, "epoch": 3.92110532639167, "percentage": 78.4, "elapsed_time": "20:36:56", "remaining_time": "5:40:47"}
{"current_steps": 2455, "total_steps": 3125, "loss": 0.1554, "lr": 5.361417496618315e-06, "epoch": 3.929114937925511, "percentage": 78.56, "elapsed_time": "20:41:21", "remaining_time": "5:38:46"}
{"current_steps": 2460, "total_steps": 3125, "loss": 0.1247, "lr": 5.285521786537368e-06, "epoch": 3.937124549459351, "percentage": 78.72, "elapsed_time": "20:45:15", "remaining_time": "5:36:37"}
{"current_steps": 2465, "total_steps": 3125, "loss": 0.1054, "lr": 5.2100852246270975e-06, "epoch": 3.9451341609931916, "percentage": 78.88, "elapsed_time": "20:49:53", "remaining_time": "5:34:39"}
{"current_steps": 2470, "total_steps": 3125, "loss": 0.0711, "lr": 5.135110164797637e-06, "epoch": 3.9531437725270324, "percentage": 79.04, "elapsed_time": "20:54:35", "remaining_time": "5:32:41"}
{"current_steps": 2475, "total_steps": 3125, "loss": 0.0578, "lr": 5.060598946558484e-06, "epoch": 3.9611533840608733, "percentage": 79.2, "elapsed_time": "20:58:08", "remaining_time": "5:30:25"}
{"current_steps": 2480, "total_steps": 3125, "loss": 0.0501, "lr": 4.986553894945512e-06, "epoch": 3.9691629955947136, "percentage": 79.36, "elapsed_time": "21:01:49", "remaining_time": "5:28:10"}
{"current_steps": 2485, "total_steps": 3125, "loss": 0.102, "lr": 4.912977320448391e-06, "epoch": 3.977172607128554, "percentage": 79.52, "elapsed_time": "21:06:33", "remaining_time": "5:26:11"}
{"current_steps": 2490, "total_steps": 3125, "loss": 0.0914, "lr": 4.839871518938513e-06, "epoch": 3.985182218662395, "percentage": 79.68, "elapsed_time": "21:11:01", "remaining_time": "5:24:08"}
{"current_steps": 2495, "total_steps": 3125, "loss": 0.07, "lr": 4.767238771597347e-06, "epoch": 3.9931918301962357, "percentage": 79.84, "elapsed_time": "21:14:59", "remaining_time": "5:21:56"}
{"current_steps": 2500, "total_steps": 3125, "loss": 0.0705, "lr": 4.695081344845254e-06, "epoch": 4.0, "percentage": 80.0, "elapsed_time": "21:18:33", "remaining_time": "5:19:38"}
{"current_steps": 2505, "total_steps": 3125, "loss": 0.088, "lr": 4.623401490270778e-06, "epoch": 4.008009611533841, "percentage": 80.16, "elapsed_time": "21:20:09", "remaining_time": "5:16:50"}
{"current_steps": 2510, "total_steps": 3125, "loss": 0.0807, "lr": 4.552201444560373e-06, "epoch": 4.016019223067681, "percentage": 80.32, "elapsed_time": "21:22:20", "remaining_time": "5:14:12"}
{"current_steps": 2515, "total_steps": 3125, "loss": 0.073, "lr": 4.481483429428615e-06, "epoch": 4.024028834601522, "percentage": 80.48, "elapsed_time": "21:24:23", "remaining_time": "5:11:31"}
{"current_steps": 2520, "total_steps": 3125, "loss": 0.0695, "lr": 4.4112496515488765e-06, "epoch": 4.032038446135362, "percentage": 80.64, "elapsed_time": "21:26:27", "remaining_time": "5:08:51"}
{"current_steps": 2525, "total_steps": 3125, "loss": 0.0652, "lr": 4.341502302484472e-06, "epoch": 4.040048057669203, "percentage": 80.8, "elapsed_time": "21:28:23", "remaining_time": "5:06:09"}
{"current_steps": 2530, "total_steps": 3125, "loss": 0.0606, "lr": 4.272243558620264e-06, "epoch": 4.048057669203043, "percentage": 80.96, "elapsed_time": "21:30:26", "remaining_time": "5:03:29"}
{"current_steps": 2535, "total_steps": 3125, "loss": 0.0548, "lr": 4.203475581094771e-06, "epoch": 4.056067280736884, "percentage": 81.12, "elapsed_time": "21:32:18", "remaining_time": "5:00:46"}
{"current_steps": 2540, "total_steps": 3125, "loss": 0.059, "lr": 4.135200515732716e-06, "epoch": 4.064076892270725, "percentage": 81.28, "elapsed_time": "21:33:49", "remaining_time": "4:57:59"}
{"current_steps": 2545, "total_steps": 3125, "loss": 0.0586, "lr": 4.067420492978065e-06, "epoch": 4.072086503804566, "percentage": 81.44, "elapsed_time": "21:35:44", "remaining_time": "4:55:17"}
{"current_steps": 2550, "total_steps": 3125, "loss": 0.0625, "lr": 4.000137627827554e-06, "epoch": 4.080096115338406, "percentage": 81.6, "elapsed_time": "21:38:12", "remaining_time": "4:52:44"}
{"current_steps": 2555, "total_steps": 3125, "loss": 0.1026, "lr": 3.9333540197647035e-06, "epoch": 4.0881057268722465, "percentage": 81.76, "elapsed_time": "21:40:30", "remaining_time": "4:50:07"}
{"current_steps": 2560, "total_steps": 3125, "loss": 0.0541, "lr": 3.867071752694282e-06, "epoch": 4.096115338406087, "percentage": 81.92, "elapsed_time": "21:42:16", "remaining_time": "4:47:24"}
{"current_steps": 2565, "total_steps": 3125, "loss": 0.049, "lr": 3.8012928948773243e-06, "epoch": 4.104124949939928, "percentage": 82.08, "elapsed_time": "21:44:22", "remaining_time": "4:44:46"}
{"current_steps": 2570, "total_steps": 3125, "loss": 0.0488, "lr": 3.7360194988665364e-06, "epoch": 4.112134561473768, "percentage": 82.24, "elapsed_time": "21:46:37", "remaining_time": "4:42:10"}
{"current_steps": 2575, "total_steps": 3125, "loss": 0.0494, "lr": 3.6712536014422885e-06, "epoch": 4.120144173007609, "percentage": 82.4, "elapsed_time": "21:48:55", "remaining_time": "4:39:34"}
{"current_steps": 2580, "total_steps": 3125, "loss": 0.0451, "lr": 3.606997223549049e-06, "epoch": 4.12815378454145, "percentage": 82.56, "elapsed_time": "21:51:29", "remaining_time": "4:37:02"}
{"current_steps": 2585, "total_steps": 3125, "loss": 0.0463, "lr": 3.543252370232313e-06, "epoch": 4.136163396075291, "percentage": 82.72, "elapsed_time": "21:53:25", "remaining_time": "4:34:22"}
{"current_steps": 2590, "total_steps": 3125, "loss": 0.0458, "lr": 3.4800210305760662e-06, "epoch": 4.1441730076091305, "percentage": 82.88, "elapsed_time": "21:55:51", "remaining_time": "4:31:48"}
{"current_steps": 2595, "total_steps": 3125, "loss": 0.0456, "lr": 3.4173051776406817e-06, "epoch": 4.152182619142971, "percentage": 83.04, "elapsed_time": "21:57:32", "remaining_time": "4:29:05"}
{"current_steps": 2600, "total_steps": 3125, "loss": 0.0493, "lr": 3.3551067684013706e-06, "epoch": 4.160192230676812, "percentage": 83.2, "elapsed_time": "21:59:21", "remaining_time": "4:26:24"}
{"current_steps": 2605, "total_steps": 3125, "loss": 0.0418, "lr": 3.2934277436871187e-06, "epoch": 4.168201842210653, "percentage": 83.36, "elapsed_time": "22:01:49", "remaining_time": "4:23:51"}
{"current_steps": 2610, "total_steps": 3125, "loss": 0.0454, "lr": 3.232270028120121e-06, "epoch": 4.176211453744493, "percentage": 83.52, "elapsed_time": "22:03:52", "remaining_time": "4:21:13"}
{"current_steps": 2615, "total_steps": 3125, "loss": 0.0413, "lr": 3.1716355300557256e-06, "epoch": 4.184221065278334, "percentage": 83.68, "elapsed_time": "22:05:46", "remaining_time": "4:18:33"}
{"current_steps": 2620, "total_steps": 3125, "loss": 0.0379, "lr": 3.111526141522896e-06, "epoch": 4.192230676812175, "percentage": 83.84, "elapsed_time": "22:07:39", "remaining_time": "4:15:54"}
{"current_steps": 2625, "total_steps": 3125, "loss": 0.0582, "lr": 3.0519437381651507e-06, "epoch": 4.2002402883460155, "percentage": 84.0, "elapsed_time": "22:10:15", "remaining_time": "4:13:22"}
{"current_steps": 2630, "total_steps": 3125, "loss": 0.0575, "lr": 2.992890179182062e-06, "epoch": 4.208249899879855, "percentage": 84.16, "elapsed_time": "22:13:09", "remaining_time": "4:10:55"}
{"current_steps": 2635, "total_steps": 3125, "loss": 0.0507, "lr": 2.93436730727122e-06, "epoch": 4.216259511413696, "percentage": 84.32, "elapsed_time": "22:16:16", "remaining_time": "4:08:29"}
{"current_steps": 2640, "total_steps": 3125, "loss": 0.0514, "lr": 2.8763769485707447e-06, "epoch": 4.224269122947537, "percentage": 84.48, "elapsed_time": "22:19:21", "remaining_time": "4:06:03"}
{"current_steps": 2645, "total_steps": 3125, "loss": 0.0413, "lr": 2.818920912602294e-06, "epoch": 4.232278734481378, "percentage": 84.64, "elapsed_time": "22:21:28", "remaining_time": "4:03:26"}
{"current_steps": 2650, "total_steps": 3125, "loss": 0.0421, "lr": 2.762000992214626e-06, "epoch": 4.240288346015218, "percentage": 84.8, "elapsed_time": "22:24:35", "remaining_time": "4:01:00"}
{"current_steps": 2655, "total_steps": 3125, "loss": 0.0365, "lr": 2.7056189635276162e-06, "epoch": 4.248297957549059, "percentage": 84.96, "elapsed_time": "22:27:34", "remaining_time": "3:58:33"}
{"current_steps": 2660, "total_steps": 3125, "loss": 0.0365, "lr": 2.6497765858768643e-06, "epoch": 4.2563075690828995, "percentage": 85.12, "elapsed_time": "22:30:17", "remaining_time": "3:56:02"}
{"current_steps": 2665, "total_steps": 3125, "loss": 0.0369, "lr": 2.594475601758786e-06, "epoch": 4.26431718061674, "percentage": 85.28, "elapsed_time": "22:33:24", "remaining_time": "3:53:36"}
{"current_steps": 2670, "total_steps": 3125, "loss": 0.0322, "lr": 2.539717736776237e-06, "epoch": 4.27232679215058, "percentage": 85.44, "elapsed_time": "22:36:38", "remaining_time": "3:51:11"}
{"current_steps": 2675, "total_steps": 3125, "loss": 0.029, "lr": 2.4855046995846844e-06, "epoch": 4.280336403684421, "percentage": 85.6, "elapsed_time": "22:39:47", "remaining_time": "3:48:45"}
{"current_steps": 2680, "total_steps": 3125, "loss": 0.0287, "lr": 2.431838181838868e-06, "epoch": 4.288346015218262, "percentage": 85.76, "elapsed_time": "22:42:46", "remaining_time": "3:46:16"}
{"current_steps": 2685, "total_steps": 3125, "loss": 0.0303, "lr": 2.3787198581400285e-06, "epoch": 4.296355626752103, "percentage": 85.92, "elapsed_time": "22:46:09", "remaining_time": "3:43:52"}
{"current_steps": 2690, "total_steps": 3125, "loss": 0.0302, "lr": 2.3261513859836437e-06, "epoch": 4.304365238285943, "percentage": 86.08, "elapsed_time": "22:49:24", "remaining_time": "3:41:26"}
{"current_steps": 2695, "total_steps": 3125, "loss": 0.0246, "lr": 2.27413440570772e-06, "epoch": 4.312374849819784, "percentage": 86.24, "elapsed_time": "22:52:30", "remaining_time": "3:38:59"}
{"current_steps": 2700, "total_steps": 3125, "loss": 0.0234, "lr": 2.222670540441596e-06, "epoch": 4.320384461353624, "percentage": 86.4, "elapsed_time": "22:55:37", "remaining_time": "3:36:32"}
{"current_steps": 2705, "total_steps": 3125, "loss": 0.0276, "lr": 2.17176139605531e-06, "epoch": 4.328394072887465, "percentage": 86.56, "elapsed_time": "22:58:24", "remaining_time": "3:34:01"}
{"current_steps": 2710, "total_steps": 3125, "loss": 0.0243, "lr": 2.121408561109466e-06, "epoch": 4.336403684421305, "percentage": 86.72, "elapsed_time": "23:01:23", "remaining_time": "3:31:32"}
{"current_steps": 2715, "total_steps": 3125, "loss": 0.0233, "lr": 2.071613606805696e-06, "epoch": 4.344413295955146, "percentage": 86.88, "elapsed_time": "23:04:35", "remaining_time": "3:29:05"}
{"current_steps": 2720, "total_steps": 3125, "loss": 0.0242, "lr": 2.0223780869376018e-06, "epoch": 4.352422907488987, "percentage": 87.04, "elapsed_time": "23:07:23", "remaining_time": "3:26:34"}
{"current_steps": 2725, "total_steps": 3125, "loss": 0.0217, "lr": 1.9737035378422907e-06, "epoch": 4.360432519022828, "percentage": 87.2, "elapsed_time": "23:10:19", "remaining_time": "3:24:05"}
{"current_steps": 2730, "total_steps": 3125, "loss": 0.0242, "lr": 1.925591478352424e-06, "epoch": 4.368442130556668, "percentage": 87.36, "elapsed_time": "23:13:20", "remaining_time": "3:21:36"}
{"current_steps": 2735, "total_steps": 3125, "loss": 0.0285, "lr": 1.8780434097488443e-06, "epoch": 4.3764517420905085, "percentage": 87.52, "elapsed_time": "23:16:13", "remaining_time": "3:19:05"}
{"current_steps": 2740, "total_steps": 3125, "loss": 0.019, "lr": 1.831060815713699e-06, "epoch": 4.384461353624349, "percentage": 87.68, "elapsed_time": "23:19:10", "remaining_time": "3:16:35"}
{"current_steps": 2745, "total_steps": 3125, "loss": 0.0178, "lr": 1.7846451622841643e-06, "epoch": 4.39247096515819, "percentage": 87.84, "elapsed_time": "23:22:14", "remaining_time": "3:14:07"}
{"current_steps": 2750, "total_steps": 3125, "loss": 0.0194, "lr": 1.7387978978066988e-06, "epoch": 4.40048057669203, "percentage": 88.0, "elapsed_time": "23:25:21", "remaining_time": "3:11:38"}
{"current_steps": 2755, "total_steps": 3125, "loss": 0.0173, "lr": 1.6935204528918347e-06, "epoch": 4.408490188225871, "percentage": 88.16, "elapsed_time": "23:28:23", "remaining_time": "3:09:08"}
{"current_steps": 2760, "total_steps": 3125, "loss": 0.0213, "lr": 1.6488142403695651e-06, "epoch": 4.416499799759712, "percentage": 88.32, "elapsed_time": "23:31:16", "remaining_time": "3:06:38"}
{"current_steps": 2765, "total_steps": 3125, "loss": 0.0218, "lr": 1.6046806552452254e-06, "epoch": 4.424509411293553, "percentage": 88.48, "elapsed_time": "23:34:14", "remaining_time": "3:04:08"}
{"current_steps": 2770, "total_steps": 3125, "loss": 0.0171, "lr": 1.5611210746559868e-06, "epoch": 4.4325190228273925, "percentage": 88.64, "elapsed_time": "23:37:12", "remaining_time": "3:01:37"}
{"current_steps": 2775, "total_steps": 3125, "loss": 0.0174, "lr": 1.5181368578278744e-06, "epoch": 4.440528634361233, "percentage": 88.8, "elapsed_time": "23:40:15", "remaining_time": "2:59:07"}
{"current_steps": 2780, "total_steps": 3125, "loss": 0.0175, "lr": 1.4757293460333566e-06, "epoch": 4.448538245895074, "percentage": 88.96, "elapsed_time": "23:43:00", "remaining_time": "2:56:35"}
{"current_steps": 2785, "total_steps": 3125, "loss": 0.0147, "lr": 1.4338998625494905e-06, "epoch": 4.456547857428915, "percentage": 89.12, "elapsed_time": "23:46:09", "remaining_time": "2:54:06"}
{"current_steps": 2790, "total_steps": 3125, "loss": 0.0152, "lr": 1.3926497126166405e-06, "epoch": 4.464557468962755, "percentage": 89.28, "elapsed_time": "23:49:16", "remaining_time": "2:51:36"}
{"current_steps": 2795, "total_steps": 3125, "loss": 0.0148, "lr": 1.3519801833977298e-06, "epoch": 4.472567080496596, "percentage": 89.44, "elapsed_time": "23:52:08", "remaining_time": "2:49:05"}
{"current_steps": 2800, "total_steps": 3125, "loss": 0.0167, "lr": 1.3118925439381003e-06, "epoch": 4.480576692030437, "percentage": 89.6, "elapsed_time": "23:55:16", "remaining_time": "2:46:35"}
{"current_steps": 2805, "total_steps": 3125, "loss": 0.0124, "lr": 1.2723880451258918e-06, "epoch": 4.4885863035642775, "percentage": 89.76, "elapsed_time": "23:58:28", "remaining_time": "2:44:06"}
{"current_steps": 2810, "total_steps": 3125, "loss": 0.0131, "lr": 1.2334679196530219e-06, "epoch": 4.496595915098117, "percentage": 89.92, "elapsed_time": "1 day, 0:00:55", "remaining_time": "2:41:31"}
{"current_steps": 2815, "total_steps": 3125, "loss": 0.022, "lr": 1.1951333819767163e-06, "epoch": 4.504605526631958, "percentage": 90.08, "elapsed_time": "1 day, 0:02:23", "remaining_time": "2:38:50"}
{"current_steps": 2820, "total_steps": 3125, "loss": 0.0231, "lr": 1.157385628281622e-06, "epoch": 4.512615138165799, "percentage": 90.24, "elapsed_time": "1 day, 0:03:29", "remaining_time": "2:36:07"}
{"current_steps": 2825, "total_steps": 3125, "loss": 0.0271, "lr": 1.1202258364424633e-06, "epoch": 4.52062474969964, "percentage": 90.4, "elapsed_time": "1 day, 0:04:39", "remaining_time": "2:33:24"}
{"current_steps": 2830, "total_steps": 3125, "loss": 0.023, "lr": 1.0836551659873073e-06, "epoch": 4.52863436123348, "percentage": 90.56, "elapsed_time": "1 day, 0:05:42", "remaining_time": "2:30:42"}
{"current_steps": 2835, "total_steps": 3125, "loss": 0.0277, "lr": 1.0476747580613723e-06, "epoch": 4.536643972767321, "percentage": 90.72, "elapsed_time": "1 day, 0:07:03", "remaining_time": "2:28:01"}
{"current_steps": 2840, "total_steps": 3125, "loss": 0.0203, "lr": 1.012285735391416e-06, "epoch": 4.5446535843011615, "percentage": 90.88, "elapsed_time": "1 day, 0:08:17", "remaining_time": "2:25:20"}
{"current_steps": 2845, "total_steps": 3125, "loss": 0.0204, "lr": 9.774892022507166e-07, "epoch": 4.552663195835002, "percentage": 91.04, "elapsed_time": "1 day, 0:09:22", "remaining_time": "2:22:38"}
{"current_steps": 2850, "total_steps": 3125, "loss": 0.0258, "lr": 9.432862444245994e-07, "epoch": 4.560672807368842, "percentage": 91.2, "elapsed_time": "1 day, 0:10:26", "remaining_time": "2:19:57"}
{"current_steps": 2855, "total_steps": 3125, "loss": 0.0376, "lr": 9.096779291765667e-07, "epoch": 4.568682418902683, "percentage": 91.36, "elapsed_time": "1 day, 0:11:38", "remaining_time": "2:17:16"}
{"current_steps": 2860, "total_steps": 3125, "loss": 0.0193, "lr": 8.766653052149831e-07, "epoch": 4.576692030436524, "percentage": 91.52, "elapsed_time": "1 day, 0:12:48", "remaining_time": "2:14:36"}
{"current_steps": 2865, "total_steps": 3125, "loss": 0.0174, "lr": 8.442494026603709e-07, "epoch": 4.584701641970365, "percentage": 91.68, "elapsed_time": "1 day, 0:14:08", "remaining_time": "2:11:57"}
{"current_steps": 2870, "total_steps": 3125, "loss": 0.023, "lr": 8.124312330132423e-07, "epoch": 4.592711253504205, "percentage": 91.84, "elapsed_time": "1 day, 0:15:24", "remaining_time": "2:09:18"}
{"current_steps": 2875, "total_steps": 3125, "loss": 0.0178, "lr": 7.812117891225667e-07, "epoch": 4.600720865038046, "percentage": 92.0, "elapsed_time": "1 day, 0:16:32", "remaining_time": "2:06:39"}
{"current_steps": 2880, "total_steps": 3125, "loss": 0.018, "lr": 7.505920451547544e-07, "epoch": 4.608730476571886, "percentage": 92.16, "elapsed_time": "1 day, 0:17:29", "remaining_time": "2:03:59"}
{"current_steps": 2885, "total_steps": 3125, "loss": 0.0205, "lr": 7.205729565632947e-07, "epoch": 4.616740088105727, "percentage": 92.32, "elapsed_time": "1 day, 0:18:23", "remaining_time": "2:01:19"}
{"current_steps": 2890, "total_steps": 3125, "loss": 0.0179, "lr": 6.911554600589121e-07, "epoch": 4.624749699639567, "percentage": 92.48, "elapsed_time": "1 day, 0:19:06", "remaining_time": "1:58:38"}
{"current_steps": 2895, "total_steps": 3125, "loss": 0.0222, "lr": 6.62340473580354e-07, "epoch": 4.632759311173408, "percentage": 92.64, "elapsed_time": "1 day, 0:19:55", "remaining_time": "1:55:59"}
{"current_steps": 2900, "total_steps": 3125, "loss": 0.0137, "lr": 6.341288962657422e-07, "epoch": 4.640768922707249, "percentage": 92.8, "elapsed_time": "1 day, 0:20:39", "remaining_time": "1:53:19"}
{"current_steps": 2905, "total_steps": 3125, "loss": 0.0153, "lr": 6.06521608424524e-07, "epoch": 4.64877853424109, "percentage": 92.96, "elapsed_time": "1 day, 0:21:26", "remaining_time": "1:50:40"}
{"current_steps": 2910, "total_steps": 3125, "loss": 0.016, "lr": 5.795194715099905e-07, "epoch": 4.65678814577493, "percentage": 93.12, "elapsed_time": "1 day, 0:22:23", "remaining_time": "1:48:02"}
{"current_steps": 2915, "total_steps": 3125, "loss": 0.0156, "lr": 5.531233280924042e-07, "epoch": 4.66479775730877, "percentage": 93.28, "elapsed_time": "1 day, 0:23:12", "remaining_time": "1:45:24"}
{"current_steps": 2920, "total_steps": 3125, "loss": 0.0139, "lr": 5.273340018327044e-07, "epoch": 4.672807368842611, "percentage": 93.44, "elapsed_time": "1 day, 0:24:05", "remaining_time": "1:42:47"}
{"current_steps": 2925, "total_steps": 3125, "loss": 0.0139, "lr": 5.02152297456806e-07, "epoch": 4.680816980376452, "percentage": 93.6, "elapsed_time": "1 day, 0:24:50", "remaining_time": "1:40:09"}
{"current_steps": 2930, "total_steps": 3125, "loss": 0.0122, "lr": 4.775790007304993e-07, "epoch": 4.688826591910292, "percentage": 93.76, "elapsed_time": "1 day, 0:25:40", "remaining_time": "1:37:32"}
{"current_steps": 2935, "total_steps": 3125, "loss": 0.0124, "lr": 4.5361487843490924e-07, "epoch": 4.696836203444133, "percentage": 93.92, "elapsed_time": "1 day, 0:26:30", "remaining_time": "1:34:56"}
{"current_steps": 2940, "total_steps": 3125, "loss": 0.0135, "lr": 4.3026067834258667e-07, "epoch": 4.704845814977974, "percentage": 94.08, "elapsed_time": "1 day, 0:27:12", "remaining_time": "1:32:19"}
{"current_steps": 2945, "total_steps": 3125, "loss": 0.0123, "lr": 4.0751712919417484e-07, "epoch": 4.7128554265118145, "percentage": 94.24, "elapsed_time": "1 day, 0:27:51", "remaining_time": "1:29:42"}
{"current_steps": 2950, "total_steps": 3125, "loss": 0.0129, "lr": 3.853849406756549e-07, "epoch": 4.7208650380456545, "percentage": 94.4, "elapsed_time": "1 day, 0:28:39", "remaining_time": "1:27:07"}
{"current_steps": 2955, "total_steps": 3125, "loss": 0.0137, "lr": 3.6386480339621886e-07, "epoch": 4.728874649579495, "percentage": 94.56, "elapsed_time": "1 day, 0:29:30", "remaining_time": "1:24:32"}
{"current_steps": 2960, "total_steps": 3125, "loss": 0.012, "lr": 3.4295738886670925e-07, "epoch": 4.736884261113336, "percentage": 94.72, "elapsed_time": "1 day, 0:30:21", "remaining_time": "1:21:57"}
{"current_steps": 2965, "total_steps": 3125, "loss": 0.0101, "lr": 3.226633494786668e-07, "epoch": 4.744893872647177, "percentage": 94.88, "elapsed_time": "1 day, 0:31:07", "remaining_time": "1:19:23"}
{"current_steps": 2970, "total_steps": 3125, "loss": 0.0282, "lr": 3.0298331848398033e-07, "epoch": 4.752903484181017, "percentage": 95.04, "elapsed_time": "1 day, 0:32:52", "remaining_time": "1:16:52"}
{"current_steps": 2975, "total_steps": 3125, "loss": 0.0512, "lr": 2.839179099751133e-07, "epoch": 4.760913095714858, "percentage": 95.2, "elapsed_time": "1 day, 0:36:22", "remaining_time": "1:14:26"}
{"current_steps": 2980, "total_steps": 3125, "loss": 0.0765, "lr": 2.654677188659549e-07, "epoch": 4.768922707248699, "percentage": 95.36, "elapsed_time": "1 day, 0:40:28", "remaining_time": "1:12:02"}
{"current_steps": 2985, "total_steps": 3125, "loss": 0.0744, "lr": 2.476333208732462e-07, "epoch": 4.776932318782539, "percentage": 95.52, "elapsed_time": "1 day, 0:44:12", "remaining_time": "1:09:36"}
{"current_steps": 2990, "total_steps": 3125, "loss": 0.0815, "lr": 2.3041527249863193e-07, "epoch": 4.784941930316379, "percentage": 95.68, "elapsed_time": "1 day, 0:48:27", "remaining_time": "1:07:12"}
{"current_steps": 2995, "total_steps": 3125, "loss": 0.0592, "lr": 2.1381411101127013e-07, "epoch": 4.79295154185022, "percentage": 95.84, "elapsed_time": "1 day, 0:52:01", "remaining_time": "1:04:45"}
{"current_steps": 3000, "total_steps": 3125, "loss": 0.0818, "lr": 1.9783035443108999e-07, "epoch": 4.800961153384061, "percentage": 96.0, "elapsed_time": "1 day, 0:56:07", "remaining_time": "1:02:20"}
{"current_steps": 3005, "total_steps": 3125, "loss": 0.0552, "lr": 1.8246450151261362e-07, "epoch": 4.808970764917902, "percentage": 96.16, "elapsed_time": "1 day, 1:00:33", "remaining_time": "0:59:55"}
{"current_steps": 3010, "total_steps": 3125, "loss": 0.1021, "lr": 1.6771703172940635e-07, "epoch": 4.816980376451742, "percentage": 96.32, "elapsed_time": "1 day, 1:04:29", "remaining_time": "0:57:28"}
{"current_steps": 3015, "total_steps": 3125, "loss": 0.0764, "lr": 1.5358840525909967e-07, "epoch": 4.824989987985583, "percentage": 96.48, "elapsed_time": "1 day, 1:08:45", "remaining_time": "0:55:02"}
{"current_steps": 3020, "total_steps": 3125, "loss": 0.0769, "lr": 1.4007906296904072e-07, "epoch": 4.8329995995194235, "percentage": 96.64, "elapsed_time": "1 day, 1:12:51", "remaining_time": "0:52:35"}
{"current_steps": 3025, "total_steps": 3125, "loss": 0.0618, "lr": 1.2718942640254084e-07, "epoch": 4.841009211053263, "percentage": 96.8, "elapsed_time": "1 day, 1:16:13", "remaining_time": "0:50:07"}
{"current_steps": 3030, "total_steps": 3125, "loss": 0.0828, "lr": 1.1491989776570623e-07, "epoch": 4.849018822587104, "percentage": 96.96, "elapsed_time": "1 day, 1:20:17", "remaining_time": "0:47:39"}
{"current_steps": 3035, "total_steps": 3125, "loss": 0.0593, "lr": 1.0327085991490127e-07, "epoch": 4.857028434120945, "percentage": 97.12, "elapsed_time": "1 day, 1:23:58", "remaining_time": "0:45:11"}
{"current_steps": 3040, "total_steps": 3125, "loss": 0.0623, "lr": 9.22426763447981e-08, "epoch": 4.865038045654786, "percentage": 97.28, "elapsed_time": "1 day, 1:28:21", "remaining_time": "0:42:44"}
{"current_steps": 3045, "total_steps": 3125, "loss": 0.048, "lr": 8.183569117703461e-08, "epoch": 4.873047657188627, "percentage": 97.44, "elapsed_time": "1 day, 1:32:15", "remaining_time": "0:40:15"}
{"current_steps": 3050, "total_steps": 3125, "loss": 0.0599, "lr": 7.205022914946957e-08, "epoch": 4.881057268722467, "percentage": 97.6, "elapsed_time": "1 day, 1:36:19", "remaining_time": "0:37:46"}
{"current_steps": 3055, "total_steps": 3125, "loss": 0.0803, "lr": 6.288659560606203e-08, "epoch": 4.8890668802563075, "percentage": 97.76, "elapsed_time": "1 day, 1:41:10", "remaining_time": "0:35:18"}
{"current_steps": 3060, "total_steps": 3125, "loss": 0.104, "lr": 5.4345076487332114e-08, "epoch": 4.897076491790148, "percentage": 97.92, "elapsed_time": "1 day, 1:45:14", "remaining_time": "0:32:49"}
{"current_steps": 3065, "total_steps": 3125, "loss": 0.069, "lr": 4.642593832144382e-08, "epoch": 4.905086103323988, "percentage": 98.08, "elapsed_time": "1 day, 1:49:15", "remaining_time": "0:30:19"}
{"current_steps": 3070, "total_steps": 3125, "loss": 0.0634, "lr": 3.912942821589161e-08, "epoch": 4.913095714857829, "percentage": 98.24, "elapsed_time": "1 day, 1:52:48", "remaining_time": "0:27:49"}
{"current_steps": 3075, "total_steps": 3125, "loss": 0.0483, "lr": 3.2455773849779935e-08, "epoch": 4.92110532639167, "percentage": 98.4, "elapsed_time": "1 day, 1:56:31", "remaining_time": "0:25:18"}
{"current_steps": 3080, "total_steps": 3125, "loss": 0.1183, "lr": 2.6405183466731154e-08, "epoch": 4.929114937925511, "percentage": 98.56, "elapsed_time": "1 day, 2:00:56", "remaining_time": "0:22:48"}
{"current_steps": 3085, "total_steps": 3125, "loss": 0.0908, "lr": 2.0977845868375145e-08, "epoch": 4.937124549459352, "percentage": 98.72, "elapsed_time": "1 day, 2:04:50", "remaining_time": "0:20:17"}
{"current_steps": 3090, "total_steps": 3125, "loss": 0.0751, "lr": 1.6173930408467376e-08, "epoch": 4.945134160993192, "percentage": 98.88, "elapsed_time": "1 day, 2:09:28", "remaining_time": "0:17:46"}
{"current_steps": 3095, "total_steps": 3125, "loss": 0.0491, "lr": 1.199358698759978e-08, "epoch": 4.953143772527032, "percentage": 99.04, "elapsed_time": "1 day, 2:14:10", "remaining_time": "0:15:15"}
{"current_steps": 3100, "total_steps": 3125, "loss": 0.0386, "lr": 8.436946048522298e-09, "epoch": 4.961153384060873, "percentage": 99.2, "elapsed_time": "1 day, 2:17:42", "remaining_time": "0:12:43"}
{"current_steps": 3105, "total_steps": 3125, "loss": 0.0327, "lr": 5.504118572081662e-09, "epoch": 4.969162995594713, "percentage": 99.36, "elapsed_time": "1 day, 2:21:23", "remaining_time": "0:10:11"}
{"current_steps": 3110, "total_steps": 3125, "loss": 0.0701, "lr": 3.1951960737419686e-09, "epoch": 4.977172607128554, "percentage": 99.52, "elapsed_time": "1 day, 2:26:07", "remaining_time": "0:07:39"}
{"current_steps": 3115, "total_steps": 3125, "loss": 0.0609, "lr": 1.5102506007447227e-09, "epoch": 4.985182218662395, "percentage": 99.68, "elapsed_time": "1 day, 2:30:34", "remaining_time": "0:05:06"}
{"current_steps": 3120, "total_steps": 3125, "loss": 0.0437, "lr": 4.493347298528683e-10, "epoch": 4.993191830196236, "percentage": 99.84, "elapsed_time": "1 day, 2:34:33", "remaining_time": "0:02:33"}
{"current_steps": 3125, "total_steps": 3125, "loss": 0.0448, "lr": 1.248156571209691e-11, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "1 day, 2:38:06", "remaining_time": "0:00:00"}
{"current_steps": 3125, "total_steps": 3125, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "1 day, 2:39:41", "remaining_time": "0:00:00"}

6922
trainer_state.json Normal file

File diff suppressed because it is too large

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aff1b17707cdb4951e97cf54968be6e28c6a74467692566418974a1b10c5cf65
size 8529

BIN
training_loss.png Normal file

Binary file not shown.

Size: 47 KiB

1
vocab.json Normal file

File diff suppressed because one or more lines are too long