Initialize project; model provided by the ModelHub XC community

Model: laion/alfworld-swesmith-r2egym-swegym-131k-32B-lc
Source: Original Platform
ModelHub XC
2026-05-04 07:27:43 +08:00
commit aa88f5e3dc
33 changed files with 155891 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
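The attributes above follow the standard Git LFS `.gitattributes` format: each line pairs a path pattern with `filter=lfs diff=lfs merge=lfs -text`. A minimal stdlib-only sketch (hypothetical helper, not part of this repo) of extracting the LFS-tracked patterns from such lines:

```python
# Collect the path patterns that are routed through Git LFS.
def lfs_patterns(lines):
    patterns = []
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        pattern, attrs = parts[0], set(parts[1:])
        # LFS-tracked entries carry the filter=lfs attribute.
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns

example = [
    "*.safetensors filter=lfs diff=lfs merge=lfs -text",
    "tokenizer.json filter=lfs diff=lfs merge=lfs -text",
    "*.txt -text",
]
print(lfs_patterns(example))  # ['*.safetensors', 'tokenizer.json']
```

In practice these lines are generated by `git lfs track "<pattern>"` rather than written by hand.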

README.md Normal file

@@ -0,0 +1,61 @@
---
library_name: transformers
license: other
base_model: Qwen/Qwen3-32B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: sft__glm46-neulab-agenttuning-alfworld-sandboxes-maxeps-131k-glm46-swesmith-maxeps-131k-GLM-4-7
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# sft__glm46-neulab-agenttuning-alfworld-sandboxes-maxeps-131k-glm46-swesmith-maxeps-131k-GLM-4-7
This model is a fine-tuned version of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) on the following datasets:
- /e/data1/datasets/playground/ot/hf_hub/datasets--penfever--glm46-neulab-agenttuning-alfworld-sandboxes-maxeps-131k/snapshots/fdb0d0afe08aa3c31c7605b40c18d5e48fdc206c_thinking_preprocessed
- /e/data1/datasets/playground/ot/hf_hub/datasets--penfever--glm46-swesmith-maxeps-131k/snapshots/4d4c2d4a9d21f73870ed31c7bc6028035b3b6ca7_thinking_preprocessed
- /e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent2--GLM-4.7-r2egym_sandboxes-maxeps-131k/snapshots/167ff86e8203fa2412574480bf52623cb62320e8_thinking_preprocessed
- /e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent2--glm46-swegym-tasks-maxeps-131k/snapshots/bc7a253d567261d84db295a138b8af86eac6ae4c_thinking_preprocessed
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 64
- gradient_accumulation_steps: 6
- total_train_batch_size: 384
- total_eval_batch_size: 512
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 7.0
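The listed `total_train_batch_size` follows directly from the other settings (per-device batch size × gradient accumulation steps × number of devices); a quick arithmetic check:

```python
# Effective batch size implied by the hyperparameters above.
train_batch_size = 1             # per-device micro-batch
gradient_accumulation_steps = 6  # optimizer step every 6 micro-batches
num_devices = 64                 # data-parallel GPUs

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 384, matching total_train_batch_size above
```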
### Training results
### Framework versions
- Transformers 4.57.6
- Pytorch 2.9.1+cu130
- Datasets 4.7.0
- Tokenizers 0.22.2

added_tokens.json Normal file

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}

all_results.json Normal file

@@ -0,0 +1,14 @@
{
"achieved_tflops_per_gpu": 1063.315854388195,
"achieved_tflops_per_gpu_theoretical": 121285192.64972909,
"epoch": 6.597039473684211,
"loss_nan_ranks": 0,
"loss_rank_avg": 0.18427793681621552,
"mfu_percent": 75.14599677655089,
"mfu_percent_theoretical": 8571391.706694635,
"total_flos": 7.249602429950362e+16,
"train_loss": 0.0,
"train_runtime": 1.0653,
"train_samples_per_second": 7990.221,
"train_steps_per_second": 1333.894
}
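MFU (model FLOPs utilization) is achieved TFLOP/s divided by the hardware's peak TFLOP/s, times 100. A sketch recovering the per-GPU peak implied by the reported numbers (the `*_theoretical` fields above appear anomalous and are left aside):

```python
# Invert mfu_percent = achieved / peak * 100 to recover the implied
# per-GPU peak throughput assumed by the training logger.
achieved_tflops_per_gpu = 1063.315854388195
mfu_percent = 75.14599677655089

implied_peak_tflops = achieved_tflops_per_gpu / (mfu_percent / 100.0)
print(round(implied_peak_tflops))  # 1415 TFLOP/s per GPU
```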

chat_template.jinja Normal file

@@ -0,0 +1,89 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
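For a plain system + user exchange (no tools, no prior assistant turns), the template above reduces to the usual ChatML layout, ending in the generation prompt. A minimal sketch of that path (assuming `enable_thinking` is left at its default, so no empty `<think>` block is emitted):

```python
# Mirror the template's non-tool path: each message becomes a ChatML
# block, then the generation prompt opens the assistant turn.
def render(messages, add_generation_prompt=True):
    out = ""
    for m in messages:
        out += "<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

prompt = render([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
])
print(prompt)
```

In practice you would not re-implement this: `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` in transformers renders this Jinja template directly, including the tool-call and thinking branches.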

config.json Normal file

@@ -0,0 +1,100 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"dtype": "bfloat16",
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 25600,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 163840,
"max_window_layers": 64,
"model_type": "qwen3",
"num_attention_heads": 64,
"num_hidden_layers": 64,
"num_key_value_heads": 8,
"pad_token_id": 151643,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"factor": 4.0,
"original_max_position_embeddings": 32768,
"rope_type": "yarn"
},
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": false,
"use_sliding_window": false,
"vocab_size": 151936
}
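A quick sketch of the attention shapes these settings imply: Qwen3 uses grouped-query attention with an explicit `head_dim`, so `q_proj` projects to `num_attention_heads * head_dim` (which here exceeds `hidden_size`), while `k_proj`/`v_proj` project to the much smaller `num_key_value_heads * head_dim`:

```python
# Attention projection shapes implied by the config above.
hidden_size = 5120
head_dim = 128
num_attention_heads = 64
num_key_value_heads = 8

q_out = num_attention_heads * head_dim    # q_proj: 5120 -> 8192
kv_out = num_key_value_heads * head_dim   # k_proj / v_proj: 5120 -> 1024
group_size = num_attention_heads // num_key_value_heads  # query heads per KV head
print(q_out, kv_out, group_size)  # 8192 1024 8
```

Each group of 8 query heads shares one KV head, which shrinks the KV cache by 8× relative to full multi-head attention.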

generation_config.json Normal file

@@ -0,0 +1,12 @@
{
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.6,
"top_k": 20,
"top_p": 0.95,
"transformers_version": "4.57.6"
}
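The sampling settings above mean: divide logits by `temperature` (0.6), keep only the `top_k` (20) most likely tokens, then keep the smallest prefix of those whose cumulative probability reaches `top_p` (0.95), and sample from what remains. A toy sketch of the filtering step over made-up probabilities (not transformers' exact logits-processor implementation):

```python
# Toy nucleus + top-k filter: rank tokens by probability, truncate to
# top_k, then cut off once cumulative probability reaches top_p.
def allowed_tokens(probs, top_k=20, top_p=0.95):
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept, cum = [], 0.0
    for i in ranked:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

# Dyadic probabilities keep the arithmetic exact.
probs = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]
print(allowed_tokens(probs))  # [0, 1, 2, 3, 4] -- 0.95 reached at the fifth token
```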

merges.txt Normal file

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:621b7f3078463e6c0d1e671dffdd13262701cc14dfd6c6f968e21c0d199f59f1
size 4932307584


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2a5f6d1b1c0f75d2688c38b4e32d81f8c440eecbfb22fc27d8ab125acc558a8e
size 4875989696


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e69c555d3ffacfeceed4439369b739bea00ad9782105b2bb8609fb2806fbbfe4
size 4875989720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9be4f4ca72949045bd104bbde7c65736f3a9a8385c4a9eb8ca51e118b97e537f
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a225ceb76bda7cda34397f67bec66e51ef8d0ce103b5389743c4acea64e3c927
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9102c0bb2188ea361073620fafa5580c56ba2f6f250650da8d89f24d9c1bb80
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5aacc24d0efa0c67625359167c293653db63bf71e072775f3cd42e078bf5cdf
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:92127c84fb4c69d6dc7ea217ff9f92cb83767a2869aaa6e7fdc80b9058c425f5
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6bab2e4f671531fd1b791171fe28811276535b91f699d79801c8864d6f836634
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:996834d91b2d78522fd81b6f20cef6bc09e0df3ca019fa2ac3400e8d04a1363a
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a37b6bb0f36ccee80bbd370536644a467195c1c40b1f2784eac0b250f70bb616
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bcc1054c37f7ed0fc145633e9dcb851e1be5606f27bd2fa5532f37c7f2994a13
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c471ef57beadc93ad53c616ba16a626cfc4b43a4cc7f4b6432f6508d774987f8
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f45e20ce67629739defcf9f480266dceb97851a4cc9bd695cb46675cdd5838d2
size 2080144040


@@ -0,0 +1,715 @@
{
"metadata": {
"total_parameters": 676864,
"total_size": 65524246528
},
"weight_map": {
"lm_head.weight": "model-00014-of-00014.safetensors",
"model.embed_tokens.weight": "model-00001-of-00014.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.13.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.18.input_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.19.input_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.23.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.24.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.28.input_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.29.input_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.30.input_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.31.input_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.32.input_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.33.input_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.34.input_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.35.input_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.36.input_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.36.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.36.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.37.input_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.37.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.37.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.38.input_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.38.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.38.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.39.input_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.39.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.39.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.40.input_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.40.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.40.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.41.input_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.41.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.41.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.42.input_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.42.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.42.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.42.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.42.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.42.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.42.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.42.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.42.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.42.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.42.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.43.input_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.43.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.43.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.43.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.43.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.43.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.43.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.43.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.43.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.43.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.43.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.44.input_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.44.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.44.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.44.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.44.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.44.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.44.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.44.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.44.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.44.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.44.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.45.input_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.45.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.45.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.45.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.45.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.45.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.45.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.45.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.45.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.45.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.45.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.46.input_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.46.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.46.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.46.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.46.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.46.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.46.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.46.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.46.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.46.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.46.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.47.input_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.47.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.47.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.47.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.47.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.47.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.47.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.47.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.47.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.47.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.47.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.48.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.48.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.48.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.48.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.48.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.48.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.48.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.48.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.48.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.48.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.48.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.49.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.49.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.49.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.49.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.49.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.49.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.49.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.49.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.49.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.49.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.49.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.50.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.50.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.50.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.50.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.50.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.50.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.50.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.50.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.50.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.50.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.50.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.51.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.51.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.51.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.51.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.51.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.51.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.51.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.51.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.51.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.51.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.51.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.52.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.52.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.52.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.52.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.52.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.52.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.52.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.52.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.52.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.52.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.52.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.53.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.53.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.53.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.53.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.53.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.53.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.53.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.53.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.53.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.53.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.53.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.54.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.54.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.54.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.54.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.54.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.54.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.54.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.54.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.54.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.54.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.54.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.55.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.55.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.55.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.55.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.55.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.55.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.55.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.55.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.55.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.55.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.55.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.56.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.56.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.56.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.56.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.56.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.56.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.56.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.56.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.56.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.56.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.56.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.57.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.57.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.57.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.57.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.57.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.57.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.57.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.57.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.57.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.57.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.57.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.58.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.58.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.58.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.58.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.58.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.58.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.58.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.58.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.58.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.58.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.58.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.59.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.59.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.59.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.59.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.59.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.59.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.59.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.59.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.59.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.59.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.59.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.60.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.60.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.60.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.60.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.60.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.60.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.60.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.60.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.60.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.60.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.60.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.61.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.61.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.61.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.61.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.61.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.61.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.61.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.61.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.61.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.61.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.61.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.62.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.62.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.62.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.62.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.62.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.62.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.62.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.62.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.62.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.62.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.62.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.63.input_layernorm.weight": "model-00014-of-00014.safetensors",
"model.layers.63.mlp.down_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.63.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.63.mlp.up_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.63.post_attention_layernorm.weight": "model-00014-of-00014.safetensors",
"model.layers.63.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.63.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.63.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.63.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.63.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.63.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
"model.norm.weight": "model-00014-of-00014.safetensors"
}
}
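The `weight_map` above associates each parameter name with the shard file that stores it; a loader typically inverts this mapping so each shard is opened once and all of its tensors are read together. A minimal sketch of that grouping, using a three-entry excerpt of the index (the real map covers all 64 layers):

```python
# Excerpt of "weight_map" from model.safetensors.index.json above.
index = {
    "weight_map": {
        "model.layers.63.input_layernorm.weight": "model-00014-of-00014.safetensors",
        "model.layers.63.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
        "model.norm.weight": "model-00014-of-00014.safetensors",
    }
}

def tensors_by_shard(weight_map):
    """Group parameter names by the shard file that contains them."""
    shards = {}
    for name, shard in weight_map.items():
        shards.setdefault(shard, []).append(name)
    return shards

shards = tensors_by_shard(index["weight_map"])
print(sorted(shards))
```

Note that a single layer's tensors can straddle a shard boundary (layer 63's `gate_proj` lives in shard 13 while its `up_proj` is in shard 14), so grouping by shard rather than by layer is the safe access pattern.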

run_summary.json Normal file

@@ -0,0 +1,12 @@
{
"agent_name": "bc7a253d567261d84db295a138b8af86eac6ae4c_thinking_preprocessed",
"training_start": null,
"training_end": null,
"created_by": "DCAgent",
"base_model_name": "Qwen/Qwen3-32B",
"dataset_name": "/e/data1/datasets/playground/ot/hf_hub/datasets--penfever--glm46-neulab-agenttuning-alfworld-sandboxes-maxeps-131k/snapshots/fdb0d0afe08aa3c31c7605b40c18d5e48fdc206c_thinking_preprocessed,/e/data1/datasets/playground/ot/hf_hub/datasets--penfever--glm46-swesmith-maxeps-131k/snapshots/4d4c2d4a9d21f73870ed31c7bc6028035b3b6ca7_thinking_preprocessed,/e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent2--GLM-4.7-r2egym_sandboxes-maxeps-131k/snapshots/167ff86e8203fa2412574480bf52623cb62320e8_thinking_preprocessed,/e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent2--glm46-swegym-tasks-maxeps-131k/snapshots/bc7a253d567261d84db295a138b8af86eac6ae4c_thinking_preprocessed",
"training_type": "SFT",
"training_parameters": "https://huggingface.co/laion/alfworld-swesmith-r2egym-swegym-131k-32B-lc/blob/main/config.json",
"wandb_link": null,
"traces_location_s3": null
}
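The `dataset_name` field above packs several dataset snapshot paths into one comma-joined string; a consumer of this summary would split it back into individual paths. A sketch with hypothetical shortened paths standing in for the long snapshot paths above:

```python
# Hypothetical comma-joined value in the style of run_summary.json's "dataset_name".
dataset_name = "/data/snapshots/alfworld_thinking,/data/snapshots/swesmith_thinking"
datasets = [p.strip() for p in dataset_name.split(",")]
print(len(datasets))
```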

special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
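In this map the end-of-sequence token (`<|im_end|>`) and the padding token (`<|endoftext|>`) are distinct. Entries in a `special_tokens_map.json` may appear either as plain strings or, as here, as dicts carrying per-token flags, so a reader has to handle both forms; a small sketch of that normalization:

```python
def token_content(entry):
    """special_tokens_map.json entries may be plain strings or dicts with a 'content' key."""
    return entry["content"] if isinstance(entry, dict) else entry

special_tokens_map = {
    "eos_token": {"content": "<|im_end|>", "lstrip": False, "normalized": False,
                  "rstrip": False, "single_word": False},
    "pad_token": {"content": "<|endoftext|>", "lstrip": False, "normalized": False,
                  "rstrip": False, "single_word": False},
}
eos = token_content(special_tokens_map["eos_token"])
pad = token_content(special_tokens_map["pad_token"])
print(eos, pad)
```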

tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
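The three lines above are a Git LFS pointer, not the tokenizer itself: the ~11 MB `tokenizer.json` is stored in LFS and the repo tracks only its hash and size. A pointer file is a set of space-separated key/value lines, which can be parsed directly:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654"""
info = parse_lfs_pointer(pointer)
print(info["size"])
```

Fetching the repo without LFS support yields this pointer text in place of the real file, which is a common cause of tokenizer-loading failures.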

tokenizer_config.json Normal file

@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
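The `special` flag in `added_tokens_decoder` matters at decode time: with the usual Hugging Face tokenizer behavior, tokens flagged `"special": true` are dropped when decoding with `skip_special_tokens=True`, while unflagged markers such as `<think>` and `<tool_call>` survive in the decoded text. A sketch of separating the two groups, using a three-entry excerpt of the 26 entries above:

```python
# Three entries excerpted from "added_tokens_decoder" in tokenizer_config.json.
added_tokens_decoder = {
    "151643": {"content": "<|endoftext|>", "special": True},
    "151657": {"content": "<tool_call>", "special": False},
    "151667": {"content": "<think>", "special": False},
}
special = sorted(t["content"] for t in added_tokens_decoder.values() if t["special"])
non_special = sorted(t["content"] for t in added_tokens_decoder.values() if not t["special"])
print(special, non_special)
```

Keeping `<think>`/`</think>` non-special is what lets the model's reasoning traces remain visible after decoding.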

train_results.json Normal file

@@ -0,0 +1,12 @@
{
"achieved_tflops_per_gpu": 1063.315854388195,
"achieved_tflops_per_gpu_theoretical": 121285192.64972909,
"epoch": 6.597039473684211,
"mfu_percent": 75.14599677655089,
"mfu_percent_theoretical": 8571391.706694635,
"total_flos": 7.249602429950362e+16,
"train_loss": 0.0,
"train_runtime": 1.0653,
"train_samples_per_second": 7990.221,
"train_steps_per_second": 1333.894
}
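`mfu_percent` is achieved throughput as a fraction of hardware peak, so the peak the trainer assumed can be recovered by inverting the ratio. Using the non-theoretical figures above (the `_theoretical` variants are clearly out of range and left as recorded):

```python
# Values from train_results.json above.
achieved_tflops = 1063.315854388195
mfu_percent = 75.14599677655089

# mfu_percent = achieved / peak * 100  =>  peak = achieved / (mfu_percent / 100)
peak_tflops = achieved_tflops / (mfu_percent / 100.0)
print(round(peak_tflops, 1))  # 1415.0
```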

trainer_log.jsonl Normal file

@@ -0,0 +1,505 @@
{"current_steps": 5, "total_steps": 1421, "loss": 0.5476, "lr": 1.118881118881119e-06, "epoch": 0.024671052631578948, "percentage": 0.35, "elapsed_time": "0:18:48", "remaining_time": "3 days, 16:47:09"}
{"current_steps": 10, "total_steps": 1421, "loss": 0.5429, "lr": 2.517482517482518e-06, "epoch": 0.049342105263157895, "percentage": 0.7, "elapsed_time": "0:37:25", "remaining_time": "3 days, 16:01:16"}
{"current_steps": 15, "total_steps": 1421, "loss": 0.5108, "lr": 3.916083916083917e-06, "epoch": 0.07401315789473684, "percentage": 1.06, "elapsed_time": "0:56:02", "remaining_time": "3 days, 15:32:57"}
{"current_steps": 20, "total_steps": 1421, "loss": 0.4949, "lr": 5.314685314685315e-06, "epoch": 0.09868421052631579, "percentage": 1.41, "elapsed_time": "1:14:37", "remaining_time": "3 days, 15:07:55"}
{"current_steps": 25, "total_steps": 1421, "loss": 0.4425, "lr": 6.713286713286714e-06, "epoch": 0.12335526315789473, "percentage": 1.76, "elapsed_time": "1:33:17", "remaining_time": "3 days, 14:48:59"}
{"current_steps": 30, "total_steps": 1421, "loss": 0.4344, "lr": 8.111888111888112e-06, "epoch": 0.14802631578947367, "percentage": 2.11, "elapsed_time": "1:51:54", "remaining_time": "3 days, 14:28:56"}
{"current_steps": 35, "total_steps": 1421, "loss": 0.4256, "lr": 9.510489510489511e-06, "epoch": 0.17269736842105263, "percentage": 2.46, "elapsed_time": "2:10:29", "remaining_time": "3 days, 14:07:20"}
{"current_steps": 40, "total_steps": 1421, "loss": 0.3893, "lr": 1.0909090909090909e-05, "epoch": 0.19736842105263158, "percentage": 2.81, "elapsed_time": "2:29:07", "remaining_time": "3 days, 13:48:41"}
{"current_steps": 45, "total_steps": 1421, "loss": 0.3565, "lr": 1.230769230769231e-05, "epoch": 0.22203947368421054, "percentage": 3.17, "elapsed_time": "2:47:47", "remaining_time": "3 days, 13:30:37"}
{"current_steps": 50, "total_steps": 1421, "loss": 0.3607, "lr": 1.3706293706293707e-05, "epoch": 0.24671052631578946, "percentage": 3.52, "elapsed_time": "3:06:39", "remaining_time": "3 days, 13:18:16"}
{"current_steps": 55, "total_steps": 1421, "loss": 0.336, "lr": 1.5104895104895105e-05, "epoch": 0.2713815789473684, "percentage": 3.87, "elapsed_time": "3:25:19", "remaining_time": "3 days, 12:59:34"}
{"current_steps": 60, "total_steps": 1421, "loss": 0.3261, "lr": 1.6503496503496507e-05, "epoch": 0.29605263157894735, "percentage": 4.22, "elapsed_time": "3:43:51", "remaining_time": "3 days, 12:37:58"}
{"current_steps": 65, "total_steps": 1421, "loss": 0.3038, "lr": 1.7902097902097903e-05, "epoch": 0.3207236842105263, "percentage": 4.57, "elapsed_time": "4:02:26", "remaining_time": "3 days, 12:17:48"}
{"current_steps": 70, "total_steps": 1421, "loss": 0.3026, "lr": 1.9300699300699302e-05, "epoch": 0.34539473684210525, "percentage": 4.93, "elapsed_time": "4:21:03", "remaining_time": "3 days, 11:58:20"}
{"current_steps": 75, "total_steps": 1421, "loss": 0.2916, "lr": 2.06993006993007e-05, "epoch": 0.37006578947368424, "percentage": 5.28, "elapsed_time": "4:39:40", "remaining_time": "3 days, 11:39:05"}
{"current_steps": 80, "total_steps": 1421, "loss": 0.2951, "lr": 2.2097902097902097e-05, "epoch": 0.39473684210526316, "percentage": 5.63, "elapsed_time": "4:58:19", "remaining_time": "3 days, 11:20:45"}
{"current_steps": 85, "total_steps": 1421, "loss": 0.2694, "lr": 2.3496503496503496e-05, "epoch": 0.4194078947368421, "percentage": 5.98, "elapsed_time": "5:16:57", "remaining_time": "3 days, 11:01:43"}
{"current_steps": 90, "total_steps": 1421, "loss": 0.2666, "lr": 2.48951048951049e-05, "epoch": 0.4440789473684211, "percentage": 6.33, "elapsed_time": "5:35:33", "remaining_time": "3 days, 10:42:33"}
{"current_steps": 95, "total_steps": 1421, "loss": 0.2516, "lr": 2.6293706293706294e-05, "epoch": 0.46875, "percentage": 6.69, "elapsed_time": "5:54:13", "remaining_time": "3 days, 10:24:21"}
{"current_steps": 100, "total_steps": 1421, "loss": 0.2534, "lr": 2.7692307692307694e-05, "epoch": 0.4934210526315789, "percentage": 7.04, "elapsed_time": "6:12:48", "remaining_time": "3 days, 10:04:52"}
{"current_steps": 105, "total_steps": 1421, "loss": 0.2346, "lr": 2.9090909090909093e-05, "epoch": 0.5180921052631579, "percentage": 7.39, "elapsed_time": "6:32:06", "remaining_time": "3 days, 9:54:22"}
{"current_steps": 110, "total_steps": 1421, "loss": 0.2508, "lr": 3.048951048951049e-05, "epoch": 0.5427631578947368, "percentage": 7.74, "elapsed_time": "6:50:42", "remaining_time": "3 days, 9:34:57"}
{"current_steps": 115, "total_steps": 1421, "loss": 0.2506, "lr": 3.188811188811189e-05, "epoch": 0.5674342105263158, "percentage": 8.09, "elapsed_time": "7:09:20", "remaining_time": "3 days, 9:15:53"}
{"current_steps": 120, "total_steps": 1421, "loss": 0.2417, "lr": 3.328671328671329e-05, "epoch": 0.5921052631578947, "percentage": 8.44, "elapsed_time": "7:27:53", "remaining_time": "3 days, 8:55:56"}
{"current_steps": 125, "total_steps": 1421, "loss": 0.2338, "lr": 3.468531468531469e-05, "epoch": 0.6167763157894737, "percentage": 8.8, "elapsed_time": "7:46:30", "remaining_time": "3 days, 8:36:43"}
{"current_steps": 130, "total_steps": 1421, "loss": 0.2519, "lr": 3.608391608391609e-05, "epoch": 0.6414473684210527, "percentage": 9.15, "elapsed_time": "8:05:09", "remaining_time": "3 days, 8:18:01"}
{"current_steps": 135, "total_steps": 1421, "loss": 0.476, "lr": 3.748251748251749e-05, "epoch": 0.6661184210526315, "percentage": 9.5, "elapsed_time": "8:24:08", "remaining_time": "3 days, 8:02:20"}
{"current_steps": 140, "total_steps": 1421, "loss": 0.4253, "lr": 3.888111888111888e-05, "epoch": 0.6907894736842105, "percentage": 9.85, "elapsed_time": "8:42:59", "remaining_time": "3 days, 7:45:20"}
{"current_steps": 145, "total_steps": 1421, "loss": 0.3997, "lr": 3.999993957205587e-05, "epoch": 0.7154605263157895, "percentage": 10.2, "elapsed_time": "9:01:43", "remaining_time": "3 days, 7:27:10"}
{"current_steps": 150, "total_steps": 1421, "loss": 0.3802, "lr": 3.999782463235198e-05, "epoch": 0.7401315789473685, "percentage": 10.56, "elapsed_time": "9:20:26", "remaining_time": "3 days, 7:08:49"}
{"current_steps": 155, "total_steps": 1421, "loss": 0.3671, "lr": 3.999268866058499e-05, "epoch": 0.7648026315789473, "percentage": 10.91, "elapsed_time": "9:39:09", "remaining_time": "3 days, 6:50:28"}
{"current_steps": 160, "total_steps": 1421, "loss": 0.377, "lr": 3.9984532432636075e-05, "epoch": 0.7894736842105263, "percentage": 11.26, "elapsed_time": "9:57:53", "remaining_time": "3 days, 6:32:07"}
{"current_steps": 165, "total_steps": 1421, "loss": 0.3691, "lr": 3.997335718065055e-05, "epoch": 0.8141447368421053, "percentage": 11.61, "elapsed_time": "10:16:38", "remaining_time": "3 days, 6:13:56"}
{"current_steps": 170, "total_steps": 1421, "loss": 0.3497, "lr": 3.995916459285176e-05, "epoch": 0.8388157894736842, "percentage": 11.96, "elapsed_time": "10:35:23", "remaining_time": "3 days, 5:55:42"}
{"current_steps": 175, "total_steps": 1421, "loss": 0.3507, "lr": 3.994195681328607e-05, "epoch": 0.8634868421052632, "percentage": 12.32, "elapsed_time": "10:54:07", "remaining_time": "3 days, 5:37:24"}
{"current_steps": 180, "total_steps": 1421, "loss": 0.3345, "lr": 3.99217364414989e-05, "epoch": 0.8881578947368421, "percentage": 12.67, "elapsed_time": "11:12:51", "remaining_time": "3 days, 5:18:56"}
{"current_steps": 185, "total_steps": 1421, "loss": 0.3074, "lr": 3.989850653214208e-05, "epoch": 0.912828947368421, "percentage": 13.02, "elapsed_time": "11:31:34", "remaining_time": "3 days, 5:00:30"}
{"current_steps": 190, "total_steps": 1421, "loss": 0.2908, "lr": 3.987227059451237e-05, "epoch": 0.9375, "percentage": 13.37, "elapsed_time": "11:50:19", "remaining_time": "3 days, 4:42:06"}
{"current_steps": 105, "total_steps": 1421, "loss": 0.3065, "lr": 2.9090909090909093e-05, "epoch": 0.024671052631578948, "percentage": 7.39, "elapsed_time": "0:18:50", "remaining_time": "3:56:09"}
{"current_steps": 110, "total_steps": 1421, "loss": 0.2958, "lr": 3.048951048951049e-05, "epoch": 0.049342105263157895, "percentage": 7.74, "elapsed_time": "0:37:28", "remaining_time": "7:26:33"}
{"current_steps": 115, "total_steps": 1421, "loss": 0.2856, "lr": 3.188811188811189e-05, "epoch": 0.07401315789473684, "percentage": 8.09, "elapsed_time": "0:56:05", "remaining_time": "10:37:02"}
{"current_steps": 120, "total_steps": 1421, "loss": 0.2982, "lr": 3.328671328671329e-05, "epoch": 0.09868421052631579, "percentage": 8.44, "elapsed_time": "1:14:41", "remaining_time": "13:29:51"}
{"current_steps": 125, "total_steps": 1421, "loss": 0.2746, "lr": 3.468531468531469e-05, "epoch": 0.12335526315789473, "percentage": 8.8, "elapsed_time": "1:33:22", "remaining_time": "16:08:06"}
{"current_steps": 130, "total_steps": 1421, "loss": 0.2767, "lr": 3.608391608391609e-05, "epoch": 0.14802631578947367, "percentage": 9.15, "elapsed_time": "1:52:01", "remaining_time": "18:32:32"}
{"current_steps": 135, "total_steps": 1421, "loss": 0.2763, "lr": 3.748251748251749e-05, "epoch": 0.17269736842105263, "percentage": 9.5, "elapsed_time": "2:10:37", "remaining_time": "20:44:15"}
{"current_steps": 140, "total_steps": 1421, "loss": 0.2626, "lr": 3.888111888111888e-05, "epoch": 0.19736842105263158, "percentage": 9.85, "elapsed_time": "2:29:17", "remaining_time": "22:46:00"}
{"current_steps": 145, "total_steps": 1421, "loss": 0.2494, "lr": 3.999993957205587e-05, "epoch": 0.22203947368421054, "percentage": 10.2, "elapsed_time": "2:47:57", "remaining_time": "1 day, 0:38:03"}
{"current_steps": 150, "total_steps": 1421, "loss": 0.2604, "lr": 3.999782463235198e-05, "epoch": 0.24671052631578946, "percentage": 10.56, "elapsed_time": "3:06:51", "remaining_time": "1 day, 2:23:15"}
{"current_steps": 155, "total_steps": 1421, "loss": 0.246, "lr": 3.999268866058499e-05, "epoch": 0.2713815789473684, "percentage": 10.91, "elapsed_time": "3:25:30", "remaining_time": "1 day, 3:58:30"}
{"current_steps": 160, "total_steps": 1421, "loss": 0.2436, "lr": 3.9984532432636075e-05, "epoch": 0.29605263157894735, "percentage": 11.26, "elapsed_time": "3:44:05", "remaining_time": "1 day, 5:26:08"}
{"current_steps": 165, "total_steps": 1421, "loss": 0.2348, "lr": 3.997335718065055e-05, "epoch": 0.3207236842105263, "percentage": 11.61, "elapsed_time": "4:02:42", "remaining_time": "1 day, 6:47:29"}
{"current_steps": 170, "total_steps": 1421, "loss": 0.2395, "lr": 3.995916459285176e-05, "epoch": 0.34539473684210525, "percentage": 11.96, "elapsed_time": "4:21:20", "remaining_time": "1 day, 8:03:12"}
{"current_steps": 175, "total_steps": 1421, "loss": 0.235, "lr": 3.994195681328607e-05, "epoch": 0.37006578947368424, "percentage": 12.32, "elapsed_time": "4:40:00", "remaining_time": "1 day, 9:13:41"}
{"current_steps": 180, "total_steps": 1421, "loss": 0.2436, "lr": 3.99217364414989e-05, "epoch": 0.39473684210526316, "percentage": 12.67, "elapsed_time": "4:58:42", "remaining_time": "1 day, 10:19:27"}
{"current_steps": 185, "total_steps": 1421, "loss": 0.2274, "lr": 3.989850653214208e-05, "epoch": 0.4194078947368421, "percentage": 13.02, "elapsed_time": "5:17:22", "remaining_time": "1 day, 11:20:23"}
{"current_steps": 190, "total_steps": 1421, "loss": 0.2289, "lr": 3.987227059451237e-05, "epoch": 0.4440789473684211, "percentage": 13.37, "elapsed_time": "5:35:59", "remaining_time": "1 day, 12:16:50"}
{"current_steps": 195, "total_steps": 1421, "loss": 0.2179, "lr": 3.984303259202129e-05, "epoch": 0.46875, "percentage": 13.72, "elapsed_time": "5:54:41", "remaining_time": "1 day, 13:09:58"}
{"current_steps": 200, "total_steps": 1421, "loss": 0.2229, "lr": 3.9810796941596414e-05, "epoch": 0.4934210526315789, "percentage": 14.07, "elapsed_time": "6:13:19", "remaining_time": "1 day, 13:59:08"}
{"current_steps": 205, "total_steps": 1421, "loss": 0.2104, "lr": 3.97755685130141e-05, "epoch": 0.5180921052631579, "percentage": 14.43, "elapsed_time": "6:32:40", "remaining_time": "1 day, 14:49:11"}
{"current_steps": 210, "total_steps": 1421, "loss": 0.2282, "lr": 3.973735262816381e-05, "epoch": 0.5427631578947368, "percentage": 14.78, "elapsed_time": "6:51:18", "remaining_time": "1 day, 15:31:54"}
{"current_steps": 215, "total_steps": 1421, "loss": 0.2303, "lr": 3.9696155060244166e-05, "epoch": 0.5674342105263158, "percentage": 15.13, "elapsed_time": "7:10:00", "remaining_time": "1 day, 16:12:00"}
{"current_steps": 220, "total_steps": 1421, "loss": 0.2234, "lr": 3.9651982032890774e-05, "epoch": 0.5921052631578947, "percentage": 15.48, "elapsed_time": "7:28:36", "remaining_time": "1 day, 16:49:00"}
{"current_steps": 225, "total_steps": 1421, "loss": 0.2168, "lr": 3.960484021923606e-05, "epoch": 0.6167763157894737, "percentage": 15.83, "elapsed_time": "7:47:17", "remaining_time": "1 day, 17:23:53"}
{"current_steps": 230, "total_steps": 1421, "loss": 0.2358, "lr": 3.9554736740901163e-05, "epoch": 0.6414473684210527, "percentage": 16.19, "elapsed_time": "8:05:59", "remaining_time": "1 day, 17:56:37"}
{"current_steps": 235, "total_steps": 1421, "loss": 0.462, "lr": 3.950167916692008e-05, "epoch": 0.6661184210526315, "percentage": 16.54, "elapsed_time": "8:25:06", "remaining_time": "1 day, 18:29:12"}
{"current_steps": 240, "total_steps": 1421, "loss": 0.4152, "lr": 3.9445675512596224e-05, "epoch": 0.6907894736842105, "percentage": 16.89, "elapsed_time": "8:43:55", "remaining_time": "1 day, 18:58:10"}
{"current_steps": 245, "total_steps": 1421, "loss": 0.3919, "lr": 3.938673423829159e-05, "epoch": 0.7154605263157895, "percentage": 17.24, "elapsed_time": "9:02:43", "remaining_time": "1 day, 19:25:07"}
{"current_steps": 250, "total_steps": 1421, "loss": 0.3738, "lr": 3.932486424814865e-05, "epoch": 0.7401315789473685, "percentage": 17.59, "elapsed_time": "9:21:29", "remaining_time": "1 day, 19:50:03"}
{"current_steps": 255, "total_steps": 1421, "loss": 0.3626, "lr": 3.92600748887452e-05, "epoch": 0.7648026315789473, "percentage": 17.95, "elapsed_time": "9:40:16", "remaining_time": "1 day, 20:13:20"}
{"current_steps": 260, "total_steps": 1421, "loss": 0.3734, "lr": 3.9192375947682436e-05, "epoch": 0.7894736842105263, "percentage": 18.3, "elapsed_time": "9:59:03", "remaining_time": "1 day, 20:35:02"}
{"current_steps": 265, "total_steps": 1421, "loss": 0.3667, "lr": 3.9121777652106325e-05, "epoch": 0.8141447368421053, "percentage": 18.65, "elapsed_time": "10:17:52", "remaining_time": "1 day, 20:55:18"}
{"current_steps": 270, "total_steps": 1421, "loss": 0.3485, "lr": 3.904829066716263e-05, "epoch": 0.8388157894736842, "percentage": 19.0, "elapsed_time": "10:36:39", "remaining_time": "1 day, 21:14:03"}
{"current_steps": 275, "total_steps": 1421, "loss": 0.3505, "lr": 3.8971926094385725e-05, "epoch": 0.8634868421052632, "percentage": 19.35, "elapsed_time": "10:55:28", "remaining_time": "1 day, 21:31:31"}
{"current_steps": 280, "total_steps": 1421, "loss": 0.3252, "lr": 3.889269547002153e-05, "epoch": 0.8881578947368421, "percentage": 19.7, "elapsed_time": "11:14:16", "remaining_time": "1 day, 21:47:39"}
{"current_steps": 285, "total_steps": 1421, "loss": 0.2997, "lr": 3.881061076328475e-05, "epoch": 0.912828947368421, "percentage": 20.06, "elapsed_time": "11:33:04", "remaining_time": "1 day, 22:02:33"}
{"current_steps": 205, "total_steps": 1421, "loss": 0.2447, "lr": 3.97755685130141e-05, "epoch": 0.024671052631578948, "percentage": 14.43, "elapsed_time": "0:18:50", "remaining_time": "1:51:46"}
{"current_steps": 210, "total_steps": 1421, "loss": 0.2418, "lr": 3.973735262816381e-05, "epoch": 0.049342105263157895, "percentage": 14.78, "elapsed_time": "0:37:27", "remaining_time": "3:36:03"}
{"current_steps": 215, "total_steps": 1421, "loss": 0.238, "lr": 3.9696155060244166e-05, "epoch": 0.07401315789473684, "percentage": 15.13, "elapsed_time": "0:56:07", "remaining_time": "5:14:50"}
{"current_steps": 220, "total_steps": 1421, "loss": 0.2522, "lr": 3.9651982032890774e-05, "epoch": 0.09868421052631579, "percentage": 15.48, "elapsed_time": "1:14:43", "remaining_time": "6:47:57"}
{"current_steps": 225, "total_steps": 1421, "loss": 0.2365, "lr": 3.960484021923606e-05, "epoch": 0.12335526315789473, "percentage": 15.83, "elapsed_time": "1:33:24", "remaining_time": "8:16:33"}
{"current_steps": 230, "total_steps": 1421, "loss": 0.2417, "lr": 3.9554736740901163e-05, "epoch": 0.14802631578947367, "percentage": 16.19, "elapsed_time": "1:52:02", "remaining_time": "9:40:13"}
{"current_steps": 235, "total_steps": 1421, "loss": 0.2442, "lr": 3.950167916692008e-05, "epoch": 0.17269736842105263, "percentage": 16.54, "elapsed_time": "2:10:37", "remaining_time": "10:59:16"}
{"current_steps": 240, "total_steps": 1421, "loss": 0.2347, "lr": 3.9445675512596224e-05, "epoch": 0.19736842105263158, "percentage": 16.89, "elapsed_time": "2:29:18", "remaining_time": "12:14:44"}
{"current_steps": 245, "total_steps": 1421, "loss": 0.2254, "lr": 3.938673423829159e-05, "epoch": 0.22203947368421054, "percentage": 17.24, "elapsed_time": "2:48:00", "remaining_time": "13:26:28"}
{"current_steps": 250, "total_steps": 1421, "loss": 0.2369, "lr": 3.932486424814865e-05, "epoch": 0.24671052631578946, "percentage": 17.59, "elapsed_time": "3:07:10", "remaining_time": "14:36:42"}
{"current_steps": 255, "total_steps": 1421, "loss": 0.2234, "lr": 3.92600748887452e-05, "epoch": 0.2713815789473684, "percentage": 17.95, "elapsed_time": "3:25:50", "remaining_time": "15:41:11"}
{"current_steps": 260, "total_steps": 1421, "loss": 0.2225, "lr": 3.9192375947682436e-05, "epoch": 0.29605263157894735, "percentage": 18.3, "elapsed_time": "3:44:24", "remaining_time": "16:42:05"}
{"current_steps": 265, "total_steps": 1421, "loss": 0.2157, "lr": 3.9121777652106325e-05, "epoch": 0.3207236842105263, "percentage": 18.65, "elapsed_time": "4:03:00", "remaining_time": "17:40:05"}
{"current_steps": 270, "total_steps": 1421, "loss": 0.2206, "lr": 3.904829066716263e-05, "epoch": 0.34539473684210525, "percentage": 19.0, "elapsed_time": "4:21:39", "remaining_time": "18:35:26"}
{"current_steps": 275, "total_steps": 1421, "loss": 0.217, "lr": 3.8971926094385725e-05, "epoch": 0.37006578947368424, "percentage": 19.35, "elapsed_time": "4:40:19", "remaining_time": "19:28:11"}
{"current_steps": 280, "total_steps": 1421, "loss": 0.2258, "lr": 3.889269547002153e-05, "epoch": 0.39473684210526316, "percentage": 19.7, "elapsed_time": "4:59:01", "remaining_time": "20:18:32"}
{"current_steps": 285, "total_steps": 1421, "loss": 0.2121, "lr": 3.881061076328475e-05, "epoch": 0.4194078947368421, "percentage": 20.06, "elapsed_time": "5:17:42", "remaining_time": "21:06:21"}
{"current_steps": 290, "total_steps": 1421, "loss": 0.2141, "lr": 3.872568437455071e-05, "epoch": 0.4440789473684211, "percentage": 20.41, "elapsed_time": "5:36:19", "remaining_time": "21:51:41"}
{"current_steps": 295, "total_steps": 1421, "loss": 0.2045, "lr": 3.863792913348202e-05, "epoch": 0.46875, "percentage": 20.76, "elapsed_time": "5:55:01", "remaining_time": "22:35:08"}
{"current_steps": 300, "total_steps": 1421, "loss": 0.2099, "lr": 3.854735829709049e-05, "epoch": 0.4934210526315789, "percentage": 21.11, "elapsed_time": "6:13:40", "remaining_time": "23:16:17"}
{"current_steps": 305, "total_steps": 1421, "loss": 0.2009, "lr": 3.8453985547734364e-05, "epoch": 0.5180921052631579, "percentage": 21.46, "elapsed_time": "6:33:01", "remaining_time": "23:58:06"}
{"current_steps": 310, "total_steps": 1421, "loss": 0.2189, "lr": 3.835782499105136e-05, "epoch": 0.5427631578947368, "percentage": 21.82, "elapsed_time": "6:51:40", "remaining_time": "1 day, 0:35:25"}
{"current_steps": 315, "total_steps": 1421, "loss": 0.2213, "lr": 3.825889115382777e-05, "epoch": 0.5674342105263158, "percentage": 22.17, "elapsed_time": "7:10:21", "remaining_time": "1 day, 1:11:00"}
{"current_steps": 320, "total_steps": 1421, "loss": 0.2148, "lr": 3.815719898180397e-05, "epoch": 0.5921052631578947, "percentage": 22.52, "elapsed_time": "7:28:55", "remaining_time": "1 day, 1:44:34"}
{"current_steps": 325, "total_steps": 1421, "loss": 0.2083, "lr": 3.8052763837416496e-05, "epoch": 0.6167763157894737, "percentage": 22.87, "elapsed_time": "7:47:33", "remaining_time": "1 day, 2:16:44"}
{"current_steps": 330, "total_steps": 1421, "loss": 0.2273, "lr": 3.794560149747736e-05, "epoch": 0.6414473684210527, "percentage": 23.22, "elapsed_time": "8:06:15", "remaining_time": "1 day, 2:47:36"}
{"current_steps": 335, "total_steps": 1421, "loss": 0.4536, "lr": 3.7835728150790626e-05, "epoch": 0.6661184210526315, "percentage": 23.57, "elapsed_time": "8:25:23", "remaining_time": "1 day, 3:18:21"}
{"current_steps": 340, "total_steps": 1421, "loss": 0.4077, "lr": 3.7723160395706846e-05, "epoch": 0.6907894736842105, "percentage": 23.93, "elapsed_time": "8:44:11", "remaining_time": "1 day, 3:46:36"}
{"current_steps": 345, "total_steps": 1421, "loss": 0.3847, "lr": 3.760791523761553e-05, "epoch": 0.7154605263157895, "percentage": 24.28, "elapsed_time": "9:02:57", "remaining_time": "1 day, 4:13:24"}
{"current_steps": 350, "total_steps": 1421, "loss": 0.3667, "lr": 3.749001008637621e-05, "epoch": 0.7401315789473685, "percentage": 24.63, "elapsed_time": "9:21:41", "remaining_time": "1 day, 4:38:47"}
{"current_steps": 355, "total_steps": 1421, "loss": 0.356, "lr": 3.736946275368834e-05, "epoch": 0.7648026315789473, "percentage": 24.98, "elapsed_time": "9:40:27", "remaining_time": "1 day, 5:03:00"}
{"current_steps": 360, "total_steps": 1421, "loss": 0.3672, "lr": 3.724629145040056e-05, "epoch": 0.7894736842105263, "percentage": 25.33, "elapsed_time": "9:59:13", "remaining_time": "1 day, 5:26:04"}
{"current_steps": 365, "total_steps": 1421, "loss": 0.3607, "lr": 3.7120514783759555e-05, "epoch": 0.8141447368421053, "percentage": 25.69, "elapsed_time": "10:18:01", "remaining_time": "1 day, 5:48:01"}
{"current_steps": 370, "total_steps": 1421, "loss": 0.343, "lr": 3.699215175459917e-05, "epoch": 0.8388157894736842, "percentage": 26.04, "elapsed_time": "10:36:47", "remaining_time": "1 day, 6:08:51"}
{"current_steps": 375, "total_steps": 1421, "loss": 0.3453, "lr": 3.686122175446992e-05, "epoch": 0.8634868421052632, "percentage": 26.39, "elapsed_time": "10:55:34", "remaining_time": "1 day, 6:28:35"}
{"current_steps": 380, "total_steps": 1421, "loss": 0.3121, "lr": 3.672774456270959e-05, "epoch": 0.8881578947368421, "percentage": 26.74, "elapsed_time": "11:14:19", "remaining_time": "1 day, 6:47:18"}
{"current_steps": 385, "total_steps": 1421, "loss": 0.2913, "lr": 3.659174034345522e-05, "epoch": 0.912828947368421, "percentage": 27.09, "elapsed_time": "11:33:06", "remaining_time": "1 day, 7:05:04"}
{"current_steps": 390, "total_steps": 1421, "loss": 0.2777, "lr": 3.645322964259689e-05, "epoch": 0.9375, "percentage": 27.45, "elapsed_time": "11:51:53", "remaining_time": "1 day, 7:21:56"}
{"current_steps": 305, "total_steps": 1421, "loss": 0.2265, "lr": 3.8453985547734364e-05, "epoch": 1.024671052631579, "percentage": 21.46, "elapsed_time": "0:18:49", "remaining_time": "1:08:54"}
{"current_steps": 310, "total_steps": 1421, "loss": 0.2244, "lr": 3.835782499105136e-05, "epoch": 1.049342105263158, "percentage": 21.82, "elapsed_time": "0:37:29", "remaining_time": "2:14:22"}
{"current_steps": 315, "total_steps": 1421, "loss": 0.2215, "lr": 3.825889115382777e-05, "epoch": 1.0740131578947367, "percentage": 22.17, "elapsed_time": "0:56:07", "remaining_time": "3:17:05"}
{"current_steps": 320, "total_steps": 1421, "loss": 0.2353, "lr": 3.815719898180397e-05, "epoch": 1.0986842105263157, "percentage": 22.52, "elapsed_time": "1:14:47", "remaining_time": "4:17:20"}
{"current_steps": 325, "total_steps": 1421, "loss": 0.221, "lr": 3.8052763837416496e-05, "epoch": 1.1233552631578947, "percentage": 22.87, "elapsed_time": "1:33:26", "remaining_time": "5:15:07"}
{"current_steps": 330, "total_steps": 1421, "loss": 0.2268, "lr": 3.794560149747736e-05, "epoch": 1.1480263157894737, "percentage": 23.22, "elapsed_time": "1:52:03", "remaining_time": "6:10:29"}
{"current_steps": 335, "total_steps": 1421, "loss": 0.2297, "lr": 3.7835728150790626e-05, "epoch": 1.1726973684210527, "percentage": 23.57, "elapsed_time": "2:10:44", "remaining_time": "7:03:49"}
{"current_steps": 340, "total_steps": 1421, "loss": 0.2213, "lr": 3.7723160395706846e-05, "epoch": 1.1973684210526316, "percentage": 23.93, "elapsed_time": "2:29:25", "remaining_time": "7:55:03"}
{"current_steps": 345, "total_steps": 1421, "loss": 0.213, "lr": 3.760791523761553e-05, "epoch": 1.2220394736842106, "percentage": 24.28, "elapsed_time": "2:48:04", "remaining_time": "8:44:11"}
{"current_steps": 350, "total_steps": 1421, "loss": 0.2247, "lr": 3.749001008637621e-05, "epoch": 1.2467105263157894, "percentage": 24.63, "elapsed_time": "3:06:57", "remaining_time": "9:32:06"}
{"current_steps": 355, "total_steps": 1421, "loss": 0.2116, "lr": 3.736946275368834e-05, "epoch": 1.2713815789473684, "percentage": 24.98, "elapsed_time": "3:25:36", "remaining_time": "10:17:23"}
{"current_steps": 360, "total_steps": 1421, "loss": 0.2112, "lr": 3.724629145040056e-05, "epoch": 1.2960526315789473, "percentage": 25.33, "elapsed_time": "3:44:10", "remaining_time": "11:00:40"}
{"current_steps": 365, "total_steps": 1421, "loss": 0.206, "lr": 3.7120514783759555e-05, "epoch": 1.3207236842105263, "percentage": 25.69, "elapsed_time": "4:02:52", "remaining_time": "11:42:41"}
{"current_steps": 370, "total_steps": 1421, "loss": 0.2105, "lr": 3.699215175459917e-05, "epoch": 1.3453947368421053, "percentage": 26.04, "elapsed_time": "4:21:32", "remaining_time": "12:22:54"}
{"current_steps": 375, "total_steps": 1421, "loss": 0.2072, "lr": 3.686122175446992e-05, "epoch": 1.3700657894736843, "percentage": 26.39, "elapsed_time": "4:40:10", "remaining_time": "13:01:29"}
{"current_steps": 380, "total_steps": 1421, "loss": 0.215, "lr": 3.672774456270959e-05, "epoch": 1.3947368421052633, "percentage": 26.74, "elapsed_time": "4:58:51", "remaining_time": "13:38:41"}
{"current_steps": 385, "total_steps": 1421, "loss": 0.2027, "lr": 3.659174034345522e-05, "epoch": 1.419407894736842, "percentage": 27.09, "elapsed_time": "5:17:32", "remaining_time": "14:14:27"}
{"current_steps": 390, "total_steps": 1421, "loss": 0.2047, "lr": 3.645322964259689e-05, "epoch": 1.444078947368421, "percentage": 27.45, "elapsed_time": "5:36:11", "remaining_time": "14:48:44"}
{"current_steps": 395, "total_steps": 1421, "loss": 0.1961, "lr": 3.631223338467394e-05, "epoch": 1.46875, "percentage": 27.8, "elapsed_time": "5:54:49", "remaining_time": "15:21:37"}
{"current_steps": 400, "total_steps": 1421, "loss": 0.2018, "lr": 3.616877286971396e-05, "epoch": 1.493421052631579, "percentage": 28.15, "elapsed_time": "6:13:29", "remaining_time": "15:53:19"}
{"current_steps": 405, "total_steps": 1421, "loss": 0.1954, "lr": 3.6022869770014964e-05, "epoch": 1.518092105263158, "percentage": 28.5, "elapsed_time": "6:32:54", "remaining_time": "16:25:39"}
{"current_steps": 410, "total_steps": 1421, "loss": 0.2134, "lr": 3.587454612687148e-05, "epoch": 1.5427631578947367, "percentage": 28.85, "elapsed_time": "6:51:31", "remaining_time": "16:54:44"}
{"current_steps": 415, "total_steps": 1421, "loss": 0.216, "lr": 3.5723824347244745e-05, "epoch": 1.567434210526316, "percentage": 29.2, "elapsed_time": "7:10:11", "remaining_time": "17:22:49"}
{"current_steps": 420, "total_steps": 1421, "loss": 0.2096, "lr": 3.557072720037779e-05, "epoch": 1.5921052631578947, "percentage": 29.56, "elapsed_time": "7:28:51", "remaining_time": "17:49:47"}
{"current_steps": 425, "total_steps": 1421, "loss": 0.2033, "lr": 3.541527781435568e-05, "epoch": 1.6167763157894737, "percentage": 29.91, "elapsed_time": "7:47:28", "remaining_time": "18:15:33"}
{"current_steps": 430, "total_steps": 1421, "loss": 0.2224, "lr": 3.525749967261164e-05, "epoch": 1.6414473684210527, "percentage": 30.26, "elapsed_time": "8:06:10", "remaining_time": "18:40:27"}
{"current_steps": 435, "total_steps": 1421, "loss": 0.4493, "lr": 3.509741661037945e-05, "epoch": 1.6661184210526314, "percentage": 30.61, "elapsed_time": "8:25:16", "remaining_time": "19:05:17"}
{"current_steps": 440, "total_steps": 1421, "loss": 0.4032, "lr": 3.493505281109269e-05, "epoch": 1.6907894736842106, "percentage": 30.96, "elapsed_time": "8:44:03", "remaining_time": "19:28:25"}
{"current_steps": 445, "total_steps": 1421, "loss": 0.3801, "lr": 3.477043280273139e-05, "epoch": 1.7154605263157894, "percentage": 31.32, "elapsed_time": "9:02:50", "remaining_time": "19:50:34"}
{"current_steps": 450, "total_steps": 1421, "loss": 0.3625, "lr": 3.460358145411669e-05, "epoch": 1.7401315789473686, "percentage": 31.67, "elapsed_time": "9:21:36", "remaining_time": "20:11:49"}
{"current_steps": 455, "total_steps": 1421, "loss": 0.3521, "lr": 3.4434523971153876e-05, "epoch": 1.7648026315789473, "percentage": 32.02, "elapsed_time": "9:40:22", "remaining_time": "20:32:11"}
{"current_steps": 460, "total_steps": 1421, "loss": 0.3632, "lr": 3.426328589302463e-05, "epoch": 1.7894736842105263, "percentage": 32.37, "elapsed_time": "9:59:10", "remaining_time": "20:51:45"}
{"current_steps": 465, "total_steps": 1421, "loss": 0.3571, "lr": 3.408989308832887e-05, "epoch": 1.8141447368421053, "percentage": 32.72, "elapsed_time": "10:17:58", "remaining_time": "21:10:31"}
{"current_steps": 470, "total_steps": 1421, "loss": 0.3397, "lr": 3.3914371751176806e-05, "epoch": 1.838815789473684, "percentage": 33.08, "elapsed_time": "10:36:45", "remaining_time": "21:28:25"}
{"current_steps": 475, "total_steps": 1421, "loss": 0.3422, "lr": 3.3736748397231865e-05, "epoch": 1.8634868421052633, "percentage": 33.43, "elapsed_time": "10:55:33", "remaining_time": "21:45:35"}
{"current_steps": 480, "total_steps": 1421, "loss": 0.3081, "lr": 3.3557049859705026e-05, "epoch": 1.888157894736842, "percentage": 33.78, "elapsed_time": "11:14:19", "remaining_time": "22:01:58"}
{"current_steps": 485, "total_steps": 1421, "loss": 0.287, "lr": 3.3375303285301175e-05, "epoch": 1.912828947368421, "percentage": 34.13, "elapsed_time": "11:33:06", "remaining_time": "22:17:36"}
{"current_steps": 405, "total_steps": 1421, "loss": 0.2166, "lr": 3.6022869770014964e-05, "epoch": 1.024671052631579, "percentage": 28.5, "elapsed_time": "0:18:49", "remaining_time": "0:47:13"}
{"current_steps": 410, "total_steps": 1421, "loss": 0.2144, "lr": 3.587454612687148e-05, "epoch": 1.049342105263158, "percentage": 28.85, "elapsed_time": "0:37:26", "remaining_time": "1:32:18"}
{"current_steps": 415, "total_steps": 1421, "loss": 0.2119, "lr": 3.5723824347244745e-05, "epoch": 1.0740131578947367, "percentage": 29.2, "elapsed_time": "0:56:03", "remaining_time": "2:15:54"}
{"current_steps": 420, "total_steps": 1421, "loss": 0.225, "lr": 3.557072720037779e-05, "epoch": 1.0986842105263157, "percentage": 29.56, "elapsed_time": "1:14:41", "remaining_time": "2:57:59"}
{"current_steps": 425, "total_steps": 1421, "loss": 0.2115, "lr": 3.541527781435568e-05, "epoch": 1.1233552631578947, "percentage": 29.91, "elapsed_time": "1:33:21", "remaining_time": "3:38:47"}
{"current_steps": 430, "total_steps": 1421, "loss": 0.2173, "lr": 3.525749967261164e-05, "epoch": 1.1480263157894737, "percentage": 30.26, "elapsed_time": "1:52:01", "remaining_time": "4:18:09"}
{"current_steps": 435, "total_steps": 1421, "loss": 0.2202, "lr": 3.509741661037945e-05, "epoch": 1.1726973684210527, "percentage": 30.61, "elapsed_time": "2:10:36", "remaining_time": "4:56:02"}
{"current_steps": 440, "total_steps": 1421, "loss": 0.2125, "lr": 3.493505281109269e-05, "epoch": 1.1973684210526316, "percentage": 30.96, "elapsed_time": "2:29:17", "remaining_time": "5:32:50"}
{"current_steps": 445, "total_steps": 1421, "loss": 0.2048, "lr": 3.477043280273139e-05, "epoch": 1.2220394736842106, "percentage": 31.32, "elapsed_time": "2:47:57", "remaining_time": "6:08:21"}
{"current_steps": 450, "total_steps": 1421, "loss": 0.2163, "lr": 3.460358145411669e-05, "epoch": 1.2467105263157894, "percentage": 31.67, "elapsed_time": "3:06:51", "remaining_time": "6:43:12"}
{"current_steps": 455, "total_steps": 1421, "loss": 0.2038, "lr": 3.4434523971153876e-05, "epoch": 1.2713815789473684, "percentage": 32.02, "elapsed_time": "3:25:28", "remaining_time": "7:16:15"}
{"current_steps": 460, "total_steps": 1421, "loss": 0.2034, "lr": 3.426328589302463e-05, "epoch": 1.2960526315789473, "percentage": 32.37, "elapsed_time": "3:44:03", "remaining_time": "7:48:05"}
{"current_steps": 465, "total_steps": 1421, "loss": 0.1997, "lr": 3.408989308832887e-05, "epoch": 1.3207236842105263, "percentage": 32.72, "elapsed_time": "4:02:39", "remaining_time": "8:18:53"}
{"current_steps": 470, "total_steps": 1421, "loss": 0.2034, "lr": 3.3914371751176806e-05, "epoch": 1.3453947368421053, "percentage": 33.08, "elapsed_time": "4:21:18", "remaining_time": "8:48:43"}
{"current_steps": 475, "total_steps": 1421, "loss": 0.2001, "lr": 3.3736748397231865e-05, "epoch": 1.3700657894736843, "percentage": 33.43, "elapsed_time": "4:39:57", "remaining_time": "9:17:33"}
{"current_steps": 480, "total_steps": 1421, "loss": 0.208, "lr": 3.3557049859705026e-05, "epoch": 1.3947368421052633, "percentage": 33.78, "elapsed_time": "4:58:39", "remaining_time": "9:45:30"}
{"current_steps": 485, "total_steps": 1421, "loss": 0.1964, "lr": 3.3375303285301175e-05, "epoch": 1.419407894736842, "percentage": 34.13, "elapsed_time": "5:17:19", "remaining_time": "10:12:24"}
{"current_steps": 490, "total_steps": 1421, "loss": 0.1981, "lr": 3.31915361301181e-05, "epoch": 1.444078947368421, "percentage": 34.48, "elapsed_time": "5:35:57", "remaining_time": "10:38:18"}
{"current_steps": 495, "total_steps": 1421, "loss": 0.1899, "lr": 3.300577615549874e-05, "epoch": 1.46875, "percentage": 34.83, "elapsed_time": "5:54:37", "remaining_time": "11:03:24"}
{"current_steps": 500, "total_steps": 1421, "loss": 0.1957, "lr": 3.281805142383738e-05, "epoch": 1.493421052631579, "percentage": 35.19, "elapsed_time": "6:13:15", "remaining_time": "11:27:31"}
{"current_steps": 505, "total_steps": 1421, "loss": 0.1918, "lr": 3.262839029434026e-05, "epoch": 1.518092105263158, "percentage": 35.54, "elapsed_time": "6:32:33", "remaining_time": "11:52:02"}
{"current_steps": 510, "total_steps": 1421, "loss": 0.2097, "lr": 3.243682141874147e-05, "epoch": 1.5427631578947367, "percentage": 35.89, "elapsed_time": "6:51:11", "remaining_time": "12:14:30"}
{"current_steps": 515, "total_steps": 1421, "loss": 0.2123, "lr": 3.2243373736974524e-05, "epoch": 1.567434210526316, "percentage": 36.24, "elapsed_time": "7:09:51", "remaining_time": "12:36:13"}
{"current_steps": 520, "total_steps": 1421, "loss": 0.206, "lr": 3.204807647280049e-05, "epoch": 1.5921052631578947, "percentage": 36.59, "elapsed_time": "7:28:26", "remaining_time": "12:56:59"}
{"current_steps": 525, "total_steps": 1421, "loss": 0.1997, "lr": 3.185095912939324e-05, "epoch": 1.6167763157894737, "percentage": 36.95, "elapsed_time": "7:47:05", "remaining_time": "13:17:09"}
{"current_steps": 530, "total_steps": 1421, "loss": 0.2188, "lr": 3.165205148488242e-05, "epoch": 1.6414473684210527, "percentage": 37.3, "elapsed_time": "8:05:47", "remaining_time": "13:36:40"}
{"current_steps": 535, "total_steps": 1421, "loss": 0.4471, "lr": 3.145138358785494e-05, "epoch": 1.6661184210526314, "percentage": 37.65, "elapsed_time": "8:24:52", "remaining_time": "13:56:06"}
{"current_steps": 540, "total_steps": 1421, "loss": 0.4003, "lr": 3.124898575281562e-05, "epoch": 1.6907894736842106, "percentage": 38.0, "elapsed_time": "8:43:39", "remaining_time": "14:14:20"}
{"current_steps": 545, "total_steps": 1421, "loss": 0.3773, "lr": 3.1044888555607594e-05, "epoch": 1.7154605263157894, "percentage": 38.35, "elapsed_time": "9:02:26", "remaining_time": "14:31:52"}
{"current_steps": 550, "total_steps": 1421, "loss": 0.3598, "lr": 3.0839122828793314e-05, "epoch": 1.7401315789473686, "percentage": 38.71, "elapsed_time": "9:21:10", "remaining_time": "14:48:41"}
{"current_steps": 555, "total_steps": 1421, "loss": 0.3494, "lr": 3.0631719656996707e-05, "epoch": 1.7648026315789473, "percentage": 39.06, "elapsed_time": "9:39:55", "remaining_time": "15:04:53"}
{"current_steps": 560, "total_steps": 1421, "loss": 0.3607, "lr": 3.042271037220731e-05, "epoch": 1.7894736842105263, "percentage": 39.41, "elapsed_time": "9:58:42", "remaining_time": "15:20:30"}
{"current_steps": 565, "total_steps": 1421, "loss": 0.3549, "lr": 3.0212126549046986e-05, "epoch": 1.8141447368421053, "percentage": 39.76, "elapsed_time": "10:17:29", "remaining_time": "15:35:32"}
{"current_steps": 570, "total_steps": 1421, "loss": 0.3376, "lr": 3.0000000000000004e-05, "epoch": 1.838815789473684, "percentage": 40.11, "elapsed_time": "10:36:16", "remaining_time": "15:49:56"}
{"current_steps": 575, "total_steps": 1421, "loss": 0.3402, "lr": 2.978636277060722e-05, "epoch": 1.8634868421052633, "percentage": 40.46, "elapsed_time": "10:55:03", "remaining_time": "16:03:46"}
{"current_steps": 580, "total_steps": 1421, "loss": 0.305, "lr": 2.9571247134624985e-05, "epoch": 1.888157894736842, "percentage": 40.82, "elapsed_time": "11:13:50", "remaining_time": "16:17:03"}
{"current_steps": 585, "total_steps": 1421, "loss": 0.2837, "lr": 2.9354685589149637e-05, "epoch": 1.912828947368421, "percentage": 41.17, "elapsed_time": "11:32:36", "remaining_time": "16:29:46"}
{"current_steps": 505, "total_steps": 1421, "loss": 0.2102, "lr": 3.262839029434026e-05, "epoch": 2.0246710526315788, "percentage": 35.54, "elapsed_time": "0:19:01", "remaining_time": "0:34:30"}
{"current_steps": 510, "total_steps": 1421, "loss": 0.2074, "lr": 3.243682141874147e-05, "epoch": 2.049342105263158, "percentage": 35.89, "elapsed_time": "0:37:45", "remaining_time": "1:07:27"}
{"current_steps": 515, "total_steps": 1421, "loss": 0.2051, "lr": 3.2243373736974524e-05, "epoch": 2.0740131578947367, "percentage": 36.24, "elapsed_time": "0:56:28", "remaining_time": "1:39:20"}
{"current_steps": 520, "total_steps": 1421, "loss": 0.2176, "lr": 3.204807647280049e-05, "epoch": 2.098684210526316, "percentage": 36.59, "elapsed_time": "1:15:17", "remaining_time": "2:10:26"}
{"current_steps": 525, "total_steps": 1421, "loss": 0.2046, "lr": 3.185095912939324e-05, "epoch": 2.1233552631578947, "percentage": 36.95, "elapsed_time": "1:34:09", "remaining_time": "2:40:41"}
{"current_steps": 530, "total_steps": 1421, "loss": 0.2105, "lr": 3.165205148488242e-05, "epoch": 2.1480263157894735, "percentage": 37.3, "elapsed_time": "1:53:00", "remaining_time": "3:09:58"}
{"current_steps": 535, "total_steps": 1421, "loss": 0.2131, "lr": 3.145138358785494e-05, "epoch": 2.1726973684210527, "percentage": 37.65, "elapsed_time": "2:11:46", "remaining_time": "3:38:13"}
{"current_steps": 540, "total_steps": 1421, "loss": 0.2057, "lr": 3.124898575281562e-05, "epoch": 2.1973684210526314, "percentage": 38.0, "elapsed_time": "2:30:38", "remaining_time": "4:05:45"}
{"current_steps": 545, "total_steps": 1421, "loss": 0.1985, "lr": 3.1044888555607594e-05, "epoch": 2.2220394736842106, "percentage": 38.35, "elapsed_time": "2:49:28", "remaining_time": "4:32:23"}
{"current_steps": 550, "total_steps": 1421, "loss": 0.2098, "lr": 3.0839122828793314e-05, "epoch": 2.2467105263157894, "percentage": 38.71, "elapsed_time": "3:08:32", "remaining_time": "4:58:35"}
{"current_steps": 555, "total_steps": 1421, "loss": 0.1979, "lr": 3.0631719656996707e-05, "epoch": 2.2713815789473686, "percentage": 39.06, "elapsed_time": "3:27:20", "remaining_time": "5:23:32"}
{"current_steps": 560, "total_steps": 1421, "loss": 0.1972, "lr": 3.042271037220731e-05, "epoch": 2.2960526315789473, "percentage": 39.41, "elapsed_time": "3:46:04", "remaining_time": "5:47:35"}
{"current_steps": 565, "total_steps": 1421, "loss": 0.1923, "lr": 3.0212126549046986e-05, "epoch": 2.3207236842105265, "percentage": 39.76, "elapsed_time": "4:04:54", "remaining_time": "6:11:03"}
{"current_steps": 570, "total_steps": 1421, "loss": 0.1969, "lr": 3.0000000000000004e-05, "epoch": 2.3453947368421053, "percentage": 40.11, "elapsed_time": "4:23:48", "remaining_time": "6:33:51"}
{"current_steps": 575, "total_steps": 1421, "loss": 0.1942, "lr": 2.978636277060722e-05, "epoch": 2.370065789473684, "percentage": 40.46, "elapsed_time": "4:42:39", "remaining_time": "6:55:52"}
{"current_steps": 580, "total_steps": 1421, "loss": 0.2033, "lr": 2.9571247134624985e-05, "epoch": 2.3947368421052633, "percentage": 40.82, "elapsed_time": "5:01:31", "remaining_time": "7:17:12"}
{"current_steps": 585, "total_steps": 1421, "loss": 0.1911, "lr": 2.9354685589149637e-05, "epoch": 2.419407894736842, "percentage": 41.17, "elapsed_time": "5:20:21", "remaining_time": "7:37:48"}
{"current_steps": 590, "total_steps": 1421, "loss": 0.1925, "lr": 2.9136710849708225e-05, "epoch": 2.4440789473684212, "percentage": 41.52, "elapsed_time": "5:39:06", "remaining_time": "7:57:38"}
{"current_steps": 595, "total_steps": 1421, "loss": 0.1844, "lr": 2.8917355845316214e-05, "epoch": 2.46875, "percentage": 41.87, "elapsed_time": "5:58:01", "remaining_time": "8:17:01"}
{"current_steps": 600, "total_steps": 1421, "loss": 0.1907, "lr": 2.869665371350299e-05, "epoch": 2.4934210526315788, "percentage": 42.22, "elapsed_time": "6:16:48", "remaining_time": "8:35:36"}
{"current_steps": 605, "total_steps": 1421, "loss": 0.1892, "lr": 2.8474637795305842e-05, "epoch": 2.518092105263158, "percentage": 42.58, "elapsed_time": "6:36:15", "remaining_time": "8:54:28"}
{"current_steps": 610, "total_steps": 1421, "loss": 0.207, "lr": 2.825134163023318e-05, "epoch": 2.5427631578947367, "percentage": 42.93, "elapsed_time": "6:55:02", "remaining_time": "9:11:48"}
{"current_steps": 615, "total_steps": 1421, "loss": 0.2097, "lr": 2.802679895119778e-05, "epoch": 2.567434210526316, "percentage": 43.28, "elapsed_time": "7:13:52", "remaining_time": "9:28:37"}
{"current_steps": 620, "total_steps": 1421, "loss": 0.2035, "lr": 2.7801043679420856e-05, "epoch": 2.5921052631578947, "percentage": 43.63, "elapsed_time": "7:32:36", "remaining_time": "9:44:45"}
{"current_steps": 625, "total_steps": 1421, "loss": 0.1972, "lr": 2.75741099193076e-05, "epoch": 2.6167763157894735, "percentage": 43.98, "elapsed_time": "7:51:27", "remaining_time": "10:00:26"}
{"current_steps": 630, "total_steps": 1421, "loss": 0.2163, "lr": 2.734603195329514e-05, "epoch": 2.6414473684210527, "percentage": 44.33, "elapsed_time": "8:10:18", "remaining_time": "10:15:37"}
{"current_steps": 635, "total_steps": 1421, "loss": 0.445, "lr": 2.711684423667353e-05, "epoch": 2.6661184210526314, "percentage": 44.69, "elapsed_time": "8:29:51", "remaining_time": "10:31:05"}
{"current_steps": 640, "total_steps": 1421, "loss": 0.3977, "lr": 2.688658139238067e-05, "epoch": 2.6907894736842106, "percentage": 45.04, "elapsed_time": "8:48:55", "remaining_time": "10:45:27"}
{"current_steps": 645, "total_steps": 1421, "loss": 0.3746, "lr": 2.6655278205771877e-05, "epoch": 2.7154605263157894, "percentage": 45.39, "elapsed_time": "9:08:01", "remaining_time": "10:59:19"}
{"current_steps": 650, "total_steps": 1421, "loss": 0.3572, "lr": 2.6422969619364965e-05, "epoch": 2.7401315789473686, "percentage": 45.74, "elapsed_time": "9:27:03", "remaining_time": "11:12:36"}
{"current_steps": 655, "total_steps": 1421, "loss": 0.3472, "lr": 2.6189690727561478e-05, "epoch": 2.7648026315789473, "percentage": 46.09, "elapsed_time": "9:46:07", "remaining_time": "11:25:27"}
{"current_steps": 660, "total_steps": 1421, "loss": 0.3585, "lr": 2.5955476771345116e-05, "epoch": 2.7894736842105265, "percentage": 46.45, "elapsed_time": "10:05:12", "remaining_time": "11:37:49"}
{"current_steps": 665, "total_steps": 1421, "loss": 0.3535, "lr": 2.5720363132957915e-05, "epoch": 2.8141447368421053, "percentage": 46.8, "elapsed_time": "10:24:21", "remaining_time": "11:49:47"}
{"current_steps": 670, "total_steps": 1421, "loss": 0.3362, "lr": 2.5484385330555138e-05, "epoch": 2.838815789473684, "percentage": 47.15, "elapsed_time": "10:43:28", "remaining_time": "12:01:16"}
{"current_steps": 675, "total_steps": 1421, "loss": 0.3387, "lr": 2.5247579012839584e-05, "epoch": 2.8634868421052633, "percentage": 47.5, "elapsed_time": "11:02:35", "remaining_time": "12:12:16"}
{"current_steps": 680, "total_steps": 1421, "loss": 0.3038, "lr": 2.500997995367626e-05, "epoch": 2.888157894736842, "percentage": 47.85, "elapsed_time": "11:21:38", "remaining_time": "12:22:47"}
{"current_steps": 685, "total_steps": 1421, "loss": 0.2816, "lr": 2.4771624046688043e-05, "epoch": 2.9128289473684212, "percentage": 48.21, "elapsed_time": "11:40:43", "remaining_time": "12:32:53"}
{"current_steps": 605, "total_steps": 1421, "loss": 0.2053, "lr": 2.8474637795305842e-05, "epoch": 2.0246710526315788, "percentage": 42.58, "elapsed_time": "0:18:47", "remaining_time": "0:25:21"}
{"current_steps": 610, "total_steps": 1421, "loss": 0.2023, "lr": 2.825134163023318e-05, "epoch": 2.049342105263158, "percentage": 42.93, "elapsed_time": "0:37:34", "remaining_time": "0:49:57"}
{"current_steps": 615, "total_steps": 1421, "loss": 0.2, "lr": 2.802679895119778e-05, "epoch": 2.0740131578947367, "percentage": 43.28, "elapsed_time": "0:56:22", "remaining_time": "1:13:52"}
{"current_steps": 620, "total_steps": 1421, "loss": 0.2119, "lr": 2.7801043679420856e-05, "epoch": 2.098684210526316, "percentage": 43.63, "elapsed_time": "1:15:10", "remaining_time": "1:37:07"}
{"current_steps": 625, "total_steps": 1421, "loss": 0.1992, "lr": 2.75741099193076e-05, "epoch": 2.1233552631578947, "percentage": 43.98, "elapsed_time": "1:36:28", "remaining_time": "2:02:52"}
{"current_steps": 630, "total_steps": 1421, "loss": 0.205, "lr": 2.734603195329514e-05, "epoch": 2.1480263157894735, "percentage": 44.33, "elapsed_time": "1:55:07", "remaining_time": "2:24:32"}
{"current_steps": 635, "total_steps": 1421, "loss": 0.2073, "lr": 2.711684423667353e-05, "epoch": 2.1726973684210527, "percentage": 44.69, "elapsed_time": "2:13:47", "remaining_time": "2:45:36"}
{"current_steps": 640, "total_steps": 1421, "loss": 0.2004, "lr": 2.688658139238067e-05, "epoch": 2.1973684210526314, "percentage": 45.04, "elapsed_time": "2:32:29", "remaining_time": "3:06:04"}
{"current_steps": 645, "total_steps": 1421, "loss": 0.1934, "lr": 2.6655278205771877e-05, "epoch": 2.2220394736842106, "percentage": 45.39, "elapsed_time": "2:51:10", "remaining_time": "3:25:56"}
{"current_steps": 650, "total_steps": 1421, "loss": 0.2045, "lr": 2.6422969619364965e-05, "epoch": 2.2467105263157894, "percentage": 45.74, "elapsed_time": "3:10:08", "remaining_time": "3:45:32"}
{"current_steps": 655, "total_steps": 1421, "loss": 0.1929, "lr": 2.6189690727561478e-05, "epoch": 2.2713815789473686, "percentage": 46.09, "elapsed_time": "3:28:50", "remaining_time": "4:04:14"}
{"current_steps": 660, "total_steps": 1421, "loss": 0.1928, "lr": 2.5955476771345116e-05, "epoch": 2.2960526315789473, "percentage": 46.45, "elapsed_time": "3:47:27", "remaining_time": "4:22:15"}
{"current_steps": 665, "total_steps": 1421, "loss": 0.1874, "lr": 2.5720363132957915e-05, "epoch": 2.3207236842105265, "percentage": 46.8, "elapsed_time": "4:06:06", "remaining_time": "4:39:47"}
{"current_steps": 670, "total_steps": 1421, "loss": 0.1915, "lr": 2.5484385330555138e-05, "epoch": 2.3453947368421053, "percentage": 47.15, "elapsed_time": "4:24:46", "remaining_time": "4:56:47"}
{"current_steps": 675, "total_steps": 1421, "loss": 0.1895, "lr": 2.5247579012839584e-05, "epoch": 2.370065789473684, "percentage": 47.5, "elapsed_time": "4:43:30", "remaining_time": "5:13:19"}
{"current_steps": 680, "total_steps": 1421, "loss": 0.2007, "lr": 2.500997995367626e-05, "epoch": 2.3947368421052633, "percentage": 47.85, "elapsed_time": "5:02:09", "remaining_time": "5:29:15"}
{"current_steps": 685, "total_steps": 1421, "loss": 0.1865, "lr": 2.4771624046688043e-05, "epoch": 2.419407894736842, "percentage": 48.21, "elapsed_time": "5:20:53", "remaining_time": "5:44:46"}
{"current_steps": 690, "total_steps": 1421, "loss": 0.1876, "lr": 2.4532547299833337e-05, "epoch": 2.4440789473684212, "percentage": 48.56, "elapsed_time": "5:39:31", "remaining_time": "5:59:42"}
{"current_steps": 695, "total_steps": 1421, "loss": 0.1798, "lr": 2.4292785829966407e-05, "epoch": 2.46875, "percentage": 48.91, "elapsed_time": "5:58:13", "remaining_time": "6:14:12"}
{"current_steps": 700, "total_steps": 1421, "loss": 0.1859, "lr": 2.405237585738126e-05, "epoch": 2.4934210526315788, "percentage": 49.26, "elapsed_time": "6:16:53", "remaining_time": "6:28:12"}
{"current_steps": 705, "total_steps": 1421, "loss": 0.1871, "lr": 2.381135370033996e-05, "epoch": 2.518092105263158, "percentage": 49.61, "elapsed_time": "6:36:17", "remaining_time": "6:42:28"}
{"current_steps": 710, "total_steps": 1421, "loss": 0.2049, "lr": 2.356975576958606e-05, "epoch": 2.5427631578947367, "percentage": 49.96, "elapsed_time": "6:54:59", "remaining_time": "6:55:34"}
{"current_steps": 715, "total_steps": 1421, "loss": 0.2075, "lr": 2.3327618562844116e-05, "epoch": 2.567434210526316, "percentage": 50.32, "elapsed_time": "7:13:39", "remaining_time": "7:08:12"}
{"current_steps": 720, "total_steps": 1421, "loss": 0.2012, "lr": 2.3084978659306048e-05, "epoch": 2.5921052631578947, "percentage": 50.67, "elapsed_time": "7:32:18", "remaining_time": "7:20:22"}
{"current_steps": 725, "total_steps": 1421, "loss": 0.195, "lr": 2.2841872714105196e-05, "epoch": 2.6167763157894735, "percentage": 51.02, "elapsed_time": "7:51:01", "remaining_time": "7:32:11"}
{"current_steps": 730, "total_steps": 1421, "loss": 0.2141, "lr": 2.25983374527789e-05, "epoch": 2.6414473684210527, "percentage": 51.37, "elapsed_time": "8:09:47", "remaining_time": "7:43:37"}
{"current_steps": 735, "total_steps": 1421, "loss": 0.4435, "lr": 2.2354409665720427e-05, "epoch": 2.6661184210526314, "percentage": 51.72, "elapsed_time": "8:29:00", "remaining_time": "7:55:04"}
{"current_steps": 740, "total_steps": 1421, "loss": 0.3957, "lr": 2.2110126202621162e-05, "epoch": 2.6907894736842106, "percentage": 52.08, "elapsed_time": "8:47:47", "remaining_time": "8:05:42"}
{"current_steps": 745, "total_steps": 1421, "loss": 0.3727, "lr": 2.1865523966903758e-05, "epoch": 2.7154605263157894, "percentage": 52.43, "elapsed_time": "9:06:34", "remaining_time": "8:15:56"}
{"current_steps": 750, "total_steps": 1421, "loss": 0.3552, "lr": 2.16206399101472e-05, "epoch": 2.7401315789473686, "percentage": 52.78, "elapsed_time": "9:25:20", "remaining_time": "8:25:47"}
{"current_steps": 755, "total_steps": 1421, "loss": 0.3453, "lr": 2.1375511026504653e-05, "epoch": 2.7648026315789473, "percentage": 53.13, "elapsed_time": "9:44:06", "remaining_time": "8:35:14"}
{"current_steps": 760, "total_steps": 1421, "loss": 0.3564, "lr": 2.113017434711479e-05, "epoch": 2.7894736842105265, "percentage": 53.48, "elapsed_time": "10:02:52", "remaining_time": "8:44:20"}
{"current_steps": 765, "total_steps": 1421, "loss": 0.3519, "lr": 2.088466693450758e-05, "epoch": 2.8141447368421053, "percentage": 53.84, "elapsed_time": "10:21:39", "remaining_time": "8:53:05"}
{"current_steps": 770, "total_steps": 1421, "loss": 0.3346, "lr": 2.0639025877005308e-05, "epoch": 2.838815789473684, "percentage": 54.19, "elapsed_time": "10:40:26", "remaining_time": "9:01:27"}
{"current_steps": 775, "total_steps": 1421, "loss": 0.3375, "lr": 2.039328828311976e-05, "epoch": 2.8634868421052633, "percentage": 54.54, "elapsed_time": "10:59:14", "remaining_time": "9:09:30"}
{"current_steps": 780, "total_steps": 1421, "loss": 0.3026, "lr": 2.014749127594625e-05, "epoch": 2.888157894736842, "percentage": 54.89, "elapsed_time": "11:18:02", "remaining_time": "9:17:12"}
{"current_steps": 785, "total_steps": 1421, "loss": 0.2797, "lr": 1.9901671987555568e-05, "epoch": 2.9128289473684212, "percentage": 55.24, "elapsed_time": "11:36:50", "remaining_time": "9:24:34"}
{"current_steps": 705, "total_steps": 1421, "loss": 0.2012, "lr": 2.381135370033996e-05, "epoch": 3.0246710526315788, "percentage": 49.61, "elapsed_time": "0:19:13", "remaining_time": "0:19:31"}
{"current_steps": 710, "total_steps": 1421, "loss": 0.1977, "lr": 2.356975576958606e-05, "epoch": 3.049342105263158, "percentage": 49.96, "elapsed_time": "0:38:17", "remaining_time": "0:38:20"}
{"current_steps": 715, "total_steps": 1421, "loss": 0.1954, "lr": 2.3327618562844116e-05, "epoch": 3.0740131578947367, "percentage": 50.32, "elapsed_time": "0:57:17", "remaining_time": "0:56:34"}
{"current_steps": 720, "total_steps": 1421, "loss": 0.2069, "lr": 2.3084978659306048e-05, "epoch": 3.098684210526316, "percentage": 50.67, "elapsed_time": "1:16:15", "remaining_time": "1:14:14"}
{"current_steps": 725, "total_steps": 1421, "loss": 0.1944, "lr": 2.2841872714105196e-05, "epoch": 3.1233552631578947, "percentage": 51.02, "elapsed_time": "1:35:15", "remaining_time": "1:31:27"}
{"current_steps": 730, "total_steps": 1421, "loss": 0.2002, "lr": 2.25983374527789e-05, "epoch": 3.1480263157894735, "percentage": 51.37, "elapsed_time": "1:54:15", "remaining_time": "1:48:09"}
{"current_steps": 735, "total_steps": 1421, "loss": 0.2024, "lr": 2.2354409665720427e-05, "epoch": 3.1726973684210527, "percentage": 51.72, "elapsed_time": "2:13:15", "remaining_time": "2:04:22"}
{"current_steps": 740, "total_steps": 1421, "loss": 0.1957, "lr": 2.2110126202621162e-05, "epoch": 3.1973684210526314, "percentage": 52.08, "elapsed_time": "2:32:17", "remaining_time": "2:20:09"}
{"current_steps": 745, "total_steps": 1421, "loss": 0.1889, "lr": 2.1865523966903758e-05, "epoch": 3.2220394736842106, "percentage": 52.43, "elapsed_time": "2:51:19", "remaining_time": "2:35:27"}
{"current_steps": 750, "total_steps": 1421, "loss": 0.1998, "lr": 2.16206399101472e-05, "epoch": 3.2467105263157894, "percentage": 52.78, "elapsed_time": "3:10:42", "remaining_time": "2:50:37"}
{"current_steps": 755, "total_steps": 1421, "loss": 0.1885, "lr": 2.1375511026504653e-05, "epoch": 3.2713815789473686, "percentage": 53.13, "elapsed_time": "3:29:40", "remaining_time": "3:04:57"}
{"current_steps": 760, "total_steps": 1421, "loss": 0.1875, "lr": 2.113017434711479e-05, "epoch": 3.2960526315789473, "percentage": 53.48, "elapsed_time": "3:48:37", "remaining_time": "3:18:50"}
{"current_steps": 765, "total_steps": 1421, "loss": 0.1823, "lr": 2.088466693450758e-05, "epoch": 3.3207236842105265, "percentage": 53.84, "elapsed_time": "4:07:41", "remaining_time": "3:32:23"}
{"current_steps": 770, "total_steps": 1421, "loss": 0.1863, "lr": 2.0639025877005308e-05, "epoch": 3.3453947368421053, "percentage": 54.19, "elapsed_time": "4:26:43", "remaining_time": "3:45:30"}
{"current_steps": 775, "total_steps": 1421, "loss": 0.1854, "lr": 2.039328828311976e-05, "epoch": 3.370065789473684, "percentage": 54.54, "elapsed_time": "4:45:48", "remaining_time": "3:58:14"}
{"current_steps": 780, "total_steps": 1421, "loss": 0.1983, "lr": 2.014749127594625e-05, "epoch": 3.3947368421052633, "percentage": 54.89, "elapsed_time": "5:04:54", "remaining_time": "4:10:34"}
{"current_steps": 785, "total_steps": 1421, "loss": 0.1828, "lr": 1.9901671987555568e-05, "epoch": 3.419407894736842, "percentage": 55.24, "elapsed_time": "5:24:00", "remaining_time": "4:22:30"}
{"current_steps": 790, "total_steps": 1421, "loss": 0.1834, "lr": 1.9655867553384472e-05, "epoch": 3.4440789473684212, "percentage": 55.59, "elapsed_time": "5:43:01", "remaining_time": "4:33:59"}
{"current_steps": 795, "total_steps": 1421, "loss": 0.1754, "lr": 1.9410115106625714e-05, "epoch": 3.46875, "percentage": 55.95, "elapsed_time": "6:02:02", "remaining_time": "4:45:05"}
{"current_steps": 800, "total_steps": 1421, "loss": 0.1812, "lr": 1.9164451772618435e-05, "epoch": 3.4934210526315788, "percentage": 56.3, "elapsed_time": "6:21:03", "remaining_time": "4:55:47"}
{"current_steps": 805, "total_steps": 1421, "loss": 0.1858, "lr": 1.891891466323966e-05, "epoch": 3.518092105263158, "percentage": 56.65, "elapsed_time": "6:40:50", "remaining_time": "5:06:43"}
{"current_steps": 810, "total_steps": 1421, "loss": 0.2033, "lr": 1.8673540871297927e-05, "epoch": 3.5427631578947367, "percentage": 57.0, "elapsed_time": "6:59:55", "remaining_time": "5:16:45"}
{"current_steps": 815, "total_steps": 1421, "loss": 0.206, "lr": 1.842836746492971e-05, "epoch": 3.567434210526316, "percentage": 57.35, "elapsed_time": "7:18:59", "remaining_time": "5:26:25"}
{"current_steps": 820, "total_steps": 1421, "loss": 0.1997, "lr": 1.8183431481999658e-05, "epoch": 3.5921052631578947, "percentage": 57.71, "elapsed_time": "7:37:58", "remaining_time": "5:35:39"}
{"current_steps": 825, "total_steps": 1421, "loss": 0.1935, "lr": 1.793876992450529e-05, "epoch": 3.6167763157894735, "percentage": 58.06, "elapsed_time": "7:57:02", "remaining_time": "5:44:37"}
{"current_steps": 830, "total_steps": 1421, "loss": 0.2124, "lr": 1.769441975298726e-05, "epoch": 3.6414473684210527, "percentage": 58.41, "elapsed_time": "8:16:08", "remaining_time": "5:53:16"}
{"current_steps": 835, "total_steps": 1421, "loss": 0.4421, "lr": 1.7450417880945705e-05, "epoch": 3.6661184210526314, "percentage": 58.76, "elapsed_time": "8:35:48", "remaining_time": "6:01:59"}
{"current_steps": 840, "total_steps": 1421, "loss": 0.3948, "lr": 1.720680116926388e-05, "epoch": 3.6907894736842106, "percentage": 59.11, "elapsed_time": "8:55:03", "remaining_time": "6:10:04"}
{"current_steps": 805, "total_steps": 1421, "loss": 0.1979, "lr": 1.891891466323966e-05, "epoch": 3.0246710526315788, "percentage": 56.65, "elapsed_time": "0:18:58", "remaining_time": "0:14:31"}
{"current_steps": 810, "total_steps": 1421, "loss": 0.1942, "lr": 1.8673540871297927e-05, "epoch": 3.049342105263158, "percentage": 57.0, "elapsed_time": "0:37:46", "remaining_time": "0:28:29"}
{"current_steps": 815, "total_steps": 1421, "loss": 0.1916, "lr": 1.842836746492971e-05, "epoch": 3.0740131578947367, "percentage": 57.35, "elapsed_time": "0:56:36", "remaining_time": "0:42:05"}
{"current_steps": 820, "total_steps": 1421, "loss": 0.2026, "lr": 1.8183431481999658e-05, "epoch": 3.098684210526316, "percentage": 57.71, "elapsed_time": "1:15:29", "remaining_time": "0:55:19"}
{"current_steps": 825, "total_steps": 1421, "loss": 0.1905, "lr": 1.793876992450529e-05, "epoch": 3.1233552631578947, "percentage": 58.06, "elapsed_time": "1:34:15", "remaining_time": "1:08:05"}
{"current_steps": 830, "total_steps": 1421, "loss": 0.1958, "lr": 1.769441975298726e-05, "epoch": 3.1480263157894735, "percentage": 58.41, "elapsed_time": "1:53:05", "remaining_time": "1:20:31"}
{"current_steps": 835, "total_steps": 1421, "loss": 0.198, "lr": 1.7450417880945705e-05, "epoch": 3.1726973684210527, "percentage": 58.76, "elapsed_time": "2:11:54", "remaining_time": "1:32:34"}
{"current_steps": 840, "total_steps": 1421, "loss": 0.1914, "lr": 1.720680116926388e-05, "epoch": 3.1973684210526314, "percentage": 59.11, "elapsed_time": "2:30:43", "remaining_time": "1:44:15"}
{"current_steps": 845, "total_steps": 1421, "loss": 0.185, "lr": 1.6963606420639602e-05, "epoch": 3.2220394736842106, "percentage": 59.47, "elapsed_time": "2:49:33", "remaining_time": "1:55:34"}
{"current_steps": 850, "total_steps": 1421, "loss": 0.1957, "lr": 1.6720870374025578e-05, "epoch": 3.2467105263157894, "percentage": 59.82, "elapsed_time": "3:08:39", "remaining_time": "2:06:44"}
{"current_steps": 855, "total_steps": 1421, "loss": 0.1846, "lr": 1.6478629699079278e-05, "epoch": 3.2713815789473686, "percentage": 60.17, "elapsed_time": "3:27:28", "remaining_time": "2:17:20"}
{"current_steps": 860, "total_steps": 1421, "loss": 0.1833, "lr": 1.6236920990623374e-05, "epoch": 3.2960526315789473, "percentage": 60.52, "elapsed_time": "3:46:14", "remaining_time": "2:27:35"}
{"current_steps": 865, "total_steps": 1421, "loss": 0.1781, "lr": 1.5995780763117382e-05, "epoch": 3.3207236842105265, "percentage": 60.87, "elapsed_time": "4:05:06", "remaining_time": "2:37:32"}
{"current_steps": 870, "total_steps": 1421, "loss": 0.182, "lr": 1.5755245445141544e-05, "epoch": 3.3453947368421053, "percentage": 61.22, "elapsed_time": "4:23:57", "remaining_time": "2:47:10"}
{"current_steps": 875, "total_steps": 1421, "loss": 0.1804, "lr": 1.5515351373893573e-05, "epoch": 3.370065789473684, "percentage": 61.58, "elapsed_time": "4:42:51", "remaining_time": "2:56:30"}
{"current_steps": 880, "total_steps": 1421, "loss": 0.1936, "lr": 1.5276134789699344e-05, "epoch": 3.3947368421052633, "percentage": 61.93, "elapsed_time": "5:01:45", "remaining_time": "3:05:30"}
{"current_steps": 885, "total_steps": 1421, "loss": 0.1787, "lr": 1.503763183053805e-05, "epoch": 3.419407894736842, "percentage": 62.28, "elapsed_time": "5:20:39", "remaining_time": "3:14:12"}
{"current_steps": 890, "total_steps": 1421, "loss": 0.1788, "lr": 1.4799878526582987e-05, "epoch": 3.4440789473684212, "percentage": 62.63, "elapsed_time": "5:39:29", "remaining_time": "3:22:33"}
{"current_steps": 895, "total_steps": 1421, "loss": 0.171, "lr": 1.4562910794758488e-05, "epoch": 3.46875, "percentage": 62.98, "elapsed_time": "5:58:19", "remaining_time": "3:30:35"}
{"current_steps": 900, "total_steps": 1421, "loss": 0.1771, "lr": 1.4326764433314066e-05, "epoch": 3.4934210526315788, "percentage": 63.34, "elapsed_time": "6:17:09", "remaining_time": "3:38:19"}
{"current_steps": 905, "total_steps": 1421, "loss": 0.1844, "lr": 1.4091475116416415e-05, "epoch": 3.518092105263158, "percentage": 63.69, "elapsed_time": "6:36:44", "remaining_time": "3:46:12"}
{"current_steps": 910, "total_steps": 1421, "loss": 0.2019, "lr": 1.3857078388760203e-05, "epoch": 3.5427631578947367, "percentage": 64.04, "elapsed_time": "6:55:34", "remaining_time": "3:53:21"}
{"current_steps": 915, "total_steps": 1421, "loss": 0.2043, "lr": 1.3623609660198373e-05, "epoch": 3.567434210526316, "percentage": 64.39, "elapsed_time": "7:14:28", "remaining_time": "4:00:15"}
{"current_steps": 920, "total_steps": 1421, "loss": 0.198, "lr": 1.3391104200392905e-05, "epoch": 3.5921052631578947, "percentage": 64.74, "elapsed_time": "7:33:18", "remaining_time": "4:06:51"}
{"current_steps": 925, "total_steps": 1421, "loss": 0.1918, "lr": 1.3159597133486628e-05, "epoch": 3.6167763157894735, "percentage": 65.1, "elapsed_time": "7:52:14", "remaining_time": "4:13:13"}
{"current_steps": 930, "total_steps": 1421, "loss": 0.2109, "lr": 1.292912343279713e-05, "epoch": 3.6414473684210527, "percentage": 65.45, "elapsed_time": "8:11:06", "remaining_time": "4:19:16"}
{"current_steps": 935, "total_steps": 1421, "loss": 0.4407, "lr": 1.2699717915533402e-05, "epoch": 3.6661184210526314, "percentage": 65.8, "elapsed_time": "8:30:20", "remaining_time": "4:25:16"}
{"current_steps": 940, "total_steps": 1421, "loss": 0.3933, "lr": 1.2471415237536065e-05, "epoch": 3.6907894736842106, "percentage": 66.15, "elapsed_time": "8:49:25", "remaining_time": "4:30:54"}
{"current_steps": 945, "total_steps": 1421, "loss": 0.3701, "lr": 1.2244249888041955e-05, "epoch": 3.7154605263157894, "percentage": 66.5, "elapsed_time": "9:08:27", "remaining_time": "4:36:15"}
{"current_steps": 950, "total_steps": 1421, "loss": 0.3526, "lr": 1.2018256184473967e-05, "epoch": 3.7401315789473686, "percentage": 66.85, "elapsed_time": "9:27:24", "remaining_time": "4:41:18"}
{"current_steps": 955, "total_steps": 1421, "loss": 0.3428, "lr": 1.1793468267256709e-05, "epoch": 3.7648026315789473, "percentage": 67.21, "elapsed_time": "9:46:24", "remaining_time": "4:46:08"}
{"current_steps": 960, "total_steps": 1421, "loss": 0.354, "lr": 1.156992009465904e-05, "epoch": 3.7894736842105265, "percentage": 67.56, "elapsed_time": "10:05:25", "remaining_time": "4:50:43"}
{"current_steps": 965, "total_steps": 1421, "loss": 0.3489, "lr": 1.1347645437664032e-05, "epoch": 3.8141447368421053, "percentage": 67.91, "elapsed_time": "10:24:28", "remaining_time": "4:55:05"}
{"current_steps": 970, "total_steps": 1421, "loss": 0.3318, "lr": 1.1126677874867245e-05, "epoch": 3.838815789473684, "percentage": 68.26, "elapsed_time": "10:43:28", "remaining_time": "4:59:11"}
{"current_steps": 975, "total_steps": 1421, "loss": 0.3346, "lr": 1.0907050787404105e-05, "epoch": 3.8634868421052633, "percentage": 68.61, "elapsed_time": "11:02:34", "remaining_time": "5:03:05"}
{"current_steps": 980, "total_steps": 1421, "loss": 0.3001, "lr": 1.0688797353907052e-05, "epoch": 3.888157894736842, "percentage": 68.97, "elapsed_time": "11:21:39", "remaining_time": "5:06:44"}
{"current_steps": 985, "total_steps": 1421, "loss": 0.277, "lr": 1.0471950545493328e-05, "epoch": 3.9128289473684212, "percentage": 69.32, "elapsed_time": "11:40:38", "remaining_time": "5:10:08"}
{"current_steps": 905, "total_steps": 1421, "loss": 0.195, "lr": 1.4091475116416415e-05, "epoch": 4.024671052631579, "percentage": 63.69, "elapsed_time": "0:19:02", "remaining_time": "0:10:51"}
{"current_steps": 910, "total_steps": 1421, "loss": 0.1909, "lr": 1.3857078388760203e-05, "epoch": 4.0493421052631575, "percentage": 64.04, "elapsed_time": "0:37:52", "remaining_time": "0:21:15"}
{"current_steps": 915, "total_steps": 1421, "loss": 0.188, "lr": 1.3623609660198373e-05, "epoch": 4.074013157894737, "percentage": 64.39, "elapsed_time": "0:56:43", "remaining_time": "0:31:22"}
{"current_steps": 920, "total_steps": 1421, "loss": 0.1987, "lr": 1.3391104200392905e-05, "epoch": 4.098684210526316, "percentage": 64.74, "elapsed_time": "1:15:31", "remaining_time": "0:41:07"}
{"current_steps": 925, "total_steps": 1421, "loss": 0.1865, "lr": 1.3159597133486628e-05, "epoch": 4.123355263157895, "percentage": 65.1, "elapsed_time": "1:34:25", "remaining_time": "0:50:38"}
{"current_steps": 930, "total_steps": 1421, "loss": 0.1918, "lr": 1.292912343279713e-05, "epoch": 4.1480263157894735, "percentage": 65.45, "elapsed_time": "1:53:14", "remaining_time": "0:59:47"}
{"current_steps": 935, "total_steps": 1421, "loss": 0.1942, "lr": 1.2699717915533402e-05, "epoch": 4.172697368421052, "percentage": 65.8, "elapsed_time": "2:12:01", "remaining_time": "1:08:37"}
{"current_steps": 940, "total_steps": 1421, "loss": 0.1874, "lr": 1.2471415237536065e-05, "epoch": 4.197368421052632, "percentage": 66.15, "elapsed_time": "2:30:49", "remaining_time": "1:17:10"}
{"current_steps": 945, "total_steps": 1421, "loss": 0.1813, "lr": 1.2244249888041955e-05, "epoch": 4.222039473684211, "percentage": 66.5, "elapsed_time": "2:49:39", "remaining_time": "1:25:27"}
{"current_steps": 950, "total_steps": 1421, "loss": 0.1919, "lr": 1.2018256184473967e-05, "epoch": 4.246710526315789, "percentage": 66.85, "elapsed_time": "3:08:46", "remaining_time": "1:33:35"}
{"current_steps": 955, "total_steps": 1421, "loss": 0.1804, "lr": 1.1793468267256709e-05, "epoch": 4.271381578947368, "percentage": 67.21, "elapsed_time": "3:27:35", "remaining_time": "1:41:17"}
{"current_steps": 960, "total_steps": 1421, "loss": 0.1788, "lr": 1.156992009465904e-05, "epoch": 4.296052631578947, "percentage": 67.56, "elapsed_time": "3:46:18", "remaining_time": "1:48:40"}
{"current_steps": 965, "total_steps": 1421, "loss": 0.1738, "lr": 1.1347645437664032e-05, "epoch": 4.3207236842105265, "percentage": 67.91, "elapsed_time": "4:05:04", "remaining_time": "1:55:48"}
{"current_steps": 970, "total_steps": 1421, "loss": 0.1776, "lr": 1.1126677874867245e-05, "epoch": 4.345394736842105, "percentage": 68.26, "elapsed_time": "4:23:52", "remaining_time": "2:02:41"}
{"current_steps": 975, "total_steps": 1421, "loss": 0.1757, "lr": 1.0907050787404105e-05, "epoch": 4.370065789473684, "percentage": 68.61, "elapsed_time": "4:42:44", "remaining_time": "2:09:20"}
{"current_steps": 980, "total_steps": 1421, "loss": 0.1882, "lr": 1.0688797353907052e-05, "epoch": 4.394736842105263, "percentage": 68.97, "elapsed_time": "5:01:39", "remaining_time": "2:15:44"}
{"current_steps": 985, "total_steps": 1421, "loss": 0.1753, "lr": 1.0471950545493328e-05, "epoch": 4.4194078947368425, "percentage": 69.32, "elapsed_time": "5:20:30", "remaining_time": "2:21:52"}
{"current_steps": 990, "total_steps": 1421, "loss": 0.1746, "lr": 1.0256543120784074e-05, "epoch": 4.444078947368421, "percentage": 69.67, "elapsed_time": "5:39:16", "remaining_time": "2:27:42"}
{"current_steps": 995, "total_steps": 1421, "loss": 0.1669, "lr": 1.0042607620955592e-05, "epoch": 4.46875, "percentage": 70.02, "elapsed_time": "5:58:11", "remaining_time": "2:33:21"}
{"current_steps": 1000, "total_steps": 1421, "loss": 0.1729, "lr": 9.830176364823349e-06, "epoch": 4.493421052631579, "percentage": 70.37, "elapsed_time": "6:16:58", "remaining_time": "2:38:42"}
{"current_steps": 1005, "total_steps": 1421, "loss": 0.1835, "lr": 9.619281443959711e-06, "epoch": 4.5180921052631575, "percentage": 70.72, "elapsed_time": "6:36:31", "remaining_time": "2:44:08"}
{"current_steps": 1010, "total_steps": 1421, "loss": 0.2007, "lr": 9.409954717845861e-06, "epoch": 4.542763157894737, "percentage": 71.08, "elapsed_time": "6:55:24", "remaining_time": "2:49:02"}
{"current_steps": 1015, "total_steps": 1421, "loss": 0.2033, "lr": 9.202227809058912e-06, "epoch": 4.567434210526316, "percentage": 71.43, "elapsed_time": "7:14:15", "remaining_time": "2:53:42"}
{"current_steps": 1020, "total_steps": 1421, "loss": 0.197, "lr": 8.996132098494688e-06, "epoch": 4.592105263157895, "percentage": 71.78, "elapsed_time": "7:33:07", "remaining_time": "2:58:08"}
{"current_steps": 1025, "total_steps": 1421, "loss": 0.1907, "lr": 8.791698720627138e-06, "epoch": 4.6167763157894735, "percentage": 72.13, "elapsed_time": "7:51:57", "remaining_time": "3:02:20"}
{"current_steps": 1030, "total_steps": 1421, "loss": 0.2098, "lr": 8.58895855880484e-06, "epoch": 4.641447368421053, "percentage": 72.48, "elapsed_time": "8:11:01", "remaining_time": "3:06:23"}
{"current_steps": 1035, "total_steps": 1421, "loss": 0.4399, "lr": 8.387942240585587e-06, "epoch": 4.666118421052632, "percentage": 72.84, "elapsed_time": "8:30:30", "remaining_time": "3:10:23"}
{"current_steps": 1040, "total_steps": 1421, "loss": 0.3921, "lr": 8.188680133109485e-06, "epoch": 4.690789473684211, "percentage": 73.19, "elapsed_time": "8:49:32", "remaining_time": "3:13:59"}
{"current_steps": 1045, "total_steps": 1421, "loss": 0.3688, "lr": 7.991202338511477e-06, "epoch": 4.715460526315789, "percentage": 73.54, "elapsed_time": "9:08:37", "remaining_time": "3:17:24"}
{"current_steps": 1050, "total_steps": 1421, "loss": 0.3512, "lr": 7.795538689373859e-06, "epoch": 4.740131578947368, "percentage": 73.89, "elapsed_time": "9:27:34", "remaining_time": "3:20:32"}
{"current_steps": 1055, "total_steps": 1421, "loss": 0.3414, "lr": 7.601718744219555e-06, "epoch": 4.764802631578947, "percentage": 74.24, "elapsed_time": "9:46:31", "remaining_time": "3:23:28"}
{"current_steps": 1060, "total_steps": 1421, "loss": 0.3529, "lr": 7.409771783046733e-06, "epoch": 4.7894736842105265, "percentage": 74.6, "elapsed_time": "10:05:27", "remaining_time": "3:26:11"}
{"current_steps": 1065, "total_steps": 1421, "loss": 0.3477, "lr": 7.219726802905573e-06, "epoch": 4.814144736842105, "percentage": 74.95, "elapsed_time": "10:24:27", "remaining_time": "3:28:44"}
{"current_steps": 1070, "total_steps": 1421, "loss": 0.3307, "lr": 7.0316125135176935e-06, "epoch": 4.838815789473684, "percentage": 75.3, "elapsed_time": "10:43:26", "remaining_time": "3:31:04"}
{"current_steps": 1075, "total_steps": 1421, "loss": 0.3334, "lr": 6.845457332939083e-06, "epoch": 4.863486842105263, "percentage": 75.65, "elapsed_time": "11:02:25", "remaining_time": "3:33:12"}
{"current_steps": 1080, "total_steps": 1421, "loss": 0.2985, "lr": 6.661289383266984e-06, "epoch": 4.8881578947368425, "percentage": 76.0, "elapsed_time": "11:21:23", "remaining_time": "3:35:08"}
{"current_steps": 1085, "total_steps": 1421, "loss": 0.2757, "lr": 6.479136486391599e-06, "epoch": 4.912828947368421, "percentage": 76.35, "elapsed_time": "11:40:21", "remaining_time": "3:36:53"}
{"current_steps": 1005, "total_steps": 1421, "loss": 0.1925, "lr": 9.619281443959711e-06, "epoch": 4.024671052631579, "percentage": 70.72, "elapsed_time": "0:19:02", "remaining_time": "0:07:52"}
{"current_steps": 1010, "total_steps": 1421, "loss": 0.188, "lr": 9.409954717845861e-06, "epoch": 4.0493421052631575, "percentage": 71.08, "elapsed_time": "0:37:52", "remaining_time": "0:15:24"}
{"current_steps": 1015, "total_steps": 1421, "loss": 0.1848, "lr": 9.202227809058912e-06, "epoch": 4.074013157894737, "percentage": 71.43, "elapsed_time": "0:56:43", "remaining_time": "0:22:41"}
{"current_steps": 1020, "total_steps": 1421, "loss": 0.1951, "lr": 8.996132098494688e-06, "epoch": 4.098684210526316, "percentage": 71.78, "elapsed_time": "1:15:30", "remaining_time": "0:29:41"}
{"current_steps": 1025, "total_steps": 1421, "loss": 0.183, "lr": 8.791698720627138e-06, "epoch": 4.123355263157895, "percentage": 72.13, "elapsed_time": "1:34:23", "remaining_time": "0:36:27"}
{"current_steps": 1030, "total_steps": 1421, "loss": 0.1881, "lr": 8.58895855880484e-06, "epoch": 4.1480263157894735, "percentage": 72.48, "elapsed_time": "1:53:10", "remaining_time": "0:42:57"}
{"current_steps": 1035, "total_steps": 1421, "loss": 0.1905, "lr": 8.387942240585587e-06, "epoch": 4.172697368421052, "percentage": 72.84, "elapsed_time": "2:11:56", "remaining_time": "0:49:12"}
{"current_steps": 1040, "total_steps": 1421, "loss": 0.1838, "lr": 8.188680133109485e-06, "epoch": 4.197368421052632, "percentage": 73.19, "elapsed_time": "2:30:42", "remaining_time": "0:55:12"}
{"current_steps": 1045, "total_steps": 1421, "loss": 0.1779, "lr": 7.991202338511477e-06, "epoch": 4.222039473684211, "percentage": 73.54, "elapsed_time": "2:49:32", "remaining_time": "1:01:00"}
{"current_steps": 1050, "total_steps": 1421, "loss": 0.1881, "lr": 7.795538689373859e-06, "epoch": 4.246710526315789, "percentage": 73.89, "elapsed_time": "3:08:45", "remaining_time": "1:06:41"}
{"current_steps": 1055, "total_steps": 1421, "loss": 0.1768, "lr": 7.601718744219555e-06, "epoch": 4.271381578947368, "percentage": 74.24, "elapsed_time": "3:27:36", "remaining_time": "1:12:01"}
{"current_steps": 1060, "total_steps": 1421, "loss": 0.1747, "lr": 7.409771783046733e-06, "epoch": 4.296052631578947, "percentage": 74.6, "elapsed_time": "3:46:22", "remaining_time": "1:17:05"}
{"current_steps": 1065, "total_steps": 1421, "loss": 0.17, "lr": 7.219726802905573e-06, "epoch": 4.3207236842105265, "percentage": 74.95, "elapsed_time": "4:05:09", "remaining_time": "1:21:57"}
{"current_steps": 1070, "total_steps": 1421, "loss": 0.1735, "lr": 7.0316125135176935e-06, "epoch": 4.345394736842105, "percentage": 75.3, "elapsed_time": "4:23:55", "remaining_time": "1:26:34"}
{"current_steps": 1075, "total_steps": 1421, "loss": 0.1717, "lr": 6.845457332939083e-06, "epoch": 4.370065789473684, "percentage": 75.65, "elapsed_time": "4:42:47", "remaining_time": "1:31:01"}
{"current_steps": 1080, "total_steps": 1421, "loss": 0.1848, "lr": 6.661289383266984e-06, "epoch": 4.394736842105263, "percentage": 76.0, "elapsed_time": "5:01:42", "remaining_time": "1:35:15"}
{"current_steps": 1085, "total_steps": 1421, "loss": 0.1709, "lr": 6.479136486391599e-06, "epoch": 4.4194078947368425, "percentage": 76.35, "elapsed_time": "5:20:41", "remaining_time": "1:39:18"}
{"current_steps": 1090, "total_steps": 1421, "loss": 0.1704, "lr": 6.299026159793042e-06, "epoch": 4.444078947368421, "percentage": 76.71, "elapsed_time": "5:39:29", "remaining_time": "1:43:05"}
{"current_steps": 1095, "total_steps": 1421, "loss": 0.1636, "lr": 6.120985612384369e-06, "epoch": 4.46875, "percentage": 77.06, "elapsed_time": "5:58:26", "remaining_time": "1:46:42"}
{"current_steps": 1100, "total_steps": 1421, "loss": 0.1691, "lr": 5.945041740401147e-06, "epoch": 4.493421052631579, "percentage": 77.41, "elapsed_time": "6:17:12", "remaining_time": "1:50:04"}
{"current_steps": 1105, "total_steps": 1421, "loss": 0.1827, "lr": 5.7712211233383104e-06, "epoch": 4.5180921052631575, "percentage": 77.76, "elapsed_time": "6:36:47", "remaining_time": "1:53:28"}
{"current_steps": 1110, "total_steps": 1421, "loss": 0.1998, "lr": 5.5995500199348565e-06, "epoch": 4.542763157894737, "percentage": 78.11, "elapsed_time": "6:55:39", "remaining_time": "1:56:27"}
{"current_steps": 1115, "total_steps": 1421, "loss": 0.2022, "lr": 5.430054364206965e-06, "epoch": 4.567434210526316, "percentage": 78.47, "elapsed_time": "7:14:33", "remaining_time": "1:59:15"}
{"current_steps": 1120, "total_steps": 1421, "loss": 0.1959, "lr": 5.262759761530214e-06, "epoch": 4.592105263157895, "percentage": 78.82, "elapsed_time": "7:33:26", "remaining_time": "2:01:51"}
{"current_steps": 1125, "total_steps": 1421, "loss": 0.1896, "lr": 5.097691484771434e-06, "epoch": 4.6167763157894735, "percentage": 79.17, "elapsed_time": "7:52:18", "remaining_time": "2:04:16"}
{"current_steps": 1130, "total_steps": 1421, "loss": 0.2087, "lr": 4.934874470470756e-06, "epoch": 4.641447368421053, "percentage": 79.52, "elapsed_time": "8:11:14", "remaining_time": "2:06:30"}
{"current_steps": 1135, "total_steps": 1421, "loss": 0.439, "lr": 4.77433331507454e-06, "epoch": 4.666118421052632, "percentage": 79.87, "elapsed_time": "8:30:34", "remaining_time": "2:08:39"}
{"current_steps": 1140, "total_steps": 1421, "loss": 0.3911, "lr": 4.6160922712195875e-06, "epoch": 4.690789473684211, "percentage": 80.23, "elapsed_time": "8:49:36", "remaining_time": "2:10:32"}
{"current_steps": 1145, "total_steps": 1421, "loss": 0.3683, "lr": 4.460175244069395e-06, "epoch": 4.715460526315789, "percentage": 80.58, "elapsed_time": "9:08:34", "remaining_time": "2:12:13"}
{"current_steps": 1150, "total_steps": 1421, "loss": 0.3508, "lr": 4.306605787702802e-06, "epoch": 4.740131578947368, "percentage": 80.93, "elapsed_time": "9:27:35", "remaining_time": "2:13:45"}
{"current_steps": 1155, "total_steps": 1421, "loss": 0.341, "lr": 4.155407101555764e-06, "epoch": 4.764802631578947, "percentage": 81.28, "elapsed_time": "9:46:33", "remaining_time": "2:15:05"}
{"current_steps": 1160, "total_steps": 1421, "loss": 0.3523, "lr": 4.006602026916617e-06, "epoch": 4.7894736842105265, "percentage": 81.63, "elapsed_time": "10:05:36", "remaining_time": "2:16:15"}
{"current_steps": 1165, "total_steps": 1421, "loss": 0.3471, "lr": 3.860213043475531e-06, "epoch": 4.814144736842105, "percentage": 81.98, "elapsed_time": "10:24:39", "remaining_time": "2:17:15"}
{"current_steps": 1170, "total_steps": 1421, "loss": 0.3301, "lr": 3.7162622659285185e-06, "epoch": 4.838815789473684, "percentage": 82.34, "elapsed_time": "10:43:39", "remaining_time": "2:18:05"}
{"current_steps": 1175, "total_steps": 1421, "loss": 0.333, "lr": 3.5747714406366154e-06, "epoch": 4.863486842105263, "percentage": 82.69, "elapsed_time": "11:02:40", "remaining_time": "2:18:44"}
{"current_steps": 1180, "total_steps": 1421, "loss": 0.2992, "lr": 3.435761942340705e-06, "epoch": 4.8881578947368425, "percentage": 83.04, "elapsed_time": "11:21:40", "remaining_time": "2:19:13"}
{"current_steps": 1185, "total_steps": 1421, "loss": 0.275, "lr": 3.2992547709324964e-06, "epoch": 4.912828947368421, "percentage": 83.39, "elapsed_time": "11:40:40", "remaining_time": "2:19:32"}
{"current_steps": 1105, "total_steps": 1421, "loss": 0.1901, "lr": 5.7712211233383104e-06, "epoch": 5.024671052631579, "percentage": 77.76, "elapsed_time": "0:18:40", "remaining_time": "0:05:20"}
{"current_steps": 1110, "total_steps": 1421, "loss": 0.1852, "lr": 5.5995500199348565e-06, "epoch": 5.0493421052631575, "percentage": 78.11, "elapsed_time": "0:37:05", "remaining_time": "0:10:23"}
{"current_steps": 1115, "total_steps": 1421, "loss": 0.182, "lr": 5.430054364206965e-06, "epoch": 5.074013157894737, "percentage": 78.47, "elapsed_time": "0:55:31", "remaining_time": "0:15:14"}
{"current_steps": 1120, "total_steps": 1421, "loss": 0.1921, "lr": 5.262759761530214e-06, "epoch": 5.098684210526316, "percentage": 78.82, "elapsed_time": "1:13:55", "remaining_time": "0:19:52"}
{"current_steps": 1125, "total_steps": 1421, "loss": 0.1797, "lr": 5.097691484771434e-06, "epoch": 5.123355263157895, "percentage": 79.17, "elapsed_time": "1:32:21", "remaining_time": "0:24:17"}
{"current_steps": 1130, "total_steps": 1421, "loss": 0.1847, "lr": 4.934874470470756e-06, "epoch": 5.1480263157894735, "percentage": 79.52, "elapsed_time": "1:50:45", "remaining_time": "0:28:31"}
{"current_steps": 1135, "total_steps": 1421, "loss": 0.187, "lr": 4.77433331507454e-06, "epoch": 5.172697368421052, "percentage": 79.87, "elapsed_time": "2:09:11", "remaining_time": "0:32:33"}
{"current_steps": 1140, "total_steps": 1421, "loss": 0.1803, "lr": 4.6160922712195875e-06, "epoch": 5.197368421052632, "percentage": 80.23, "elapsed_time": "2:27:38", "remaining_time": "0:36:23"}
{"current_steps": 1145, "total_steps": 1421, "loss": 0.1748, "lr": 4.460175244069395e-06, "epoch": 5.222039473684211, "percentage": 80.58, "elapsed_time": "2:46:05", "remaining_time": "0:40:02"}
{"current_steps": 1150, "total_steps": 1421, "loss": 0.1846, "lr": 4.306605787702802e-06, "epoch": 5.246710526315789, "percentage": 80.93, "elapsed_time": "3:04:47", "remaining_time": "0:43:32"}
{"current_steps": 1155, "total_steps": 1421, "loss": 0.1731, "lr": 4.155407101555764e-06, "epoch": 5.271381578947368, "percentage": 81.28, "elapsed_time": "3:23:12", "remaining_time": "0:46:47"}
{"current_steps": 1160, "total_steps": 1421, "loss": 0.1708, "lr": 4.006602026916617e-06, "epoch": 5.296052631578947, "percentage": 81.63, "elapsed_time": "3:41:36", "remaining_time": "0:49:51"}
{"current_steps": 1165, "total_steps": 1421, "loss": 0.1663, "lr": 3.860213043475531e-06, "epoch": 5.3207236842105265, "percentage": 81.98, "elapsed_time": "4:00:02", "remaining_time": "0:52:44"}
{"current_steps": 1170, "total_steps": 1421, "loss": 0.1694, "lr": 3.7162622659285185e-06, "epoch": 5.345394736842105, "percentage": 82.34, "elapsed_time": "4:18:31", "remaining_time": "0:55:27"}
{"current_steps": 1175, "total_steps": 1421, "loss": 0.1677, "lr": 3.5747714406366154e-06, "epoch": 5.370065789473684, "percentage": 82.69, "elapsed_time": "4:37:00", "remaining_time": "0:57:59"}
{"current_steps": 1180, "total_steps": 1421, "loss": 0.1807, "lr": 3.435761942340705e-06, "epoch": 5.394736842105263, "percentage": 83.04, "elapsed_time": "4:55:30", "remaining_time": "1:00:21"}
{"current_steps": 1185, "total_steps": 1421, "loss": 0.168, "lr": 3.2992547709324964e-06, "epoch": 5.4194078947368425, "percentage": 83.39, "elapsed_time": "5:14:00", "remaining_time": "1:02:32"}
{"current_steps": 1190, "total_steps": 1421, "loss": 0.167, "lr": 3.1652705482820665e-06, "epoch": 5.444078947368421, "percentage": 83.74, "elapsed_time": "5:32:26", "remaining_time": "1:04:32"}
{"current_steps": 1195, "total_steps": 1421, "loss": 0.1598, "lr": 3.033829515122608e-06, "epoch": 5.46875, "percentage": 84.1, "elapsed_time": "5:50:56", "remaining_time": "1:06:22"}
{"current_steps": 1200, "total_steps": 1421, "loss": 0.1656, "lr": 2.904951527992652e-06, "epoch": 5.493421052631579, "percentage": 84.45, "elapsed_time": "6:09:23", "remaining_time": "1:08:01"}
{"current_steps": 1205, "total_steps": 1421, "loss": 0.1824, "lr": 2.7786560562364285e-06, "epoch": 5.5180921052631575, "percentage": 84.8, "elapsed_time": "6:28:38", "remaining_time": "1:09:39"}
{"current_steps": 1210, "total_steps": 1421, "loss": 0.199, "lr": 2.6549621790626166e-06, "epoch": 5.542763157894737, "percentage": 85.15, "elapsed_time": "6:47:06", "remaining_time": "1:10:59"}
{"current_steps": 1215, "total_steps": 1421, "loss": 0.2016, "lr": 2.533888582662145e-06, "epoch": 5.567434210526316, "percentage": 85.5, "elapsed_time": "7:05:35", "remaining_time": "1:12:09"}
{"current_steps": 1220, "total_steps": 1421, "loss": 0.1953, "lr": 2.41545355738525e-06, "epoch": 5.592105263157895, "percentage": 85.86, "elapsed_time": "7:23:59", "remaining_time": "1:13:08"}
{"current_steps": 1225, "total_steps": 1421, "loss": 0.189, "lr": 2.299674994978436e-06, "epoch": 5.6167763157894735, "percentage": 86.21, "elapsed_time": "7:42:25", "remaining_time": "1:13:59"}
{"current_steps": 1230, "total_steps": 1421, "loss": 0.2081, "lr": 2.1865703858815656e-06, "epoch": 5.641447368421053, "percentage": 86.56, "elapsed_time": "8:00:55", "remaining_time": "1:14:40"}
{"current_steps": 1235, "total_steps": 1421, "loss": 0.4386, "lr": 2.076156816585639e-06, "epoch": 5.666118421052632, "percentage": 86.91, "elapsed_time": "8:19:48", "remaining_time": "1:15:16"}
{"current_steps": 1240, "total_steps": 1421, "loss": 0.3903, "lr": 1.9684509670515585e-06, "epoch": 5.690789473684211, "percentage": 87.26, "elapsed_time": "8:38:26", "remaining_time": "1:15:40"}
{"current_steps": 1245, "total_steps": 1421, "loss": 0.3671, "lr": 1.86346910819033e-06, "epoch": 5.715460526315789, "percentage": 87.61, "elapsed_time": "8:57:00", "remaining_time": "1:15:54"}
{"current_steps": 1250, "total_steps": 1421, "loss": 0.3496, "lr": 1.7612270994050362e-06, "epoch": 5.740131578947368, "percentage": 87.97, "elapsed_time": "9:15:33", "remaining_time": "1:16:00"}
{"current_steps": 1255, "total_steps": 1421, "loss": 0.3402, "lr": 1.6617403861949898e-06, "epoch": 5.764802631578947, "percentage": 88.32, "elapsed_time": "9:34:08", "remaining_time": "1:15:56"}
{"current_steps": 1260, "total_steps": 1421, "loss": 0.3515, "lr": 1.5650239978224346e-06, "epoch": 5.7894736842105265, "percentage": 88.67, "elapsed_time": "9:52:42", "remaining_time": "1:15:44"}
{"current_steps": 1265, "total_steps": 1421, "loss": 0.3464, "lr": 1.4710925450420632e-06, "epoch": 5.814144736842105, "percentage": 89.02, "elapsed_time": "10:11:18", "remaining_time": "1:15:23"}
{"current_steps": 1270, "total_steps": 1421, "loss": 0.3295, "lr": 1.379960217893841e-06, "epoch": 5.838815789473684, "percentage": 89.37, "elapsed_time": "10:29:52", "remaining_time": "1:14:53"}
{"current_steps": 1275, "total_steps": 1421, "loss": 0.3324, "lr": 1.2916407835593093e-06, "epoch": 5.863486842105263, "percentage": 89.73, "elapsed_time": "10:48:28", "remaining_time": "1:14:15"}
{"current_steps": 1280, "total_steps": 1421, "loss": 0.2991, "lr": 1.2061475842818337e-06, "epoch": 5.8881578947368425, "percentage": 90.08, "elapsed_time": "11:07:05", "remaining_time": "1:13:29"}
{"current_steps": 1285, "total_steps": 1421, "loss": 0.2743, "lr": 1.1234935353509946e-06, "epoch": 5.912828947368421, "percentage": 90.43, "elapsed_time": "11:25:40", "remaining_time": "1:12:34"}
{"current_steps": 1290, "total_steps": 1421, "loss": 0.2614, "lr": 1.0436911231515202e-06, "epoch": 5.9375, "percentage": 90.78, "elapsed_time": "11:44:16", "remaining_time": "1:11:31"}
{"current_steps": 1205, "total_steps": 1421, "loss": 0.1878, "lr": 2.7786560562364285e-06, "epoch": 5.024671052631579, "percentage": 84.8, "elapsed_time": "0:18:38", "remaining_time": "0:03:20"}
{"current_steps": 1210, "total_steps": 1421, "loss": 0.1825, "lr": 2.6549621790626166e-06, "epoch": 5.0493421052631575, "percentage": 85.15, "elapsed_time": "0:37:03", "remaining_time": "0:06:27"}
{"current_steps": 1215, "total_steps": 1421, "loss": 0.1791, "lr": 2.533888582662145e-06, "epoch": 5.074013157894737, "percentage": 85.5, "elapsed_time": "0:55:29", "remaining_time": "0:09:24"}
{"current_steps": 1220, "total_steps": 1421, "loss": 0.189, "lr": 2.41545355738525e-06, "epoch": 5.098684210526316, "percentage": 85.86, "elapsed_time": "1:13:52", "remaining_time": "0:12:10"}
{"current_steps": 1225, "total_steps": 1421, "loss": 0.1765, "lr": 2.299674994978436e-06, "epoch": 5.123355263157895, "percentage": 86.21, "elapsed_time": "1:32:17", "remaining_time": "0:14:46"}
{"current_steps": 1230, "total_steps": 1421, "loss": 0.1813, "lr": 2.1865703858815656e-06, "epoch": 5.1480263157894735, "percentage": 86.56, "elapsed_time": "1:50:43", "remaining_time": "0:17:11"}
{"current_steps": 1235, "total_steps": 1421, "loss": 0.1836, "lr": 2.076156816585639e-06, "epoch": 5.172697368421052, "percentage": 86.91, "elapsed_time": "2:09:10", "remaining_time": "0:19:27"}
{"current_steps": 1240, "total_steps": 1421, "loss": 0.177, "lr": 1.9684509670515585e-06, "epoch": 5.197368421052632, "percentage": 87.26, "elapsed_time": "2:27:38", "remaining_time": "0:21:33"}
{"current_steps": 1245, "total_steps": 1421, "loss": 0.1718, "lr": 1.86346910819033e-06, "epoch": 5.222039473684211, "percentage": 87.61, "elapsed_time": "2:46:04", "remaining_time": "0:23:28"}
{"current_steps": 1250, "total_steps": 1421, "loss": 0.1812, "lr": 1.7612270994050362e-06, "epoch": 5.246710526315789, "percentage": 87.97, "elapsed_time": "3:04:48", "remaining_time": "0:25:16"}
{"current_steps": 1255, "total_steps": 1421, "loss": 0.1702, "lr": 1.6617403861949898e-06, "epoch": 5.271381578947368, "percentage": 88.32, "elapsed_time": "3:23:15", "remaining_time": "0:26:53"}
{"current_steps": 1260, "total_steps": 1421, "loss": 0.1672, "lr": 1.5650239978224346e-06, "epoch": 5.296052631578947, "percentage": 88.67, "elapsed_time": "3:41:40", "remaining_time": "0:28:19"}
{"current_steps": 1265, "total_steps": 1421, "loss": 0.1629, "lr": 1.4710925450420632e-06, "epoch": 5.3207236842105265, "percentage": 89.02, "elapsed_time": "4:00:06", "remaining_time": "0:29:36"}
{"current_steps": 1270, "total_steps": 1421, "loss": 0.1659, "lr": 1.379960217893841e-06, "epoch": 5.345394736842105, "percentage": 89.37, "elapsed_time": "4:18:34", "remaining_time": "0:30:44"}
{"current_steps": 1275, "total_steps": 1421, "loss": 0.1641, "lr": 1.2916407835593093e-06, "epoch": 5.370065789473684, "percentage": 89.73, "elapsed_time": "4:37:03", "remaining_time": "0:31:43"}
{"current_steps": 1280, "total_steps": 1421, "loss": 0.1772, "lr": 1.2061475842818337e-06, "epoch": 5.394736842105263, "percentage": 90.08, "elapsed_time": "4:55:30", "remaining_time": "0:32:33"}
{"current_steps": 1285, "total_steps": 1421, "loss": 0.1638, "lr": 1.1234935353509946e-06, "epoch": 5.4194078947368425, "percentage": 90.43, "elapsed_time": "5:13:59", "remaining_time": "0:33:13"}
{"current_steps": 1290, "total_steps": 1421, "loss": 0.1631, "lr": 1.0436911231515202e-06, "epoch": 5.444078947368421, "percentage": 90.78, "elapsed_time": "5:32:25", "remaining_time": "0:33:45"}
{"current_steps": 1295, "total_steps": 1421, "loss": 0.1565, "lr": 9.667524032769715e-07, "epoch": 5.46875, "percentage": 91.13, "elapsed_time": "5:50:53", "remaining_time": "0:34:08"}
{"current_steps": 1300, "total_steps": 1421, "loss": 0.1619, "lr": 8.926889987085441e-07, "epoch": 5.493421052631579, "percentage": 91.48, "elapsed_time": "6:09:19", "remaining_time": "0:34:22"}
{"current_steps": 1305, "total_steps": 1421, "loss": 0.182, "lr": 8.215120980591984e-07, "epoch": 5.5180921052631575, "percentage": 91.84, "elapsed_time": "6:28:33", "remaining_time": "0:34:32"}
{"current_steps": 1310, "total_steps": 1421, "loss": 0.1985, "lr": 7.532324538834279e-07, "epoch": 5.542763157894737, "percentage": 92.19, "elapsed_time": "6:47:01", "remaining_time": "0:34:29"}
{"current_steps": 1315, "total_steps": 1421, "loss": 0.2009, "lr": 6.878603810528739e-07, "epoch": 5.567434210526316, "percentage": 92.54, "elapsed_time": "7:05:29", "remaining_time": "0:34:17"}
{"current_steps": 1320, "total_steps": 1421, "loss": 0.1945, "lr": 6.25405755198103e-07, "epoch": 5.592105263157895, "percentage": 92.89, "elapsed_time": "7:23:53", "remaining_time": "0:33:57"}
{"current_steps": 1325, "total_steps": 1421, "loss": 0.1882, "lr": 5.658780112166872e-07, "epoch": 5.6167763157894735, "percentage": 93.24, "elapsed_time": "7:42:19", "remaining_time": "0:33:29"}
{"current_steps": 1330, "total_steps": 1421, "loss": 0.2073, "lr": 5.092861418479156e-07, "epoch": 5.641447368421053, "percentage": 93.6, "elapsed_time": "8:00:48", "remaining_time": "0:32:53"}
{"current_steps": 1335, "total_steps": 1421, "loss": 0.438, "lr": 4.556386963142645e-07, "epoch": 5.666118421052632, "percentage": 93.95, "elapsed_time": "8:19:42", "remaining_time": "0:32:11"}
{"current_steps": 1340, "total_steps": 1421, "loss": 0.3895, "lr": 4.04943779029896e-07, "epoch": 5.690789473684211, "percentage": 94.3, "elapsed_time": "8:38:20", "remaining_time": "0:31:19"}
{"current_steps": 1345, "total_steps": 1421, "loss": 0.3666, "lr": 3.5720904837632355e-07, "epoch": 5.715460526315789, "percentage": 94.65, "elapsed_time": "8:56:52", "remaining_time": "0:30:20"}
{"current_steps": 1350, "total_steps": 1421, "loss": 0.3496, "lr": 3.124417155454884e-07, "epoch": 5.740131578947368, "percentage": 95.0, "elapsed_time": "9:15:25", "remaining_time": "0:29:12"}
{"current_steps": 1355, "total_steps": 1421, "loss": 0.3397, "lr": 2.7064854345037585e-07, "epoch": 5.764802631578947, "percentage": 95.36, "elapsed_time": "9:33:59", "remaining_time": "0:27:57"}
{"current_steps": 1360, "total_steps": 1421, "loss": 0.3512, "lr": 2.3183584570335205e-07, "epoch": 5.7894736842105265, "percentage": 95.71, "elapsed_time": "9:52:33", "remaining_time": "0:26:34"}
{"current_steps": 1365, "total_steps": 1421, "loss": 0.3457, "lr": 1.9600948566238287e-07, "epoch": 5.814144736842105, "percentage": 96.06, "elapsed_time": "10:11:08", "remaining_time": "0:25:04"}
{"current_steps": 1370, "total_steps": 1421, "loss": 0.3289, "lr": 1.631748755452667e-07, "epoch": 5.838815789473684, "percentage": 96.41, "elapsed_time": "10:29:43", "remaining_time": "0:23:26"}
{"current_steps": 1375, "total_steps": 1421, "loss": 0.3317, "lr": 1.3333697561201732e-07, "epoch": 5.863486842105263, "percentage": 96.76, "elapsed_time": "10:48:19", "remaining_time": "0:21:41"}
{"current_steps": 1380, "total_steps": 1421, "loss": 0.2982, "lr": 1.0650029341553902e-07, "epoch": 5.8881578947368425, "percentage": 97.11, "elapsed_time": "11:06:54", "remaining_time": "0:19:48"}
{"current_steps": 1385, "total_steps": 1421, "loss": 0.2737, "lr": 8.266888312066013e-08, "epoch": 5.912828947368421, "percentage": 97.47, "elapsed_time": "11:25:31", "remaining_time": "0:17:49"}
{"current_steps": 1390, "total_steps": 1421, "loss": 0.2607, "lr": 6.184634489169838e-08, "epoch": 5.9375, "percentage": 97.82, "elapsed_time": "11:44:07", "remaining_time": "0:15:42"}
{"current_steps": 1305, "total_steps": 1421, "loss": 0.1856, "lr": 8.215120980591984e-07, "epoch": 6.024671052631579, "percentage": 91.84, "elapsed_time": "0:19:07", "remaining_time": "0:01:42"}
{"current_steps": 1310, "total_steps": 1421, "loss": 0.1802, "lr": 7.532324538834279e-07, "epoch": 6.0493421052631575, "percentage": 92.19, "elapsed_time": "0:38:05", "remaining_time": "0:03:13"}
{"current_steps": 1315, "total_steps": 1421, "loss": 0.1765, "lr": 6.878603810528739e-07, "epoch": 6.074013157894737, "percentage": 92.54, "elapsed_time": "0:57:04", "remaining_time": "0:04:36"}
{"current_steps": 1320, "total_steps": 1421, "loss": 0.1861, "lr": 6.25405755198103e-07, "epoch": 6.098684210526316, "percentage": 92.89, "elapsed_time": "1:16:01", "remaining_time": "0:05:49"}
{"current_steps": 1325, "total_steps": 1421, "loss": 0.1735, "lr": 5.658780112166872e-07, "epoch": 6.123355263157895, "percentage": 93.24, "elapsed_time": "1:35:06", "remaining_time": "0:06:53"}
{"current_steps": 1330, "total_steps": 1421, "loss": 0.1781, "lr": 5.092861418479156e-07, "epoch": 6.1480263157894735, "percentage": 93.6, "elapsed_time": "1:54:04", "remaining_time": "0:07:48"}
{"current_steps": 1335, "total_steps": 1421, "loss": 0.1805, "lr": 4.556386963142645e-07, "epoch": 6.172697368421052, "percentage": 93.95, "elapsed_time": "2:13:01", "remaining_time": "0:08:34"}
{"current_steps": 1340, "total_steps": 1421, "loss": 0.1738, "lr": 4.04943779029896e-07, "epoch": 6.197368421052632, "percentage": 94.3, "elapsed_time": "2:32:02", "remaining_time": "0:09:11"}
{"current_steps": 1345, "total_steps": 1421, "loss": 0.1688, "lr": 3.5720904837632355e-07, "epoch": 6.222039473684211, "percentage": 94.65, "elapsed_time": "2:51:00", "remaining_time": "0:09:39"}
{"current_steps": 1350, "total_steps": 1421, "loss": 0.178, "lr": 3.124417155454884e-07, "epoch": 6.246710526315789, "percentage": 95.0, "elapsed_time": "3:10:12", "remaining_time": "0:10:00"}
{"current_steps": 1355, "total_steps": 1421, "loss": 0.167, "lr": 2.7064854345037585e-07, "epoch": 6.271381578947368, "percentage": 95.36, "elapsed_time": "3:29:08", "remaining_time": "0:10:11"}
{"current_steps": 1360, "total_steps": 1421, "loss": 0.1636, "lr": 2.3183584570335205e-07, "epoch": 6.296052631578947, "percentage": 95.71, "elapsed_time": "3:48:00", "remaining_time": "0:10:13"}
{"current_steps": 1365, "total_steps": 1421, "loss": 0.1596, "lr": 1.9600948566238287e-07, "epoch": 6.3207236842105265, "percentage": 96.06, "elapsed_time": "4:06:59", "remaining_time": "0:10:07"}
{"current_steps": 1370, "total_steps": 1421, "loss": 0.1629, "lr": 1.631748755452667e-07, "epoch": 6.345394736842105, "percentage": 96.41, "elapsed_time": "4:25:56", "remaining_time": "0:09:54"}
{"current_steps": 1375, "total_steps": 1421, "loss": 0.1603, "lr": 1.3333697561201732e-07, "epoch": 6.370065789473684, "percentage": 96.76, "elapsed_time": "4:44:59", "remaining_time": "0:09:32"}
{"current_steps": 1380, "total_steps": 1421, "loss": 0.1734, "lr": 1.0650029341553902e-07, "epoch": 6.394736842105263, "percentage": 97.11, "elapsed_time": "5:04:01", "remaining_time": "0:09:01"}
{"current_steps": 1385, "total_steps": 1421, "loss": 0.1609, "lr": 8.266888312066013e-08, "epoch": 6.4194078947368425, "percentage": 97.47, "elapsed_time": "5:23:04", "remaining_time": "0:08:23"}
{"current_steps": 1390, "total_steps": 1421, "loss": 0.1598, "lr": 6.184634489169838e-08, "epoch": 6.444078947368421, "percentage": 97.82, "elapsed_time": "5:42:07", "remaining_time": "0:07:37"}
{"current_steps": 1395, "total_steps": 1421, "loss": 0.1529, "lr": 4.403582434857834e-08, "epoch": 6.46875, "percentage": 98.17, "elapsed_time": "6:01:12", "remaining_time": "0:06:43"}
{"current_steps": 1400, "total_steps": 1421, "loss": 0.1583, "lr": 2.924001209163363e-08, "epoch": 6.493421052631579, "percentage": 98.52, "elapsed_time": "6:20:13", "remaining_time": "0:05:42"}
{"current_steps": 1405, "total_steps": 1421, "loss": 0.1821, "lr": 1.7461143295141036e-08, "epoch": 6.5180921052631575, "percentage": 98.87, "elapsed_time": "6:39:59", "remaining_time": "0:04:33"}
{"current_steps": 1410, "total_steps": 1421, "loss": 0.1983, "lr": 8.700997369659459e-09, "epoch": 6.542763157894737, "percentage": 99.23, "elapsed_time": "6:59:00", "remaining_time": "0:03:16"}
{"current_steps": 1415, "total_steps": 1421, "loss": 0.2006, "lr": 2.9608976932182788e-09, "epoch": 6.567434210526316, "percentage": 99.58, "elapsed_time": "7:18:04", "remaining_time": "0:01:51"}
{"current_steps": 1420, "total_steps": 1421, "loss": 0.1941, "lr": 2.4171141139284204e-10, "epoch": 6.592105263157895, "percentage": 99.93, "elapsed_time": "7:37:04", "remaining_time": "0:00:19"}
{"current_steps": 1421, "total_steps": 1421, "epoch": 6.597039473684211, "percentage": 100.0, "elapsed_time": "7:41:34", "remaining_time": "0:00:00"}
{"current_steps": 1421, "total_steps": 1421, "epoch": 6.597039473684211, "percentage": 100.0, "elapsed_time": "0:00:00", "remaining_time": "0:00:00"}

2599
trainer_state.json Normal file

File diff suppressed because it is too large

3
training_args.bin Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:26a93322cc647552142c98e692b5b7e1724f423772fe8341112624c9ac83e64d
size 9105

BIN
training_loss.png Normal file

Binary file not shown.

Size: 48 KiB

1
vocab.json Normal file

File diff suppressed because one or more lines are too long