Initialize project; model provided by the ModelHub XC community
Model: laion/nemosci-tasrep-a1mfc-dev1-maxeps__Qwen3-8B Source: Original Platform
36 .gitattributes vendored Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
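The patterns above route binary artifacts (checkpoints, archives, tensor dumps) through Git LFS, so the repository itself stores only small pointer files. As a rough illustration, a simple glob check approximates which paths are affected; note this is a sketch using Python's `fnmatch`, whose semantics are close to but not identical to gitattributes pattern matching (in particular, `saved_model/**/*` behaves differently).

```python
import fnmatch

# A few suffix-style patterns from the .gitattributes above (simple globs only).
lfs_patterns = ["*.safetensors", "*.bin", "*.pt", "*.gz", "*tfevents*", "tokenizer.json"]

def goes_through_lfs(path):
    """Return True if the path matches any LFS-tracked glob."""
    return any(fnmatch.fnmatch(path, pat) for pat in lfs_patterns)

print(goes_through_lfs("model-00001-of-00004.safetensors"))  # True
print(goes_through_lfs("config.json"))                       # False
```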
61 README.md Normal file
@@ -0,0 +1,61 @@
---
library_name: transformers
license: other
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: nemosci-tasrep-a1mfc-dev1-maxeps__Qwen3-8B
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# nemosci-tasrep-a1mfc-dev1-maxeps__Qwen3-8B

This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the /e/data1/datasets/playground/ot-baf/hf_hub/datasets--laion--nemotron-terminal-scientific_computing/snapshots/610c7db0b8510b87e3c99b3bd49660bc56821866_thinking_preprocessed, the /e/data1/datasets/playground/ot-baf/hf_hub/datasets--DCAgent--exp_tas_repetition_penalty_1.05_traces/snapshots/b4f5500e00651d5ffc7f8701f8a055d9b2b68a0a_thinking_preprocessed, the /e/data1/datasets/playground/ot-baf/hf_hub/datasets--DCAgent--a1_multifile_composition/snapshots/a19e5e467f3e83605b4de72bb5b7923e5e55efa9_thinking_preprocessed, the /e/data1/datasets/playground/ot-baf/hf_hub/datasets--DCAgent--exp_tas_max_episodes_512_traces/snapshots/236c1dc9aa6d24cf77ce281b5342d93bae685832_thinking_preprocessed and the /e/data1/datasets/playground/ot-baf/hf_hub/datasets--DCAgent--dev_set_part1_10k_glm_4.7_traces_jupiter/snapshots/f1871d1c1446b3b43cbfe2737d0df56cecf3f420_thinking_preprocessed datasets.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 3
- total_train_batch_size: 96
- total_eval_batch_size: 256
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0

### Training results

### Framework versions

- Transformers 4.57.6
- Pytorch 2.9.1+cu130
- Datasets 4.7.0
- Tokenizers 0.22.2
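The batch-size totals in the hyperparameter list are derived values: the effective training batch is the per-device batch multiplied by the number of devices and the gradient-accumulation steps, and the eval total is the per-device eval batch times the device count. A quick check against the listed numbers:

```python
# Values taken from the training hyperparameters above.
train_batch_size = 1            # per device
eval_batch_size = 8             # per device
num_devices = 32
gradient_accumulation_steps = 3

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)   # 96, as reported
print(total_eval_batch_size)    # 256, as reported
```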
28 added_tokens.json Normal file
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
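The special tokens above occupy the contiguous ID range 151643 to 151668, which sits below the `vocab_size` of 151936 declared in config.json, so registering them requires no embedding resize. A small sanity check over a subset of the map:

```python
# Subset of added_tokens.json above (chat, tool, and thinking markers).
added_tokens = {
    "<|endoftext|>": 151643, "<|im_start|>": 151644, "<|im_end|>": 151645,
    "<tool_call>": 151657, "</tool_call>": 151658,
    "<tool_response>": 151665, "</tool_response>": 151666,
    "<think>": 151667, "</think>": 151668,
}

VOCAB_SIZE = 151936  # from config.json

# Every special ID fits inside the declared vocabulary.
in_vocab = all(0 <= i < VOCAB_SIZE for i in added_tokens.values())
span = added_tokens["</think>"] - added_tokens["<|endoftext|>"]
print(in_vocab, span)  # True 25
```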
16 all_results.json Normal file
@@ -0,0 +1,16 @@
{
  "achieved_tflops_per_gpu": 71120.76996898574,
  "achieved_tflops_per_gpu_theoretical": 2528162.797724658,
  "epoch": 5.0,
  "loss_nan_ranks": 0,
  "loss_rank_avg": 0.36725589632987976,
  "mfu_percent": 5026.202824663303,
  "mfu_percent_theoretical": 178668.74895580622,
  "total_flos": 9.271417366388933e+18,
  "train_loss": 0.0,
  "train_runtime": 4.0738,
  "train_samples_per_second": 71564.652,
  "train_steps_per_second": 746.232,
  "valid_targets_mean": 6065.3,
  "valid_targets_min": 1413
}
89 chat_template.jinja Normal file
@@ -0,0 +1,89 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
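The template's happy path (plain turns, no tools, no tool responses, no `<think>` stripping from history) produces a simple `<|im_start|>role ... <|im_end|>` wire format, with an empty think block emitted when `enable_thinking` is false. A minimal sketch of that subset, deliberately omitting the tool-call and reasoning-extraction branches:

```python
def render(messages, add_generation_prompt=True, enable_thinking=True):
    """Simplified rendering of the Jinja chat template above:
    plain user/system/assistant turns only, no tool calls."""
    parts = []
    for m in messages:
        parts.append("<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # Template emits an empty think block when thinking is disabled.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)

prompt = render([{"role": "user", "content": "Hi"}], enable_thinking=False)
print(repr(prompt))
```

In practice `tokenizer.apply_chat_template(...)` renders the real template, including the branches this sketch omits.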
68 config.json Normal file
@@ -0,0 +1,68 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "dtype": "bfloat16",
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "transformers_version": "4.57.6",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}
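The config above pins down the model size exactly. Assuming the standard Qwen3 dense layout (untied embeddings, grouped-query attention with per-head-dim QK norms, SwiGLU MLP, two RMSNorms per layer plus a final norm), the parameter count works out to about 8.19B, and at 2 bytes per bfloat16 parameter this matches the `total_size` of 16381470720 bytes recorded in model.safetensors.index.json:

```python
# Shape arithmetic from config.json (standard Qwen3 dense layout assumed).
vocab, h, inter = 151936, 4096, 12288
layers, heads, kv_heads, head_dim = 36, 32, 8, 128

attn = (h * heads * head_dim            # q_proj
        + 2 * h * kv_heads * head_dim   # k_proj + v_proj (GQA)
        + heads * head_dim * h          # o_proj
        + 2 * head_dim)                 # q_norm + k_norm (per head_dim)
mlp = 3 * h * inter                     # gate_proj + up_proj + down_proj
per_layer = attn + mlp + 2 * h          # plus two RMSNorms per layer

total = layers * per_layer + 2 * vocab * h + h  # + embed, lm_head, final norm
print(total)      # 8190735360 parameters
print(total * 2)  # 16381470720 bytes in bfloat16
```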
12 generation_config.json Normal file
@@ -0,0 +1,12 @@
{
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.57.6"
}
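The sampling settings combine top-k and nucleus (top-p) truncation: keep at most the 20 most likely tokens, then the smallest prefix of those whose cumulative probability reaches 0.95. The sketch below is one common formulation of that truncation; library implementations (e.g. the Transformers logits processors) differ slightly at the boundary token:

```python
def truncate(probs, top_k=20, top_p=0.95):
    """Keep the top_k most likely token indices, then the smallest prefix of
    them whose cumulative probability reaches top_p (nucleus truncation)."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

print(truncate([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.7))  # [0, 1]
```

After truncation, the surviving probabilities would be renormalized (with temperature 0.6 applied to the logits beforehand) and sampled from.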
151388 merges.txt Normal file
File diff suppressed because it is too large
3 model-00001-of-00004.safetensors Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f5b0df9a1075a8355fa8506ae83639aa8528b805faf5ad554f10250066f7915c
size 4902257696
3 model-00002-of-00004.safetensors Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:878a33741d51cf29e423afd94be94e8f7e08e7117b0da04a8ac5aa1818f6535f
size 4915960368
3 model-00003-of-00004.safetensors Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:069262bf94002fcd9174139b38d9ce97f457513acc5869613c2ccc5844a4df65
size 4983068496
3 model-00004-of-00004.safetensors Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6ef0f4660c6eb9a0dd632f139677d4e8e31dd42b6ec5584b13a1113c43a02268
size 1580230264
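Each shard is committed as a three-line LFS pointer: a spec version, the SHA-256 object ID, and the byte size. A minimal parser plus a size check: the four shard sizes above sum to 16381516824 bytes, slightly more than the 16381470720 bytes of raw tensor data in model.safetensors.index.json, because each safetensors file carries its own JSON header.

```python
def parse_pointer(text):
    """Parse a git-lfs pointer file into (oid, size)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return fields["oid"], int(fields["size"])

oid, size = parse_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f5b0df9a1075a8355fa8506ae83639aa8528b805faf5ad554f10250066f7915c\n"
    "size 4902257696\n"
)

# Shard sizes from the four pointer files above.
shard_sizes = [4902257696, 4915960368, 4983068496, 1580230264]
total_bytes = sum(shard_sizes)
print(total_bytes)  # 16381516824
```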
407
model.safetensors.index.json
Normal file
407
model.safetensors.index.json
Normal file
@@ -0,0 +1,407 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_parameters": 308224,
|
||||
"total_size": 16381470720
|
||||
},
|
||||
"weight_map": {
|
||||
"lm_head.weight": "model-00004-of-00004.safetensors",
|
||||
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
12
run_summary.json
Normal file
@@ -0,0 +1,12 @@
{
"agent_name": "f1871d1c1446b3b43cbfe2737d0df56cecf3f420_thinking_preprocessed",
"training_start": null,
"training_end": null,
"created_by": "DCAgent",
"base_model_name": "Qwen/Qwen3-8B",
"dataset_name": "/e/data1/datasets/playground/ot-baf/hf_hub/datasets--laion--nemotron-terminal-scientific_computing/snapshots/610c7db0b8510b87e3c99b3bd49660bc56821866_thinking_preprocessed,/e/data1/datasets/playground/ot-baf/hf_hub/datasets--DCAgent--exp_tas_repetition_penalty_1.05_traces/snapshots/b4f5500e00651d5ffc7f8701f8a055d9b2b68a0a_thinking_preprocessed,/e/data1/datasets/playground/ot-baf/hf_hub/datasets--DCAgent--a1_multifile_composition/snapshots/a19e5e467f3e83605b4de72bb5b7923e5e55efa9_thinking_preprocessed,/e/data1/datasets/playground/ot-baf/hf_hub/datasets--DCAgent--exp_tas_max_episodes_512_traces/snapshots/236c1dc9aa6d24cf77ce281b5342d93bae685832_thinking_preprocessed,/e/data1/datasets/playground/ot-baf/hf_hub/datasets--DCAgent--dev_set_part1_10k_glm_4.7_traces_jupiter/snapshots/f1871d1c1446b3b43cbfe2737d0df56cecf3f420_thinking_preprocessed",
"training_type": "SFT",
"training_parameters": "https://huggingface.co/mlfoundations-dev/nemosci-tasrep-a1mfc-dev1-maxeps__Qwen3-8B/blob/main/config.json",
"wandb_link": null,
"traces_location_s3": null
}
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
240
tokenizer_config.json
Normal file
@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 32768,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
12
train_results.json
Normal file
@@ -0,0 +1,12 @@
{
"achieved_tflops_per_gpu": 71120.76996898574,
"achieved_tflops_per_gpu_theoretical": 2528162.797724658,
"epoch": 5.0,
"mfu_percent": 5026.202824663303,
"mfu_percent_theoretical": 178668.74895580622,
"total_flos": 9.271417366388933e+18,
"train_loss": 0.0,
"train_runtime": 4.0738,
"train_samples_per_second": 71564.652,
"train_steps_per_second": 746.232
}
654
trainer_log.jsonl
Normal file
@@ -0,0 +1,654 @@
{"current_steps": 5, "total_steps": 3040, "loss": 0.9672, "lr": 5.263157894736843e-07, "epoch": 0.008228195282501372, "percentage": 0.16, "elapsed_time": "0:01:42", "remaining_time": "17:19:30"}
{"current_steps": 10, "total_steps": 3040, "loss": 0.9636, "lr": 1.1842105263157894e-06, "epoch": 0.016456390565002744, "percentage": 0.33, "elapsed_time": "0:03:17", "remaining_time": "16:36:40"}
{"current_steps": 15, "total_steps": 3040, "loss": 0.9073, "lr": 1.8421052631578948e-06, "epoch": 0.024684585847504114, "percentage": 0.49, "elapsed_time": "0:04:49", "remaining_time": "16:11:59"}
{"current_steps": 20, "total_steps": 3040, "loss": 0.8329, "lr": 2.5e-06, "epoch": 0.03291278113000549, "percentage": 0.66, "elapsed_time": "0:06:17", "remaining_time": "15:50:51"}
{"current_steps": 25, "total_steps": 3040, "loss": 0.7725, "lr": 3.157894736842105e-06, "epoch": 0.04114097641250686, "percentage": 0.82, "elapsed_time": "0:07:48", "remaining_time": "15:42:39"}
{"current_steps": 30, "total_steps": 3040, "loss": 0.7477, "lr": 3.815789473684211e-06, "epoch": 0.04936917169500823, "percentage": 0.99, "elapsed_time": "0:09:21", "remaining_time": "15:38:22"}
{"current_steps": 35, "total_steps": 3040, "loss": 0.7078, "lr": 4.473684210526316e-06, "epoch": 0.0575973669775096, "percentage": 1.15, "elapsed_time": "0:10:48", "remaining_time": "15:28:01"}
{"current_steps": 40, "total_steps": 3040, "loss": 0.6618, "lr": 5.131578947368422e-06, "epoch": 0.06582556226001098, "percentage": 1.32, "elapsed_time": "0:12:25", "remaining_time": "15:32:05"}
{"current_steps": 45, "total_steps": 3040, "loss": 0.6337, "lr": 5.789473684210527e-06, "epoch": 0.07405375754251234, "percentage": 1.48, "elapsed_time": "0:13:57", "remaining_time": "15:29:07"}
{"current_steps": 50, "total_steps": 3040, "loss": 0.5998, "lr": 6.447368421052632e-06, "epoch": 0.08228195282501372, "percentage": 1.64, "elapsed_time": "0:15:26", "remaining_time": "15:23:02"}
{"current_steps": 55, "total_steps": 3040, "loss": 0.5558, "lr": 7.1052631578947375e-06, "epoch": 0.09051014810751508, "percentage": 1.81, "elapsed_time": "0:17:12", "remaining_time": "15:33:59"}
{"current_steps": 60, "total_steps": 3040, "loss": 0.5219, "lr": 7.763157894736843e-06, "epoch": 0.09873834339001646, "percentage": 1.97, "elapsed_time": "0:18:53", "remaining_time": "15:38:16"}
{"current_steps": 65, "total_steps": 3040, "loss": 0.4966, "lr": 8.421052631578948e-06, "epoch": 0.10696653867251783, "percentage": 2.14, "elapsed_time": "0:20:47", "remaining_time": "15:51:28"}
{"current_steps": 70, "total_steps": 3040, "loss": 0.4791, "lr": 9.078947368421054e-06, "epoch": 0.1151947339550192, "percentage": 2.3, "elapsed_time": "0:22:25", "remaining_time": "15:51:37"}
{"current_steps": 75, "total_steps": 3040, "loss": 0.4742, "lr": 9.736842105263159e-06, "epoch": 0.12342292923752057, "percentage": 2.47, "elapsed_time": "0:24:08", "remaining_time": "15:54:17"}
{"current_steps": 80, "total_steps": 3040, "loss": 0.4552, "lr": 1.0394736842105264e-05, "epoch": 0.13165112452002195, "percentage": 2.63, "elapsed_time": "0:25:52", "remaining_time": "15:57:30"}
{"current_steps": 85, "total_steps": 3040, "loss": 0.4578, "lr": 1.105263157894737e-05, "epoch": 0.1398793198025233, "percentage": 2.8, "elapsed_time": "0:27:38", "remaining_time": "16:00:44"}
{"current_steps": 90, "total_steps": 3040, "loss": 0.443, "lr": 1.1710526315789475e-05, "epoch": 0.14810751508502468, "percentage": 2.96, "elapsed_time": "0:29:21", "remaining_time": "16:02:13"}
{"current_steps": 95, "total_steps": 3040, "loss": 0.4362, "lr": 1.236842105263158e-05, "epoch": 0.15633571036752605, "percentage": 3.12, "elapsed_time": "0:31:07", "remaining_time": "16:05:04"}
{"current_steps": 100, "total_steps": 3040, "loss": 0.4342, "lr": 1.3026315789473684e-05, "epoch": 0.16456390565002743, "percentage": 3.29, "elapsed_time": "0:32:55", "remaining_time": "16:08:00"}
{"current_steps": 105, "total_steps": 3040, "loss": 0.4262, "lr": 1.3684210526315791e-05, "epoch": 0.1727921009325288, "percentage": 3.45, "elapsed_time": "0:34:42", "remaining_time": "16:10:22"}
{"current_steps": 110, "total_steps": 3040, "loss": 0.4172, "lr": 1.4342105263157895e-05, "epoch": 0.18102029621503016, "percentage": 3.62, "elapsed_time": "0:37:01", "remaining_time": "16:26:03"}
{"current_steps": 115, "total_steps": 3040, "loss": 0.4153, "lr": 1.5000000000000002e-05, "epoch": 0.18924849149753153, "percentage": 3.78, "elapsed_time": "0:39:20", "remaining_time": "16:40:27"}
{"current_steps": 120, "total_steps": 3040, "loss": 0.4154, "lr": 1.5657894736842107e-05, "epoch": 0.1974766867800329, "percentage": 3.95, "elapsed_time": "0:41:30", "remaining_time": "16:50:09"}
{"current_steps": 125, "total_steps": 3040, "loss": 0.4208, "lr": 1.6315789473684213e-05, "epoch": 0.2057048820625343, "percentage": 4.11, "elapsed_time": "0:43:42", "remaining_time": "16:59:06"}
{"current_steps": 130, "total_steps": 3040, "loss": 0.4063, "lr": 1.6973684210526318e-05, "epoch": 0.21393307734503567, "percentage": 4.28, "elapsed_time": "0:45:57", "remaining_time": "17:08:39"}
{"current_steps": 135, "total_steps": 3040, "loss": 0.3964, "lr": 1.763157894736842e-05, "epoch": 0.22216127262753702, "percentage": 4.44, "elapsed_time": "0:48:10", "remaining_time": "17:16:36"}
{"current_steps": 140, "total_steps": 3040, "loss": 0.3988, "lr": 1.828947368421053e-05, "epoch": 0.2303894679100384, "percentage": 4.61, "elapsed_time": "0:50:28", "remaining_time": "17:25:35"}
{"current_steps": 145, "total_steps": 3040, "loss": 0.4023, "lr": 1.894736842105263e-05, "epoch": 0.23861766319253977, "percentage": 4.77, "elapsed_time": "0:52:45", "remaining_time": "17:33:16"}
{"current_steps": 150, "total_steps": 3040, "loss": 0.3961, "lr": 1.960526315789474e-05, "epoch": 0.24684585847504115, "percentage": 4.93, "elapsed_time": "0:55:01", "remaining_time": "17:40:16"}
{"current_steps": 155, "total_steps": 3040, "loss": 0.3967, "lr": 2.0263157894736842e-05, "epoch": 0.2550740537575425, "percentage": 5.1, "elapsed_time": "0:57:21", "remaining_time": "17:47:34"}
{"current_steps": 160, "total_steps": 3040, "loss": 0.3946, "lr": 2.0921052631578947e-05, "epoch": 0.2633022490400439, "percentage": 5.26, "elapsed_time": "0:59:33", "remaining_time": "17:52:03"}
{"current_steps": 165, "total_steps": 3040, "loss": 0.3925, "lr": 2.1578947368421056e-05, "epoch": 0.27153044432254525, "percentage": 5.43, "elapsed_time": "1:01:48", "remaining_time": "17:57:04"}
{"current_steps": 170, "total_steps": 3040, "loss": 0.3894, "lr": 2.223684210526316e-05, "epoch": 0.2797586396050466, "percentage": 5.59, "elapsed_time": "1:04:02", "remaining_time": "18:01:02"}
{"current_steps": 175, "total_steps": 3040, "loss": 0.3853, "lr": 2.2894736842105263e-05, "epoch": 0.287986834887548, "percentage": 5.76, "elapsed_time": "1:06:12", "remaining_time": "18:03:58"}
{"current_steps": 180, "total_steps": 3040, "loss": 0.3841, "lr": 2.355263157894737e-05, "epoch": 0.29621503017004935, "percentage": 5.92, "elapsed_time": "1:08:29", "remaining_time": "18:08:07"}
{"current_steps": 185, "total_steps": 3040, "loss": 0.3787, "lr": 2.4210526315789474e-05, "epoch": 0.30444322545255076, "percentage": 6.09, "elapsed_time": "1:10:39", "remaining_time": "18:10:33"}
{"current_steps": 190, "total_steps": 3040, "loss": 0.3812, "lr": 2.4868421052631583e-05, "epoch": 0.3126714207350521, "percentage": 6.25, "elapsed_time": "1:12:54", "remaining_time": "18:13:43"}
{"current_steps": 195, "total_steps": 3040, "loss": 0.4022, "lr": 2.5526315789473688e-05, "epoch": 0.32089961601755346, "percentage": 6.41, "elapsed_time": "1:14:37", "remaining_time": "18:08:45"}
{"current_steps": 200, "total_steps": 3040, "loss": 0.418, "lr": 2.618421052631579e-05, "epoch": 0.32912781130005486, "percentage": 6.58, "elapsed_time": "1:15:44", "remaining_time": "17:55:36"}
{"current_steps": 205, "total_steps": 3040, "loss": 0.6654, "lr": 2.6842105263157896e-05, "epoch": 0.3373560065825562, "percentage": 6.74, "elapsed_time": "1:17:08", "remaining_time": "17:46:47"}
{"current_steps": 210, "total_steps": 3040, "loss": 0.8938, "lr": 2.75e-05, "epoch": 0.3455842018650576, "percentage": 6.91, "elapsed_time": "1:19:02", "remaining_time": "17:45:12"}
{"current_steps": 215, "total_steps": 3040, "loss": 0.7774, "lr": 2.815789473684211e-05, "epoch": 0.35381239714755897, "percentage": 7.07, "elapsed_time": "1:20:53", "remaining_time": "17:42:47"}
{"current_steps": 220, "total_steps": 3040, "loss": 0.8285, "lr": 2.8815789473684215e-05, "epoch": 0.3620405924300603, "percentage": 7.24, "elapsed_time": "1:22:34", "remaining_time": "17:38:24"}
{"current_steps": 225, "total_steps": 3040, "loss": 0.7522, "lr": 2.9473684210526317e-05, "epoch": 0.3702687877125617, "percentage": 7.4, "elapsed_time": "1:24:27", "remaining_time": "17:36:35"}
{"current_steps": 230, "total_steps": 3040, "loss": 0.7109, "lr": 3.0131578947368423e-05, "epoch": 0.37849698299506307, "percentage": 7.57, "elapsed_time": "1:26:16", "remaining_time": "17:34:01"}
{"current_steps": 235, "total_steps": 3040, "loss": 0.718, "lr": 3.078947368421053e-05, "epoch": 0.3867251782775645, "percentage": 7.73, "elapsed_time": "1:28:03", "remaining_time": "17:31:02"}
{"current_steps": 240, "total_steps": 3040, "loss": 0.6864, "lr": 3.144736842105264e-05, "epoch": 0.3949533735600658, "percentage": 7.89, "elapsed_time": "1:29:47", "remaining_time": "17:27:31"}
{"current_steps": 245, "total_steps": 3040, "loss": 0.704, "lr": 3.210526315789474e-05, "epoch": 0.4031815688425672, "percentage": 8.06, "elapsed_time": "1:31:27", "remaining_time": "17:23:18"}
{"current_steps": 250, "total_steps": 3040, "loss": 0.6787, "lr": 3.276315789473684e-05, "epoch": 0.4114097641250686, "percentage": 8.22, "elapsed_time": "1:33:18", "remaining_time": "17:21:22"}
{"current_steps": 255, "total_steps": 3040, "loss": 0.6624, "lr": 3.342105263157895e-05, "epoch": 0.4196379594075699, "percentage": 8.39, "elapsed_time": "1:34:58", "remaining_time": "17:17:18"}
{"current_steps": 260, "total_steps": 3040, "loss": 0.6722, "lr": 3.407894736842106e-05, "epoch": 0.42786615469007133, "percentage": 8.55, "elapsed_time": "1:36:48", "remaining_time": "17:15:06"}
{"current_steps": 265, "total_steps": 3040, "loss": 0.635, "lr": 3.473684210526316e-05, "epoch": 0.4360943499725727, "percentage": 8.72, "elapsed_time": "1:38:32", "remaining_time": "17:11:57"}
{"current_steps": 270, "total_steps": 3040, "loss": 0.6713, "lr": 3.539473684210526e-05, "epoch": 0.44432254525507403, "percentage": 8.88, "elapsed_time": "1:40:22", "remaining_time": "17:09:45"}
{"current_steps": 275, "total_steps": 3040, "loss": 0.6483, "lr": 3.605263157894737e-05, "epoch": 0.45255074053757544, "percentage": 9.05, "elapsed_time": "1:42:11", "remaining_time": "17:07:30"}
{"current_steps": 280, "total_steps": 3040, "loss": 0.6846, "lr": 3.671052631578948e-05, "epoch": 0.4607789358200768, "percentage": 9.21, "elapsed_time": "1:43:53", "remaining_time": "17:04:01"}
{"current_steps": 285, "total_steps": 3040, "loss": 0.6585, "lr": 3.736842105263158e-05, "epoch": 0.4690071311025782, "percentage": 9.38, "elapsed_time": "1:45:47", "remaining_time": "17:02:36"}
{"current_steps": 290, "total_steps": 3040, "loss": 0.6333, "lr": 3.802631578947369e-05, "epoch": 0.47723532638507954, "percentage": 9.54, "elapsed_time": "1:47:30", "remaining_time": "16:59:32"}
{"current_steps": 295, "total_steps": 3040, "loss": 0.6569, "lr": 3.868421052631579e-05, "epoch": 0.4854635216675809, "percentage": 9.7, "elapsed_time": "1:49:18", "remaining_time": "16:57:06"}
{"current_steps": 300, "total_steps": 3040, "loss": 0.6432, "lr": 3.9342105263157895e-05, "epoch": 0.4936917169500823, "percentage": 9.87, "elapsed_time": "1:51:03", "remaining_time": "16:54:22"}
{"current_steps": 305, "total_steps": 3040, "loss": 0.4498, "lr": 4e-05, "epoch": 0.5019199122325837, "percentage": 10.03, "elapsed_time": "1:52:37", "remaining_time": "16:49:53"}
{"current_steps": 310, "total_steps": 3040, "loss": 0.2477, "lr": 3.999967038544942e-05, "epoch": 0.510148107515085, "percentage": 10.2, "elapsed_time": "1:53:47", "remaining_time": "16:42:06"}
{"current_steps": 315, "total_steps": 3040, "loss": 0.2281, "lr": 3.9998681552662254e-05, "epoch": 0.5183763027975864, "percentage": 10.36, "elapsed_time": "1:54:57", "remaining_time": "16:34:24"}
{"current_steps": 320, "total_steps": 3040, "loss": 0.2203, "lr": 3.999703353423185e-05, "epoch": 0.5266044980800878, "percentage": 10.53, "elapsed_time": "1:56:09", "remaining_time": "16:27:17"}
{"current_steps": 325, "total_steps": 3040, "loss": 0.2143, "lr": 3.999472638447933e-05, "epoch": 0.5348326933625891, "percentage": 10.69, "elapsed_time": "1:57:21", "remaining_time": "16:20:26"}
{"current_steps": 330, "total_steps": 3040, "loss": 0.2107, "lr": 3.999176017945168e-05, "epoch": 0.5430608886450905, "percentage": 10.86, "elapsed_time": "1:58:33", "remaining_time": "16:13:40"}
{"current_steps": 335, "total_steps": 3040, "loss": 0.205, "lr": 3.998813501691934e-05, "epoch": 0.5512890839275919, "percentage": 11.02, "elapsed_time": "1:59:46", "remaining_time": "16:07:06"}
{"current_steps": 340, "total_steps": 3040, "loss": 0.2049, "lr": 3.9983851016372945e-05, "epoch": 0.5595172792100932, "percentage": 11.18, "elapsed_time": "2:01:05", "remaining_time": "16:01:40"}
{"current_steps": 345, "total_steps": 3040, "loss": 0.2062, "lr": 3.997890831901938e-05, "epoch": 0.5677454744925946, "percentage": 11.35, "elapsed_time": "2:02:27", "remaining_time": "15:56:37"}
{"current_steps": 350, "total_steps": 3040, "loss": 0.1978, "lr": 3.997330708777714e-05, "epoch": 0.575973669775096, "percentage": 11.51, "elapsed_time": "2:03:39", "remaining_time": "15:50:24"}
{"current_steps": 355, "total_steps": 3040, "loss": 0.197, "lr": 3.996704750727097e-05, "epoch": 0.5842018650575974, "percentage": 11.68, "elapsed_time": "2:05:04", "remaining_time": "15:45:56"}
{"current_steps": 360, "total_steps": 3040, "loss": 0.1981, "lr": 3.9960129783825746e-05, "epoch": 0.5924300603400987, "percentage": 11.84, "elapsed_time": "2:06:25", "remaining_time": "15:41:11"}
{"current_steps": 365, "total_steps": 3040, "loss": 0.1927, "lr": 3.995255414545969e-05, "epoch": 0.6006582556226001, "percentage": 12.01, "elapsed_time": "2:07:40", "remaining_time": "15:35:38"}
{"current_steps": 370, "total_steps": 3040, "loss": 0.1932, "lr": 3.994432084187688e-05, "epoch": 0.6088864509051015, "percentage": 12.17, "elapsed_time": "2:09:00", "remaining_time": "15:30:59"}
{"current_steps": 375, "total_steps": 3040, "loss": 0.1935, "lr": 3.993543014445897e-05, "epoch": 0.6171146461876028, "percentage": 12.34, "elapsed_time": "2:10:22", "remaining_time": "15:26:34"}
{"current_steps": 380, "total_steps": 3040, "loss": 0.1943, "lr": 3.992588234625629e-05, "epoch": 0.6253428414701042, "percentage": 12.5, "elapsed_time": "2:11:33", "remaining_time": "15:20:54"}
{"current_steps": 385, "total_steps": 3040, "loss": 0.1924, "lr": 3.991567776197815e-05, "epoch": 0.6335710367526056, "percentage": 12.66, "elapsed_time": "2:12:45", "remaining_time": "15:15:27"}
{"current_steps": 390, "total_steps": 3040, "loss": 0.1941, "lr": 3.990481672798251e-05, "epoch": 0.6417992320351069, "percentage": 12.83, "elapsed_time": "2:14:00", "remaining_time": "15:10:33"}
{"current_steps": 395, "total_steps": 3040, "loss": 0.1924, "lr": 3.989329960226486e-05, "epoch": 0.6500274273176083, "percentage": 12.99, "elapsed_time": "2:15:12", "remaining_time": "15:05:25"}
{"current_steps": 400, "total_steps": 3040, "loss": 0.1874, "lr": 3.988112676444639e-05, "epoch": 0.6582556226001097, "percentage": 13.16, "elapsed_time": "2:16:20", "remaining_time": "14:59:53"}
{"current_steps": 405, "total_steps": 3040, "loss": 0.1854, "lr": 3.9868298615761586e-05, "epoch": 0.6664838178826111, "percentage": 13.32, "elapsed_time": "2:17:30", "remaining_time": "14:54:37"}
{"current_steps": 410, "total_steps": 3040, "loss": 0.5087, "lr": 3.9854815579044866e-05, "epoch": 0.6747120131651124, "percentage": 13.49, "elapsed_time": "2:19:12", "remaining_time": "14:53:01"}
{"current_steps": 415, "total_steps": 3040, "loss": 0.5508, "lr": 3.984067809871675e-05, "epoch": 0.6829402084476138, "percentage": 13.65, "elapsed_time": "2:21:00", "remaining_time": "14:51:56"}
{"current_steps": 420, "total_steps": 3040, "loss": 0.5498, "lr": 3.982588664076916e-05, "epoch": 0.6911684037301152, "percentage": 13.82, "elapsed_time": "2:22:41", "remaining_time": "14:50:09"}
{"current_steps": 425, "total_steps": 3040, "loss": 0.5395, "lr": 3.981044169275006e-05, "epoch": 0.6993965990126165, "percentage": 13.98, "elapsed_time": "2:24:30", "remaining_time": "14:49:06"}
{"current_steps": 430, "total_steps": 3040, "loss": 0.5325, "lr": 3.979434376374744e-05, "epoch": 0.7076247942951179, "percentage": 14.14, "elapsed_time": "2:26:22", "remaining_time": "14:48:25"}
{"current_steps": 435, "total_steps": 3040, "loss": 0.5511, "lr": 3.9777593384372436e-05, "epoch": 0.7158529895776193, "percentage": 14.31, "elapsed_time": "2:28:01", "remaining_time": "14:46:28"}
{"current_steps": 440, "total_steps": 3040, "loss": 0.525, "lr": 3.9760191106741935e-05, "epoch": 0.7240811848601206, "percentage": 14.47, "elapsed_time": "2:29:47", "remaining_time": "14:45:10"}
{"current_steps": 445, "total_steps": 3040, "loss": 0.4966, "lr": 3.9742137504460326e-05, "epoch": 0.732309380142622, "percentage": 14.64, "elapsed_time": "2:31:42", "remaining_time": "14:44:38"}
{"current_steps": 450, "total_steps": 3040, "loss": 0.556, "lr": 3.972343317260061e-05, "epoch": 0.7405375754251234, "percentage": 14.8, "elapsed_time": "2:33:25", "remaining_time": "14:43:02"}
{"current_steps": 455, "total_steps": 3040, "loss": 0.5153, "lr": 3.970407872768478e-05, "epoch": 0.7487657707076248, "percentage": 14.97, "elapsed_time": "2:35:11", "remaining_time": "14:41:41"}
{"current_steps": 460, "total_steps": 3040, "loss": 0.5377, "lr": 3.968407480766352e-05, "epoch": 0.7569939659901261, "percentage": 15.13, "elapsed_time": "2:36:56", "remaining_time": "14:40:14"}
{"current_steps": 465, "total_steps": 3040, "loss": 0.5099, "lr": 3.9663422071895103e-05, "epoch": 0.7652221612726275, "percentage": 15.3, "elapsed_time": "2:38:48", "remaining_time": "14:39:24"}
{"current_steps": 470, "total_steps": 3040, "loss": 0.5066, "lr": 3.964212120112379e-05, "epoch": 0.773450356555129, "percentage": 15.46, "elapsed_time": "2:40:39", "remaining_time": "14:38:30"}
{"current_steps": 475, "total_steps": 3040, "loss": 0.5529, "lr": 3.962017289745724e-05, "epoch": 0.7816785518376302, "percentage": 15.62, "elapsed_time": "2:42:19", "remaining_time": "14:36:31"}
{"current_steps": 480, "total_steps": 3040, "loss": 0.5281, "lr": 3.959757788434351e-05, "epoch": 0.7899067471201316, "percentage": 15.79, "elapsed_time": "2:44:10", "remaining_time": "14:35:38"}
{"current_steps": 485, "total_steps": 3040, "loss": 0.5262, "lr": 3.957433690654709e-05, "epoch": 0.798134942402633, "percentage": 15.95, "elapsed_time": "2:45:55", "remaining_time": "14:34:04"}
{"current_steps": 490, "total_steps": 3040, "loss": 0.5126, "lr": 3.955045073012443e-05, "epoch": 0.8063631376851343, "percentage": 16.12, "elapsed_time": "2:47:38", "remaining_time": "14:32:23"}
{"current_steps": 495, "total_steps": 3040, "loss": 0.5201, "lr": 3.952592014239867e-05, "epoch": 0.8145913329676358, "percentage": 16.28, "elapsed_time": "2:49:19", "remaining_time": "14:30:35"}
{"current_steps": 500, "total_steps": 3040, "loss": 0.5038, "lr": 3.950074595193366e-05, "epoch": 0.8228195282501372, "percentage": 16.45, "elapsed_time": "2:51:04", "remaining_time": "14:29:01"}
{"current_steps": 505, "total_steps": 3040, "loss": 0.5297, "lr": 3.947492898850736e-05, "epoch": 0.8310477235326386, "percentage": 16.61, "elapsed_time": "2:52:53", "remaining_time": "14:27:51"}
{"current_steps": 510, "total_steps": 3040, "loss": 0.5061, "lr": 3.9448470103084436e-05, "epoch": 0.8392759188151399, "percentage": 16.78, "elapsed_time": "2:54:38", "remaining_time": "14:26:21"}
{"current_steps": 515, "total_steps": 3040, "loss": 0.4405, "lr": 3.942137016778826e-05, "epoch": 0.8475041140976413, "percentage": 16.94, "elapsed_time": "2:56:54", "remaining_time": "14:27:20"}
{"current_steps": 520, "total_steps": 3040, "loss": 0.3552, "lr": 3.939363007587213e-05, "epoch": 0.8557323093801427, "percentage": 17.11, "elapsed_time": "2:59:22", "remaining_time": "14:29:16"}
{"current_steps": 525, "total_steps": 3040, "loss": 0.3647, "lr": 3.9365250741689835e-05, "epoch": 0.863960504662644, "percentage": 17.27, "elapsed_time": "3:01:49", "remaining_time": "14:31:03"}
{"current_steps": 530, "total_steps": 3040, "loss": 0.3511, "lr": 3.933623310066554e-05, "epoch": 0.8721886999451454, "percentage": 17.43, "elapsed_time": "3:04:19", "remaining_time": "14:32:55"}
{"current_steps": 535, "total_steps": 3040, "loss": 0.347, "lr": 3.9306578109262894e-05, "epoch": 0.8804168952276468, "percentage": 17.6, "elapsed_time": "3:06:46", "remaining_time": "14:34:32"}
{"current_steps": 540, "total_steps": 3040, "loss": 0.3431, "lr": 3.927628674495357e-05, "epoch": 0.8886450905101481, "percentage": 17.76, "elapsed_time": "3:09:07", "remaining_time": "14:35:34"}
{"current_steps": 545, "total_steps": 3040, "loss": 0.3249, "lr": 3.924536000618501e-05, "epoch": 0.8968732857926495, "percentage": 17.93, "elapsed_time": "3:11:36", "remaining_time": "14:37:10"}
{"current_steps": 550, "total_steps": 3040, "loss": 0.3254, "lr": 3.921379891234753e-05, "epoch": 0.9051014810751509, "percentage": 18.09, "elapsed_time": "3:13:59", "remaining_time": "14:38:16"}
{"current_steps": 555, "total_steps": 3040, "loss": 0.3199, "lr": 3.9181604503740714e-05, "epoch": 0.9133296763576523, "percentage": 18.26, "elapsed_time": "3:16:27", "remaining_time": "14:39:36"}
{"current_steps": 560, "total_steps": 3040, "loss": 0.3367, "lr": 3.914877784153909e-05, "epoch": 0.9215578716401536, "percentage": 18.42, "elapsed_time": "3:18:46", "remaining_time": "14:40:19"}
{"current_steps": 565, "total_steps": 3040, "loss": 0.3168, "lr": 3.9115320007757225e-05, "epoch": 0.929786066922655, "percentage": 18.59, "elapsed_time": "3:21:15", "remaining_time": "14:41:37"}
{"current_steps": 570, "total_steps": 3040, "loss": 0.3229, "lr": 3.9081232105214e-05, "epoch": 0.9380142622051564, "percentage": 18.75, "elapsed_time": "3:23:44", "remaining_time": "14:42:51"}
{"current_steps": 575, "total_steps": 3040, "loss": 0.3387, "lr": 3.9046515257496295e-05, "epoch": 0.9462424574876577, "percentage": 18.91, "elapsed_time": "3:26:04", "remaining_time": "14:43:25"}
{"current_steps": 580, "total_steps": 3040, "loss": 0.295, "lr": 3.9011170608921904e-05, "epoch": 0.9544706527701591, "percentage": 19.08, "elapsed_time": "3:28:32", "remaining_time": "14:44:28"}
{"current_steps": 585, "total_steps": 3040, "loss": 0.3276, "lr": 3.897519932450189e-05, "epoch": 0.9626988480526605, "percentage": 19.24, "elapsed_time": "3:31:00", "remaining_time": "14:45:30"}
{"current_steps": 590, "total_steps": 3040, "loss": 0.3026, "lr": 3.893860258990212e-05, "epoch": 0.9709270433351618, "percentage": 19.41, "elapsed_time": "3:33:28", "remaining_time": "14:46:27"}
{"current_steps": 595, "total_steps": 3040, "loss": 0.2973, "lr": 3.890138161140421e-05, "epoch": 0.9791552386176632, "percentage": 19.57, "elapsed_time": "3:35:47", "remaining_time": "14:46:44"}
{"current_steps": 600, "total_steps": 3040, "loss": 0.3127, "lr": 3.886353761586579e-05, "epoch": 0.9873834339001646, "percentage": 19.74, "elapsed_time": "3:38:10", "remaining_time": "14:47:14"}
{"current_steps": 605, "total_steps": 3040, "loss": 0.3009, "lr": 3.8825071850679996e-05, "epoch": 0.9956116291826659, "percentage": 19.9, "elapsed_time": "3:40:45", "remaining_time": "14:48:30"}
{"current_steps": 610, "total_steps": 3040, "loss": 0.5467, "lr": 3.878598558373443e-05, "epoch": 1.0032912781130006, "percentage": 20.07, "elapsed_time": "3:42:26", "remaining_time": "14:46:08"}
{"current_steps": 615, "total_steps": 3040, "loss": 0.5739, "lr": 3.874628010336932e-05, "epoch": 1.011519473395502, "percentage": 20.23, "elapsed_time": "3:43:59", "remaining_time": "14:43:15"}
{"current_steps": 620, "total_steps": 3040, "loss": 0.5011, "lr": 3.870595671833508e-05, "epoch": 1.0197476686780034, "percentage": 20.39, "elapsed_time": "3:45:29", "remaining_time": "14:40:08"}
{"current_steps": 625, "total_steps": 3040, "loss": 0.4729, "lr": 3.866501675774914e-05, "epoch": 1.0279758639605046, "percentage": 20.56, "elapsed_time": "3:46:56", "remaining_time": "14:36:54"}
{"current_steps": 630, "total_steps": 3040, "loss": 0.444, "lr": 3.862346157105219e-05, "epoch": 1.036204059243006, "percentage": 20.72, "elapsed_time": "3:48:21", "remaining_time": "14:33:35"}
{"current_steps": 635, "total_steps": 3040, "loss": 0.4377, "lr": 3.858129252796363e-05, "epoch": 1.0444322545255074, "percentage": 20.89, "elapsed_time": "3:49:44", "remaining_time": "14:30:05"}
{"current_steps": 640, "total_steps": 3040, "loss": 0.4291, "lr": 3.853851101843649e-05, "epoch": 1.0526604498080088, "percentage": 21.05, "elapsed_time": "3:51:12", "remaining_time": "14:27:00"}
{"current_steps": 645, "total_steps": 3040, "loss": 0.4201, "lr": 3.8495118452611574e-05, "epoch": 1.0608886450905102, "percentage": 21.22, "elapsed_time": "3:52:37", "remaining_time": "14:23:46"}
{"current_steps": 650, "total_steps": 3040, "loss": 0.4119, "lr": 3.845111626077097e-05, "epoch": 1.0691168403730116, "percentage": 21.38, "elapsed_time": "3:54:07", "remaining_time": "14:20:49"}
{"current_steps": 655, "total_steps": 3040, "loss": 0.4111, "lr": 3.840650589329098e-05, "epoch": 1.077345035655513, "percentage": 21.55, "elapsed_time": "3:55:35", "remaining_time": "14:17:49"}
{"current_steps": 660, "total_steps": 3040, "loss": 0.4077, "lr": 3.83612888205942e-05, "epoch": 1.0855732309380142, "percentage": 21.71, "elapsed_time": "3:57:04", "remaining_time": "14:14:54"}
{"current_steps": 665, "total_steps": 3040, "loss": 0.3852, "lr": 3.8315466533101154e-05, "epoch": 1.0938014262205156, "percentage": 21.88, "elapsed_time": "3:58:46", "remaining_time": "14:12:45"}
{"current_steps": 670, "total_steps": 3040, "loss": 0.3758, "lr": 3.82690405411811e-05, "epoch": 1.102029621503017, "percentage": 22.04, "elapsed_time": "4:00:26", "remaining_time": "14:10:31"}
{"current_steps": 675, "total_steps": 3040, "loss": 0.3727, "lr": 3.82220123751023e-05, "epoch": 1.1102578167855184, "percentage": 22.2, "elapsed_time": "4:02:07", "remaining_time": "14:08:19"}
{"current_steps": 680, "total_steps": 3040, "loss": 0.3683, "lr": 3.8174383584981525e-05, "epoch": 1.1184860120680198, "percentage": 22.37, "elapsed_time": "4:03:42", "remaining_time": "14:05:49"}
{"current_steps": 685, "total_steps": 3040, "loss": 0.3675, "lr": 3.812615574073301e-05, "epoch": 1.1267142073505212, "percentage": 22.53, "elapsed_time": "4:05:20", "remaining_time": "14:03:28"}
{"current_steps": 690, "total_steps": 3040, "loss": 0.3624, "lr": 3.807733043201666e-05, "epoch": 1.1349424026330226, "percentage": 22.7, "elapsed_time": "4:07:04", "remaining_time": "14:01:27"}
{"current_steps": 695, "total_steps": 3040, "loss": 0.3701, "lr": 3.8027909268185695e-05, "epoch": 1.1431705979155238, "percentage": 22.86, "elapsed_time": "4:08:44", "remaining_time": "13:59:17"}
{"current_steps": 700, "total_steps": 3040, "loss": 0.3592, "lr": 3.7977893878233604e-05, "epoch": 1.1513987931980252, "percentage": 23.03, "elapsed_time": "4:10:24", "remaining_time": "13:57:04"}
{"current_steps": 705, "total_steps": 3040, "loss": 0.3619, "lr": 3.792728591074041e-05, "epoch": 1.1596269884805266, "percentage": 23.19, "elapsed_time": "4:12:09", "remaining_time": "13:55:11"}
{"current_steps": 710, "total_steps": 3040, "loss": 0.3584, "lr": 3.7876087033818345e-05, "epoch": 1.167855183763028, "percentage": 23.36, "elapsed_time": "4:13:46", "remaining_time": "13:52:50"}
{"current_steps": 715, "total_steps": 3040, "loss": 0.3579, "lr": 3.78242989350569e-05, "epoch": 1.1760833790455294, "percentage": 23.52, "elapsed_time": "4:15:48", "remaining_time": "13:51:50"}
{"current_steps": 720, "total_steps": 3040, "loss": 0.3555, "lr": 3.7771923321467163e-05, "epoch": 1.1843115743280308, "percentage": 23.68, "elapsed_time": "4:18:04", "remaining_time": "13:51:35"}
{"current_steps": 725, "total_steps": 3040, "loss": 0.3526, "lr": 3.771896191942556e-05, "epoch": 1.1925397696105322, "percentage": 23.85, "elapsed_time": "4:20:16", "remaining_time": "13:51:04"}
{"current_steps": 730, "total_steps": 3040, "loss": 0.3569, "lr": 3.7665416474616986e-05, "epoch": 1.2007679648930334, "percentage": 24.01, "elapsed_time": "4:22:29", "remaining_time": "13:50:36"}
{"current_steps": 735, "total_steps": 3040, "loss": 0.362, "lr": 3.761128875197719e-05, "epoch": 1.2089961601755348, "percentage": 24.18, "elapsed_time": "4:24:32", "remaining_time": "13:49:37"}
{"current_steps": 740, "total_steps": 3040, "loss": 0.3485, "lr": 3.7556580535634685e-05, "epoch": 1.2172243554580362, "percentage": 24.34, "elapsed_time": "4:26:48", "remaining_time": "13:49:17"}
{"current_steps": 745, "total_steps": 3040, "loss": 0.3469, "lr": 3.750129362885188e-05, "epoch": 1.2254525507405376, "percentage": 24.51, "elapsed_time": "4:29:00", "remaining_time": "13:48:40"}
{"current_steps": 750, "total_steps": 3040, "loss": 0.3486, "lr": 3.744542985396566e-05, "epoch": 1.233680746023039, "percentage": 24.67, "elapsed_time": "4:31:08", "remaining_time": "13:47:53"}
{"current_steps": 755, "total_steps": 3040, "loss": 0.3524, "lr": 3.738899105232734e-05, "epoch": 1.2419089413055404, "percentage": 24.84, "elapsed_time": "4:33:27", "remaining_time": "13:47:38"}
{"current_steps": 760, "total_steps": 3040, "loss": 0.3437, "lr": 3.733197908424194e-05, "epoch": 1.2501371365880418, "percentage": 25.0, "elapsed_time": "4:35:35", "remaining_time": "13:46:47"}
{"current_steps": 765, "total_steps": 3040, "loss": 0.3521, "lr": 3.727439582890689e-05, "epoch": 1.258365331870543, "percentage": 25.16, "elapsed_time": "4:37:48", "remaining_time": "13:46:10"}
{"current_steps": 770, "total_steps": 3040, "loss": 0.3442, "lr": 3.721624318435006e-05, "epoch": 1.2665935271530444, "percentage": 25.33, "elapsed_time": "4:40:00", "remaining_time": "13:45:29"}
{"current_steps": 775, "total_steps": 3040, "loss": 0.3491, "lr": 3.715752306736724e-05, "epoch": 1.2748217224355458, "percentage": 25.49, "elapsed_time": "4:42:10", "remaining_time": "13:44:39"}
{"current_steps": 780, "total_steps": 3040, "loss": 0.3408, "lr": 3.709823741345894e-05, "epoch": 1.2830499177180472, "percentage": 25.66, "elapsed_time": "4:44:21", "remaining_time": "13:43:53"}
{"current_steps": 785, "total_steps": 3040, "loss": 0.3399, "lr": 3.703838817676654e-05, "epoch": 1.2912781130005486, "percentage": 25.82, "elapsed_time": "4:46:33", "remaining_time": "13:43:11"}
{"current_steps": 790, "total_steps": 3040, "loss": 0.3432, "lr": 3.6977977330008e-05, "epoch": 1.2995063082830498, "percentage": 25.99, "elapsed_time": "4:48:47", "remaining_time": "13:42:30"}
{"current_steps": 795, "total_steps": 3040, "loss": 0.3365, "lr": 3.691700686441272e-05, "epoch": 1.3077345035655512, "percentage": 26.15, "elapsed_time": "4:50:53", "remaining_time": "13:41:26"}
{"current_steps": 800, "total_steps": 3040, "loss": 0.3418, "lr": 3.685547878965595e-05, "epoch": 1.3159626988480526, "percentage": 26.32, "elapsed_time": "4:53:06", "remaining_time": "13:40:42"}
{"current_steps": 805, "total_steps": 3040, "loss": 0.3654, "lr": 3.679339513379257e-05, "epoch": 1.324190894130554, "percentage": 26.48, "elapsed_time": "4:54:18", "remaining_time": "13:37:07"}
{"current_steps": 810, "total_steps": 3040, "loss": 0.3703, "lr": 3.673075794319022e-05, "epoch": 1.3324190894130554, "percentage": 26.64, "elapsed_time": "4:55:24", "remaining_time": "13:33:17"}
{"current_steps": 815, "total_steps": 3040, "loss": 0.7179, "lr": 3.6667569282461835e-05, "epoch": 1.3406472846955568, "percentage": 26.81, "elapsed_time": "4:57:06", "remaining_time": "13:31:07"}
{"current_steps": 820, "total_steps": 3040, "loss": 0.6726, "lr": 3.660383123439761e-05, "epoch": 1.3488754799780582, "percentage": 26.97, "elapsed_time": "4:58:50", "remaining_time": "13:29:04"}
{"current_steps": 825, "total_steps": 3040, "loss": 0.6596, "lr": 3.653954589989637e-05, "epoch": 1.3571036752605594, "percentage": 27.14, "elapsed_time": "5:00:35", "remaining_time": "13:27:01"}
{"current_steps": 830, "total_steps": 3040, "loss": 0.6193, "lr": 3.647471539789626e-05, "epoch": 1.3653318705430608, "percentage": 27.3, "elapsed_time": "5:02:21", "remaining_time": "13:25:04"}
{"current_steps": 835, "total_steps": 3040, "loss": 0.612, "lr": 3.640934186530496e-05, "epoch": 1.3735600658255622, "percentage": 27.47, "elapsed_time": "5:04:11", "remaining_time": "13:23:17"}
{"current_steps": 840, "total_steps": 3040, "loss": 0.6096, "lr": 3.634342745692924e-05, "epoch": 1.3817882611080636, "percentage": 27.63, "elapsed_time": "5:05:53", "remaining_time": "13:21:09"}
{"current_steps": 845, "total_steps": 3040, "loss": 0.5952, "lr": 3.62769743454039e-05, "epoch": 1.390016456390565, "percentage": 27.8, "elapsed_time": "5:07:40", "remaining_time": "13:19:12"}
{"current_steps": 850, "total_steps": 3040, "loss": 0.6073, "lr": 3.6209984721120195e-05, "epoch": 1.3982446516730664, "percentage": 27.96, "elapsed_time": "5:09:14", "remaining_time": "13:16:44"}
{"current_steps": 855, "total_steps": 3040, "loss": 0.5841, "lr": 3.614246079215361e-05, "epoch": 1.4064728469555678, "percentage": 28.12, "elapsed_time": "5:11:03", "remaining_time": "13:14:55"}
{"current_steps": 860, "total_steps": 3040, "loss": 0.5723, "lr": 3.6074404784191084e-05, "epoch": 1.414701042238069, "percentage": 28.29, "elapsed_time": "5:12:47", "remaining_time": "13:12:52"}
{"current_steps": 865, "total_steps": 3040, "loss": 0.5871, "lr": 3.600581894045768e-05, "epoch": 1.4229292375205704, "percentage": 28.45, "elapsed_time": "5:14:24", "remaining_time": "13:10:35"}
{"current_steps": 870, "total_steps": 3040, "loss": 0.5814, "lr": 3.593670552164261e-05, "epoch": 1.4311574328030718, "percentage": 28.62, "elapsed_time": "5:16:15", "remaining_time": "13:08:50"}
{"current_steps": 875, "total_steps": 3040, "loss": 0.5616, "lr": 3.586706680582471e-05, "epoch": 1.4393856280855732, "percentage": 28.78, "elapsed_time": "5:17:59", "remaining_time": "13:06:49"}
{"current_steps": 880, "total_steps": 3040, "loss": 0.5933, "lr": 3.579690508839738e-05, "epoch": 1.4476138233680746, "percentage": 28.95, "elapsed_time": "5:19:49", "remaining_time": "13:05:01"}
{"current_steps": 885, "total_steps": 3040, "loss": 0.5654, "lr": 3.572622268199292e-05, "epoch": 1.455842018650576, "percentage": 29.11, "elapsed_time": "5:21:33", "remaining_time": "13:02:59"}
{"current_steps": 890, "total_steps": 3040, "loss": 0.6099, "lr": 3.5655021916406295e-05, "epoch": 1.4640702139330775, "percentage": 29.28, "elapsed_time": "5:23:18", "remaining_time": "13:01:00"}
{"current_steps": 895, "total_steps": 3040, "loss": 0.5761, "lr": 3.558330513851833e-05, "epoch": 1.4722984092155786, "percentage": 29.44, "elapsed_time": "5:25:04", "remaining_time": "12:59:06"}
{"current_steps": 900, "total_steps": 3040, "loss": 0.5784, "lr": 3.55110747122184e-05, "epoch": 1.48052660449808, "percentage": 29.61, "elapsed_time": "5:26:47", "remaining_time": "12:57:01"}
{"current_steps": 905, "total_steps": 3040, "loss": 0.591, "lr": 3.543833301832642e-05, "epoch": 1.4887547997805815, "percentage": 29.77, "elapsed_time": "5:28:45", "remaining_time": "12:55:35"}
{"current_steps": 910, "total_steps": 3040, "loss": 0.5635, "lr": 3.5365082454514493e-05, "epoch": 1.4969829950630829, "percentage": 29.93, "elapsed_time": "5:30:31", "remaining_time": "12:53:39"}
{"current_steps": 915, "total_steps": 3040, "loss": 0.2335, "lr": 3.529132543522777e-05, "epoch": 1.5052111903455843, "percentage": 30.1, "elapsed_time": "5:31:39", "remaining_time": "12:50:15"}
{"current_steps": 920, "total_steps": 3040, "loss": 0.1825, "lr": 3.521706439160494e-05, "epoch": 1.5134393856280854, "percentage": 30.26, "elapsed_time": "5:32:47", "remaining_time": "12:46:51"}
{"current_steps": 925, "total_steps": 3040, "loss": 0.1755, "lr": 3.514230177139805e-05, "epoch": 1.521667580910587, "percentage": 30.43, "elapsed_time": "5:33:59", "remaining_time": "12:43:40"}
{"current_steps": 930, "total_steps": 3040, "loss": 0.174, "lr": 3.5067040038891834e-05, "epoch": 1.5298957761930883, "percentage": 30.59, "elapsed_time": "5:35:12", "remaining_time": "12:40:30"}
{"current_steps": 935, "total_steps": 3040, "loss": 0.1723, "lr": 3.499128167482253e-05, "epoch": 1.5381239714755897, "percentage": 30.76, "elapsed_time": "5:36:25", "remaining_time": "12:37:25"}
{"current_steps": 940, "total_steps": 3040, "loss": 0.1712, "lr": 3.491502917629602e-05, "epoch": 1.546352166758091, "percentage": 30.92, "elapsed_time": "5:37:38", "remaining_time": "12:34:18"}
{"current_steps": 945, "total_steps": 3040, "loss": 0.1687, "lr": 3.483828505670563e-05, "epoch": 1.5545803620405925, "percentage": 31.09, "elapsed_time": "5:38:53", "remaining_time": "12:31:17"}
{"current_steps": 950, "total_steps": 3040, "loss": 0.1731, "lr": 3.476105184564921e-05, "epoch": 1.5628085573230939, "percentage": 31.25, "elapsed_time": "5:40:12", "remaining_time": "12:28:28"}
{"current_steps": 955, "total_steps": 3040, "loss": 0.1673, "lr": 3.468333208884576e-05, "epoch": 1.571036752605595, "percentage": 31.41, "elapsed_time": "5:41:31", "remaining_time": "12:25:37"}
{"current_steps": 960, "total_steps": 3040, "loss": 0.165, "lr": 3.4605128348051566e-05, "epoch": 1.5792649478880967, "percentage": 31.58, "elapsed_time": "5:42:49", "remaining_time": "12:22:48"}
{"current_steps": 965, "total_steps": 3040, "loss": 0.1674, "lr": 3.4526443200975704e-05, "epoch": 1.5874931431705979, "percentage": 31.74, "elapsed_time": "5:44:13", "remaining_time": "12:20:10"}
{"current_steps": 970, "total_steps": 3040, "loss": 0.1654, "lr": 3.444727924119511e-05, "epoch": 1.5957213384530993, "percentage": 31.91, "elapsed_time": "5:45:31", "remaining_time": "12:17:21"}
{"current_steps": 975, "total_steps": 3040, "loss": 0.1664, "lr": 3.436763907806911e-05, "epoch": 1.6039495337356007, "percentage": 32.07, "elapsed_time": "5:46:50", "remaining_time": "12:14:34"}
{"current_steps": 980, "total_steps": 3040, "loss": 0.1626, "lr": 3.4287525336653335e-05, "epoch": 1.612177729018102, "percentage": 32.24, "elapsed_time": "5:48:09", "remaining_time": "12:11:49"}
{"current_steps": 985, "total_steps": 3040, "loss": 0.1632, "lr": 3.420694065761328e-05, "epoch": 1.6204059243006035, "percentage": 32.4, "elapsed_time": "5:49:26", "remaining_time": "12:09:02"}
{"current_steps": 990, "total_steps": 3040, "loss": 0.1675, "lr": 3.412588769713723e-05, "epoch": 1.6286341195831047, "percentage": 32.57, "elapsed_time": "5:50:34", "remaining_time": "12:05:56"}
{"current_steps": 995, "total_steps": 3040, "loss": 0.1641, "lr": 3.40443691268487e-05, "epoch": 1.6368623148656063, "percentage": 32.73, "elapsed_time": "5:51:48", "remaining_time": "12:03:04"}
{"current_steps": 1000, "total_steps": 3040, "loss": 0.1656, "lr": 3.396238763371837e-05, "epoch": 1.6450905101481075, "percentage": 32.89, "elapsed_time": "5:53:01", "remaining_time": "12:00:09"}
{"current_steps": 1005, "total_steps": 3040, "loss": 0.1658, "lr": 3.387994591997554e-05, "epoch": 1.6533187054306089, "percentage": 33.06, "elapsed_time": "5:54:13", "remaining_time": "11:57:15"}
{"current_steps": 1010, "total_steps": 3040, "loss": 0.1614, "lr": 3.379704670301906e-05, "epoch": 1.6615469007131103, "percentage": 33.22, "elapsed_time": "5:55:22", "remaining_time": "11:54:16"}
{"current_steps": 1015, "total_steps": 3040, "loss": 0.2498, "lr": 3.371369271532775e-05, "epoch": 1.6697750959956115, "percentage": 33.39, "elapsed_time": "5:56:42", "remaining_time": "11:51:39"}
{"current_steps": 1020, "total_steps": 3040, "loss": 0.4966, "lr": 3.362988670437031e-05, "epoch": 1.678003291278113, "percentage": 33.55, "elapsed_time": "5:58:30", "remaining_time": "11:49:59"}
{"current_steps": 1025, "total_steps": 3040, "loss": 0.4843, "lr": 3.354563143251483e-05, "epoch": 1.6862314865606143, "percentage": 33.72, "elapsed_time": "6:00:12", "remaining_time": "11:48:07"}
{"current_steps": 1030, "total_steps": 3040, "loss": 0.4969, "lr": 3.346092967693764e-05, "epoch": 1.694459681843116, "percentage": 33.88, "elapsed_time": "6:01:54", "remaining_time": "11:46:14"}
{"current_steps": 1035, "total_steps": 3040, "loss": 0.4868, "lr": 3.3375784229531864e-05, "epoch": 1.702687877125617, "percentage": 34.05, "elapsed_time": "6:03:43", "remaining_time": "11:44:37"}
{"current_steps": 1040, "total_steps": 3040, "loss": 0.4708, "lr": 3.3290197896815344e-05, "epoch": 1.7109160724081185, "percentage": 34.21, "elapsed_time": "6:05:25", "remaining_time": "11:42:44"}
{"current_steps": 1045, "total_steps": 3040, "loss": 0.4831, "lr": 3.320417349983813e-05, "epoch": 1.71914426769062, "percentage": 34.38, "elapsed_time": "6:07:12", "remaining_time": "11:41:01"}
{"current_steps": 1050, "total_steps": 3040, "loss": 0.455, "lr": 3.3117713874089516e-05, "epoch": 1.727372462973121, "percentage": 34.54, "elapsed_time": "6:08:56", "remaining_time": "11:39:14"}
{"current_steps": 1055, "total_steps": 3040, "loss": 0.4822, "lr": 3.303082186940458e-05, "epoch": 1.7356006582556227, "percentage": 34.7, "elapsed_time": "6:10:43", "remaining_time": "11:37:31"}
{"current_steps": 1060, "total_steps": 3040, "loss": 0.4692, "lr": 3.294350034987022e-05, "epoch": 1.743828853538124, "percentage": 34.87, "elapsed_time": "6:12:26", "remaining_time": "11:35:41"}
{"current_steps": 1065, "total_steps": 3040, "loss": 0.4608, "lr": 3.285575219373079e-05, "epoch": 1.7520570488206253, "percentage": 35.03, "elapsed_time": "6:14:18", "remaining_time": "11:34:08"}
{"current_steps": 1070, "total_steps": 3040, "loss": 0.482, "lr": 3.276758029329318e-05, "epoch": 1.7602852441031267, "percentage": 35.2, "elapsed_time": "6:16:00", "remaining_time": "11:32:16"}
{"current_steps": 1075, "total_steps": 3040, "loss": 0.4575, "lr": 3.267898755483153e-05, "epoch": 1.768513439385628, "percentage": 35.36, "elapsed_time": "6:17:51", "remaining_time": "11:30:40"}
{"current_steps": 1080, "total_steps": 3040, "loss": 0.4827, "lr": 3.258997689849142e-05, "epoch": 1.7767416346681295, "percentage": 35.53, "elapsed_time": "6:19:32", "remaining_time": "11:28:47"}
{"current_steps": 1085, "total_steps": 3040, "loss": 0.4813, "lr": 3.250055125819358e-05, "epoch": 1.7849698299506307, "percentage": 35.69, "elapsed_time": "6:21:20", "remaining_time": "11:27:07"}
{"current_steps": 1090, "total_steps": 3040, "loss": 0.4659, "lr": 3.241071358153723e-05, "epoch": 1.7931980252331323, "percentage": 35.86, "elapsed_time": "6:23:12", "remaining_time": "11:25:33"}
{"current_steps": 1095, "total_steps": 3040, "loss": 0.4793, "lr": 3.232046682970293e-05, "epoch": 1.8014262205156335, "percentage": 36.02, "elapsed_time": "6:24:52", "remaining_time": "11:23:37"}
{"current_steps": 1100, "total_steps": 3040, "loss": 0.4687, "lr": 3.2229813977354926e-05, "epoch": 1.809654415798135, "percentage": 36.18, "elapsed_time": "6:26:36", "remaining_time": "11:21:50"}
{"current_steps": 1105, "total_steps": 3040, "loss": 0.4568, "lr": 3.213875801254314e-05, "epoch": 1.8178826110806363, "percentage": 36.35, "elapsed_time": "6:28:17", "remaining_time": "11:19:57"}
{"current_steps": 1110, "total_steps": 3040, "loss": 0.4743, "lr": 3.204730193660466e-05, "epoch": 1.8261108063631377, "percentage": 36.51, "elapsed_time": "6:29:59", "remaining_time": "11:18:05"}
{"current_steps": 1115, "total_steps": 3040, "loss": 0.4771, "lr": 3.195544876406482e-05, "epoch": 1.8343390016456391, "percentage": 36.68, "elapsed_time": "6:31:47", "remaining_time": "11:16:23"}
{"current_steps": 1120, "total_steps": 3040, "loss": 0.4355, "lr": 3.1863201522537843e-05, "epoch": 1.8425671969281403, "percentage": 36.84, "elapsed_time": "6:33:34", "remaining_time": "11:14:42"}
{"current_steps": 1125, "total_steps": 3040, "loss": 0.3117, "lr": 3.177056325262704e-05, "epoch": 1.850795392210642, "percentage": 37.01, "elapsed_time": "6:36:02", "remaining_time": "11:14:08"}
{"current_steps": 1130, "total_steps": 3040, "loss": 0.281, "lr": 3.167753700782457e-05, "epoch": 1.8590235874931431, "percentage": 37.17, "elapsed_time": "6:38:28", "remaining_time": "11:13:31"}
{"current_steps": 1135, "total_steps": 3040, "loss": 0.2943, "lr": 3.1584125854410824e-05, "epoch": 1.8672517827756445, "percentage": 37.34, "elapsed_time": "6:40:55", "remaining_time": "11:12:54"}
{"current_steps": 1140, "total_steps": 3040, "loss": 0.2902, "lr": 3.149033287135335e-05, "epoch": 1.875479978058146, "percentage": 37.5, "elapsed_time": "6:43:21", "remaining_time": "11:12:16"}
{"current_steps": 1145, "total_steps": 3040, "loss": 0.3081, "lr": 3.1396161150205324e-05, "epoch": 1.8837081733406473, "percentage": 37.66, "elapsed_time": "6:45:40", "remaining_time": "11:11:24"}
{"current_steps": 1150, "total_steps": 3040, "loss": 0.2738, "lr": 3.130161379500371e-05, "epoch": 1.8919363686231487, "percentage": 37.83, "elapsed_time": "6:48:07", "remaining_time": "11:10:45"}
{"current_steps": 1155, "total_steps": 3040, "loss": 0.2918, "lr": 3.120669392216692e-05, "epoch": 1.90016456390565, "percentage": 37.99, "elapsed_time": "6:50:30", "remaining_time": "11:09:57"}
{"current_steps": 1160, "total_steps": 3040, "loss": 0.2809, "lr": 3.111140466039205e-05, "epoch": 1.9083927591881515, "percentage": 38.16, "elapsed_time": "6:52:55", "remaining_time": "11:09:13"}
{"current_steps": 1165, "total_steps": 3040, "loss": 0.274, "lr": 3.1015749150551835e-05, "epoch": 1.9166209544706527, "percentage": 38.32, "elapsed_time": "6:55:22", "remaining_time": "11:08:31"}
{"current_steps": 1170, "total_steps": 3040, "loss": 0.2975, "lr": 3.091973054559106e-05, "epoch": 1.9248491497531541, "percentage": 38.49, "elapsed_time": "6:57:40", "remaining_time": "11:07:34"}
{"current_steps": 1175, "total_steps": 3040, "loss": 0.291, "lr": 3.082335201042266e-05, "epoch": 1.9330773450356555, "percentage": 38.65, "elapsed_time": "7:00:08", "remaining_time": "11:06:51"}
{"current_steps": 1180, "total_steps": 3040, "loss": 0.3024, "lr": 3.0726616721823394e-05, "epoch": 1.9413055403181567, "percentage": 38.82, "elapsed_time": "7:02:27", "remaining_time": "11:05:54"}
{"current_steps": 1185, "total_steps": 3040, "loss": 0.2652, "lr": 3.062952786832912e-05, "epoch": 1.9495337356006583, "percentage": 38.98, "elapsed_time": "7:04:54", "remaining_time": "11:05:09"}
{"current_steps": 1190, "total_steps": 3040, "loss": 0.2714, "lr": 3.053208865012973e-05, "epoch": 1.9577619308831595, "percentage": 39.14, "elapsed_time": "7:07:20", "remaining_time": "11:04:21"}
{"current_steps": 1195, "total_steps": 3040, "loss": 0.2811, "lr": 3.0434302278963623e-05, "epoch": 1.9659901261656612, "percentage": 39.31, "elapsed_time": "7:09:48", "remaining_time": "11:03:35"}
{"current_steps": 1200, "total_steps": 3040, "loss": 0.2645, "lr": 3.0336171978011885e-05, "epoch": 1.9742183214481623, "percentage": 39.47, "elapsed_time": "7:12:15", "remaining_time": "11:02:47"}
{"current_steps": 1205, "total_steps": 3040, "loss": 0.2844, "lr": 3.0237700981792023e-05, "epoch": 1.9824465167306637, "percentage": 39.64, "elapsed_time": "7:14:37", "remaining_time": "11:01:51"}
{"current_steps": 1210, "total_steps": 3040, "loss": 0.269, "lr": 3.013889253605135e-05, "epoch": 1.9906747120131651, "percentage": 39.8, "elapsed_time": "7:17:04", "remaining_time": "11:01:02"}
{"current_steps": 1215, "total_steps": 3040, "loss": 0.2789, "lr": 3.0039749897660005e-05, "epoch": 1.9989029072956663, "percentage": 39.97, "elapsed_time": "7:19:30", "remaining_time": "11:00:09"}
{"current_steps": 1220, "total_steps": 3040, "loss": 0.6179, "lr": 2.9940276334503617e-05, "epoch": 2.006582556226001, "percentage": 40.13, "elapsed_time": "7:20:51", "remaining_time": "10:57:40"}
{"current_steps": 1225, "total_steps": 3040, "loss": 0.4752, "lr": 2.984047512537557e-05, "epoch": 2.0148107515085023, "percentage": 40.3, "elapsed_time": "7:22:21", "remaining_time": "10:55:24"}
{"current_steps": 1230, "total_steps": 3040, "loss": 0.4364, "lr": 2.9740349559868918e-05, "epoch": 2.023038946791004, "percentage": 40.46, "elapsed_time": "7:23:52", "remaining_time": "10:53:10"}
{"current_steps": 1235, "total_steps": 3040, "loss": 0.4043, "lr": 2.9639902938267994e-05, "epoch": 2.031267142073505, "percentage": 40.62, "elapsed_time": "7:25:14", "remaining_time": "10:50:44"}
{"current_steps": 1240, "total_steps": 3040, "loss": 0.3873, "lr": 2.9539138571439614e-05, "epoch": 2.0394953373560067, "percentage": 40.79, "elapsed_time": "7:26:40", "remaining_time": "10:48:24"}
{"current_steps": 1245, "total_steps": 3040, "loss": 0.3884, "lr": 2.943805978072391e-05, "epoch": 2.047723532638508, "percentage": 40.95, "elapsed_time": "7:28:06", "remaining_time": "10:46:03"}
{"current_steps": 1250, "total_steps": 3040, "loss": 0.3801, "lr": 2.933666989782491e-05, "epoch": 2.055951727921009, "percentage": 41.12, "elapsed_time": "7:29:32", "remaining_time": "10:43:45"}
{"current_steps": 1255, "total_steps": 3040, "loss": 0.3696, "lr": 2.9234972264700687e-05, "epoch": 2.0641799232035107, "percentage": 41.28, "elapsed_time": "7:31:04", "remaining_time": "10:41:33"}
{"current_steps": 1260, "total_steps": 3040, "loss": 0.3725, "lr": 2.913297023345319e-05, "epoch": 2.072408118486012, "percentage": 41.45, "elapsed_time": "7:32:32", "remaining_time": "10:39:18"}
{"current_steps": 1265, "total_steps": 3040, "loss": 0.3689, "lr": 2.903066716621779e-05, "epoch": 2.0806363137685135, "percentage": 41.61, "elapsed_time": "7:33:56", "remaining_time": "10:36:57"}
{"current_steps": 1270, "total_steps": 3040, "loss": 0.3582, "lr": 2.892806643505245e-05, "epoch": 2.0888645090510147, "percentage": 41.78, "elapsed_time": "7:35:37", "remaining_time": "10:35:00"}
{"current_steps": 1275, "total_steps": 3040, "loss": 0.3366, "lr": 2.8825171421826555e-05, "epoch": 2.0970927043335164, "percentage": 41.94, "elapsed_time": "7:37:16", "remaining_time": "10:33:00"}
{"current_steps": 1280, "total_steps": 3040, "loss": 0.3355, "lr": 2.8721985518109457e-05, "epoch": 2.1053208996160175, "percentage": 42.11, "elapsed_time": "7:38:57", "remaining_time": "10:31:04"}
{"current_steps": 1285, "total_steps": 3040, "loss": 0.325, "lr": 2.861851212505869e-05, "epoch": 2.1135490948985187, "percentage": 42.27, "elapsed_time": "7:40:35", "remaining_time": "10:29:03"}
{"current_steps": 1290, "total_steps": 3040, "loss": 0.3343, "lr": 2.8514754653307836e-05, "epoch": 2.1217772901810203, "percentage": 42.43, "elapsed_time": "7:42:10", "remaining_time": "10:26:59"}
{"current_steps": 1295, "total_steps": 3040, "loss": 0.3245, "lr": 2.8410716522854152e-05, "epoch": 2.1300054854635215, "percentage": 42.6, "elapsed_time": "7:43:51", "remaining_time": "10:25:02"}
{"current_steps": 1300, "total_steps": 3040, "loss": 0.3325, "lr": 2.8306401162945795e-05, "epoch": 2.138233680746023, "percentage": 42.76, "elapsed_time": "7:45:35", "remaining_time": "10:23:11"}
{"current_steps": 1305, "total_steps": 3040, "loss": 0.3282, "lr": 2.8201812011968807e-05, "epoch": 2.1464618760285243, "percentage": 42.93, "elapsed_time": "7:47:14", "remaining_time": "10:21:12"}
{"current_steps": 1310, "total_steps": 3040, "loss": 0.3265, "lr": 2.809695251733379e-05, "epoch": 2.154690071311026, "percentage": 43.09, "elapsed_time": "7:49:00", "remaining_time": "10:19:22"}
{"current_steps": 1315, "total_steps": 3040, "loss": 0.3264, "lr": 2.799182613536226e-05, "epoch": 2.162918266593527, "percentage": 43.26, "elapsed_time": "7:50:44", "remaining_time": "10:17:30"}
{"current_steps": 1320, "total_steps": 3040, "loss": 0.3257, "lr": 2.7886436331172745e-05, "epoch": 2.1711464618760283, "percentage": 43.42, "elapsed_time": "7:52:20", "remaining_time": "10:15:28"}
{"current_steps": 1325, "total_steps": 3040, "loss": 0.3213, "lr": 2.7780786578566524e-05, "epoch": 2.17937465715853, "percentage": 43.59, "elapsed_time": "7:54:33", "remaining_time": "10:14:14"}
{"current_steps": 1330, "total_steps": 3040, "loss": 0.3235, "lr": 2.7674880359913183e-05, "epoch": 2.187602852441031, "percentage": 43.75, "elapsed_time": "7:56:50", "remaining_time": "10:13:04"}
{"current_steps": 1335, "total_steps": 3040, "loss": 0.3254, "lr": 2.7568721166035778e-05, "epoch": 2.1958310477235328, "percentage": 43.91, "elapsed_time": "7:58:58", "remaining_time": "10:11:43"}
{"current_steps": 1340, "total_steps": 3040, "loss": 0.3337, "lr": 2.7462312496095805e-05, "epoch": 2.204059243006034, "percentage": 44.08, "elapsed_time": "8:01:07", "remaining_time": "10:10:22"}
{"current_steps": 1345, "total_steps": 3040, "loss": 0.3271, "lr": 2.735565785747787e-05, "epoch": 2.2122874382885356, "percentage": 44.24, "elapsed_time": "8:03:18", "remaining_time": "10:09:04"}
{"current_steps": 1350, "total_steps": 3040, "loss": 0.3156, "lr": 2.7248760765674033e-05, "epoch": 2.2205156335710368, "percentage": 44.41, "elapsed_time": "8:05:31", "remaining_time": "10:07:48"}
{"current_steps": 1355, "total_steps": 3040, "loss": 0.3177, "lr": 2.7141624744168e-05, "epoch": 2.228743828853538, "percentage": 44.57, "elapsed_time": "8:07:45", "remaining_time": "10:06:33"}
{"current_steps": 1360, "total_steps": 3040, "loss": 0.3222, "lr": 2.703425332431891e-05, "epoch": 2.2369720241360396, "percentage": 44.74, "elapsed_time": "8:09:55", "remaining_time": "10:05:12"}
{"current_steps": 1365, "total_steps": 3040, "loss": 0.3216, "lr": 2.6926650045245014e-05, "epoch": 2.2452002194185408, "percentage": 44.9, "elapsed_time": "8:12:10", "remaining_time": "10:03:57"}
{"current_steps": 1370, "total_steps": 3040, "loss": 0.3191, "lr": 2.6818818453706944e-05, "epoch": 2.2534284147010424, "percentage": 45.07, "elapsed_time": "8:14:24", "remaining_time": "10:02:39"}
{"current_steps": 1375, "total_steps": 3040, "loss": 0.323, "lr": 2.6710762103990856e-05, "epoch": 2.2616566099835436, "percentage": 45.23, "elapsed_time": "8:16:34", "remaining_time": "10:01:18"}
{"current_steps": 1380, "total_steps": 3040, "loss": 0.318, "lr": 2.660248455779128e-05, "epoch": 2.269884805266045, "percentage": 45.39, "elapsed_time": "8:18:45", "remaining_time": "9:59:57"}
{"current_steps": 1385, "total_steps": 3040, "loss": 0.321, "lr": 2.6493989384093674e-05, "epoch": 2.2781130005485464, "percentage": 45.56, "elapsed_time": "8:20:58", "remaining_time": "9:58:38"}
{"current_steps": 1390, "total_steps": 3040, "loss": 0.3144, "lr": 2.6385280159056838e-05, "epoch": 2.2863411958310476, "percentage": 45.72, "elapsed_time": "8:23:03", "remaining_time": "9:57:09"}
{"current_steps": 1395, "total_steps": 3040, "loss": 0.3172, "lr": 2.6276360465895004e-05, "epoch": 2.294569391113549, "percentage": 45.89, "elapsed_time": "8:25:20", "remaining_time": "9:55:54"}
{"current_steps": 1400, "total_steps": 3040, "loss": 0.3109, "lr": 2.6167233894759743e-05, "epoch": 2.3027975863960504, "percentage": 46.05, "elapsed_time": "8:27:30", "remaining_time": "9:54:30"}
{"current_steps": 1405, "total_steps": 3040, "loss": 0.3157, "lr": 2.6057904042621625e-05, "epoch": 2.311025781678552, "percentage": 46.22, "elapsed_time": "8:29:38", "remaining_time": "9:53:04"}
{"current_steps": 1410, "total_steps": 3040, "loss": 0.3236, "lr": 2.5948374513151668e-05, "epoch": 2.319253976961053, "percentage": 46.38, "elapsed_time": "8:31:33", "remaining_time": "9:51:22"}
{"current_steps": 1415, "total_steps": 3040, "loss": 0.3357, "lr": 2.583864891660252e-05, "epoch": 2.3274821722435544, "percentage": 46.55, "elapsed_time": "8:32:37", "remaining_time": "9:48:42"}
{"current_steps": 1420, "total_steps": 3040, "loss": 0.4375, "lr": 2.5728730869689505e-05, "epoch": 2.335710367526056, "percentage": 46.71, "elapsed_time": "8:33:52", "remaining_time": "9:46:14"}
{"current_steps": 1425, "total_steps": 3040, "loss": 0.6956, "lr": 2.5618623995471394e-05, "epoch": 2.343938562808557, "percentage": 46.88, "elapsed_time": "8:35:43", "remaining_time": "9:44:29"}
{"current_steps": 1430, "total_steps": 3040, "loss": 0.6226, "lr": 2.5508331923230963e-05, "epoch": 2.352166758091059, "percentage": 47.04, "elapsed_time": "8:37:25", "remaining_time": "9:42:33"}
{"current_steps": 1435, "total_steps": 3040, "loss": 0.6106, "lr": 2.5397858288355397e-05, "epoch": 2.36039495337356, "percentage": 47.2, "elapsed_time": "8:39:06", "remaining_time": "9:40:36"}
{"current_steps": 1440, "total_steps": 3040, "loss": 0.5819, "lr": 2.5287206732216453e-05, "epoch": 2.3686231486560616, "percentage": 47.37, "elapsed_time": "8:40:57", "remaining_time": "9:38:51"}
{"current_steps": 1445, "total_steps": 3040, "loss": 0.5457, "lr": 2.5176380902050418e-05, "epoch": 2.376851343938563, "percentage": 47.53, "elapsed_time": "8:42:45", "remaining_time": "9:37:01"}
{"current_steps": 1450, "total_steps": 3040, "loss": 0.5422, "lr": 2.5065384450837916e-05, "epoch": 2.3850795392210644, "percentage": 47.7, "elapsed_time": "8:44:29", "remaining_time": "9:35:07"}
{"current_steps": 1455, "total_steps": 3040, "loss": 0.5542, "lr": 2.495422103718349e-05, "epoch": 2.3933077345035656, "percentage": 47.86, "elapsed_time": "8:46:13", "remaining_time": "9:33:14"}
{"current_steps": 1460, "total_steps": 3040, "loss": 0.5454, "lr": 2.4842894325194996e-05, "epoch": 2.401535929786067, "percentage": 48.03, "elapsed_time": "8:47:45", "remaining_time": "9:31:08"}
{"current_steps": 1465, "total_steps": 3040, "loss": 0.5415, "lr": 2.473140798436285e-05, "epoch": 2.4097641250685684, "percentage": 48.19, "elapsed_time": "8:49:37", "remaining_time": "9:29:23"}
{"current_steps": 1470, "total_steps": 3040, "loss": 0.5188, "lr": 2.4619765689439064e-05, "epoch": 2.4179923203510696, "percentage": 48.36, "elapsed_time": "8:51:14", "remaining_time": "9:27:23"}
{"current_steps": 1475, "total_steps": 3040, "loss": 0.5321, "lr": 2.4507971120316128e-05, "epoch": 2.426220515633571, "percentage": 48.52, "elapsed_time": "8:52:59", "remaining_time": "9:25:30"}
{"current_steps": 1480, "total_steps": 3040, "loss": 0.5298, "lr": 2.4396027961905704e-05, "epoch": 2.4344487109160724, "percentage": 48.68, "elapsed_time": "8:54:50", "remaining_time": "9:23:44"}
{"current_steps": 1485, "total_steps": 3040, "loss": 0.5344, "lr": 2.4283939904017183e-05, "epoch": 2.4426769061985736, "percentage": 48.85, "elapsed_time": "8:56:32", "remaining_time": "9:21:49"}
{"current_steps": 1490, "total_steps": 3040, "loss": 0.5288, "lr": 2.4171710641236045e-05, "epoch": 2.450905101481075, "percentage": 49.01, "elapsed_time": "8:58:21", "remaining_time": "9:20:02"}
{"current_steps": 1495, "total_steps": 3040, "loss": 0.5464, "lr": 2.4059343872802084e-05, "epoch": 2.4591332967635764, "percentage": 49.18, "elapsed_time": "9:00:01", "remaining_time": "9:18:05"}
{"current_steps": 1500, "total_steps": 3040, "loss": 0.5458, "lr": 2.3946843302487497e-05, "epoch": 2.467361492046078, "percentage": 49.34, "elapsed_time": "9:01:54", "remaining_time": "9:16:21"}
{"current_steps": 1505, "total_steps": 3040, "loss": 0.5307, "lr": 2.3834212638474773e-05, "epoch": 2.475589687328579, "percentage": 49.51, "elapsed_time": "9:03:49", "remaining_time": "9:14:39"}
{"current_steps": 1510, "total_steps": 3040, "loss": 0.5428, "lr": 2.372145559323448e-05, "epoch": 2.483817882611081, "percentage": 49.67, "elapsed_time": "9:05:29", "remaining_time": "9:12:43"}
{"current_steps": 1515, "total_steps": 3040, "loss": 0.5438, "lr": 2.3608575883402903e-05, "epoch": 2.492046077893582, "percentage": 49.84, "elapsed_time": "9:07:16", "remaining_time": "9:10:53"}
{"current_steps": 1520, "total_steps": 3040, "loss": 0.4163, "lr": 2.3495577229659515e-05, "epoch": 2.5002742731760836, "percentage": 50.0, "elapsed_time": "9:08:47", "remaining_time": "9:08:47"}
{"current_steps": 1525, "total_steps": 3040, "loss": 0.1673, "lr": 2.3382463356604378e-05, "epoch": 2.508502468458585, "percentage": 50.16, "elapsed_time": "9:09:57", "remaining_time": "9:06:21"}
{"current_steps": 1530, "total_steps": 3040, "loss": 0.1557, "lr": 2.3269237992635318e-05, "epoch": 2.516730663741086, "percentage": 50.33, "elapsed_time": "9:11:07", "remaining_time": "9:03:55"}
{"current_steps": 1535, "total_steps": 3040, "loss": 0.1538, "lr": 2.31559048698251e-05, "epoch": 2.5249588590235876, "percentage": 50.49, "elapsed_time": "9:12:19", "remaining_time": "9:01:31"}
{"current_steps": 1540, "total_steps": 3040, "loss": 0.1473, "lr": 2.3042467723798335e-05, "epoch": 2.533187054306089, "percentage": 50.66, "elapsed_time": "9:13:31", "remaining_time": "8:59:08"}
{"current_steps": 1545, "total_steps": 3040, "loss": 0.1494, "lr": 2.2928930293608435e-05, "epoch": 2.5414152495885904, "percentage": 50.82, "elapsed_time": "9:14:42", "remaining_time": "8:56:44"}
{"current_steps": 1550, "total_steps": 3040, "loss": 0.1451, "lr": 2.281529632161429e-05, "epoch": 2.5496434448710916, "percentage": 50.99, "elapsed_time": "9:15:53", "remaining_time": "8:54:22"}
{"current_steps": 1555, "total_steps": 3040, "loss": 0.1497, "lr": 2.2701569553356963e-05, "epoch": 2.557871640153593, "percentage": 51.15, "elapsed_time": "9:17:14", "remaining_time": "8:52:08"}
{"current_steps": 1560, "total_steps": 3040, "loss": 0.1477, "lr": 2.2587753737436217e-05, "epoch": 2.5660998354360944, "percentage": 51.32, "elapsed_time": "9:18:34", "remaining_time": "8:49:55"}
{"current_steps": 1565, "total_steps": 3040, "loss": 0.1437, "lr": 2.247385262538696e-05, "epoch": 2.5743280307185956, "percentage": 51.48, "elapsed_time": "9:19:47", "remaining_time": "8:47:35"}
{"current_steps": 1570, "total_steps": 3040, "loss": 0.1423, "lr": 2.235986997155556e-05, "epoch": 2.5825562260010972, "percentage": 51.64, "elapsed_time": "9:21:09", "remaining_time": "8:45:24"}
{"current_steps": 1575, "total_steps": 3040, "loss": 0.1468, "lr": 2.2245809532976157e-05, "epoch": 2.5907844212835984, "percentage": 51.81, "elapsed_time": "9:22:33", "remaining_time": "8:43:15"}
{"current_steps": 1580, "total_steps": 3040, "loss": 0.1418, "lr": 2.2131675069246758e-05, "epoch": 2.5990126165660996, "percentage": 51.97, "elapsed_time": "9:23:47", "remaining_time": "8:40:58"}
{"current_steps": 1585, "total_steps": 3040, "loss": 0.1434, "lr": 2.201747034240537e-05, "epoch": 2.6072408118486012, "percentage": 52.14, "elapsed_time": "9:25:06", "remaining_time": "8:38:45"}
{"current_steps": 1590, "total_steps": 3040, "loss": 0.1442, "lr": 2.1903199116805953e-05, "epoch": 2.6154690071311024, "percentage": 52.3, "elapsed_time": "9:26:28", "remaining_time": "8:36:35"}
{"current_steps": 1595, "total_steps": 3040, "loss": 0.1424, "lr": 2.1788865158994384e-05, "epoch": 2.623697202413604, "percentage": 52.47, "elapsed_time": "9:27:40", "remaining_time": "8:34:17"}
{"current_steps": 1600, "total_steps": 3040, "loss": 0.1414, "lr": 2.1674472237584272e-05, "epoch": 2.6319253976961052, "percentage": 52.63, "elapsed_time": "9:28:50", "remaining_time": "8:31:57"}
{"current_steps": 1605, "total_steps": 3040, "loss": 0.144, "lr": 2.1560024123132755e-05, "epoch": 2.640153592978607, "percentage": 52.8, "elapsed_time": "9:30:07", "remaining_time": "8:29:43"}
{"current_steps": 1610, "total_steps": 3040, "loss": 0.1454, "lr": 2.1445524588016214e-05, "epoch": 2.648381788261108, "percentage": 52.96, "elapsed_time": "9:31:16", "remaining_time": "8:27:24"}
{"current_steps": 1615, "total_steps": 3040, "loss": 0.1419, "lr": 2.1330977406305933e-05, "epoch": 2.6566099835436097, "percentage": 53.12, "elapsed_time": "9:32:28", "remaining_time": "8:25:07"}
{"current_steps": 1620, "total_steps": 3040, "loss": 0.1418, "lr": 2.1216386353643686e-05, "epoch": 2.664838178826111, "percentage": 53.29, "elapsed_time": "9:33:38", "remaining_time": "8:22:49"}
{"current_steps": 1625, "total_steps": 3040, "loss": 0.3568, "lr": 2.110175520711731e-05, "epoch": 2.673066374108612, "percentage": 53.45, "elapsed_time": "9:35:10", "remaining_time": "8:20:50"}
{"current_steps": 1630, "total_steps": 3040, "loss": 0.4743, "lr": 2.098708774513619e-05, "epoch": 2.6812945693911137, "percentage": 53.62, "elapsed_time": "9:36:58", "remaining_time": "8:19:05"}
{"current_steps": 1635, "total_steps": 3040, "loss": 0.4392, "lr": 2.0872387747306725e-05, "epoch": 2.689522764673615, "percentage": 53.78, "elapsed_time": "9:38:41", "remaining_time": "8:17:17"}
{"current_steps": 1640, "total_steps": 3040, "loss": 0.4485, "lr": 2.075765899430773e-05, "epoch": 2.6977509599561165, "percentage": 53.95, "elapsed_time": "9:40:27", "remaining_time": "8:15:30"}
{"current_steps": 1645, "total_steps": 3040, "loss": 0.4374, "lr": 2.0642905267765846e-05, "epoch": 2.7059791552386177, "percentage": 54.11, "elapsed_time": "9:42:14", "remaining_time": "8:13:45"}
{"current_steps": 1650, "total_steps": 3040, "loss": 0.4346, "lr": 2.0528130350130867e-05, "epoch": 2.714207350521119, "percentage": 54.28, "elapsed_time": "9:43:57", "remaining_time": "8:11:56"}
{"current_steps": 1655, "total_steps": 3040, "loss": 0.4311, "lr": 2.041333802455109e-05, "epoch": 2.7224355458036205, "percentage": 54.44, "elapsed_time": "9:45:41", "remaining_time": "8:10:08"}
{"current_steps": 1660, "total_steps": 3040, "loss": 0.4087, "lr": 2.0298532074748594e-05, "epoch": 2.7306637410861216, "percentage": 54.61, "elapsed_time": "9:47:31", "remaining_time": "8:08:25"}
{"current_steps": 1665, "total_steps": 3040, "loss": 0.451, "lr": 2.0183716284894533e-05, "epoch": 2.7388919363686233, "percentage": 54.77, "elapsed_time": "9:49:13", "remaining_time": "8:06:36"}
{"current_steps": 1670, "total_steps": 3040, "loss": 0.4236, "lr": 2.00688944394844e-05, "epoch": 2.7471201316511245, "percentage": 54.93, "elapsed_time": "9:50:59", "remaining_time": "8:04:49"}
{"current_steps": 1675, "total_steps": 3040, "loss": 0.4337, "lr": 1.9954070323213296e-05, "epoch": 2.7553483269336256, "percentage": 55.1, "elapsed_time": "9:52:45", "remaining_time": "8:03:03"}
{"current_steps": 1680, "total_steps": 3040, "loss": 0.4245, "lr": 1.9839247720851178e-05, "epoch": 2.7635765222161273, "percentage": 55.26, "elapsed_time": "9:54:38", "remaining_time": "8:01:22"}
{"current_steps": 1685, "total_steps": 3040, "loss": 0.4213, "lr": 1.9724430417118074e-05, "epoch": 2.771804717498629, "percentage": 55.43, "elapsed_time": "9:56:23", "remaining_time": "7:59:35"}
{"current_steps": 1690, "total_steps": 3040, "loss": 0.4447, "lr": 1.9609622196559402e-05, "epoch": 2.78003291278113, "percentage": 55.59, "elapsed_time": "9:58:04", "remaining_time": "7:57:45"}
{"current_steps": 1695, "total_steps": 3040, "loss": 0.4473, "lr": 1.9494826843421147e-05, "epoch": 2.7882611080636313, "percentage": 55.76, "elapsed_time": "9:59:55", "remaining_time": "7:56:02"}
{"current_steps": 1700, "total_steps": 3040, "loss": 0.4193, "lr": 1.9380048141525194e-05, "epoch": 2.796489303346133, "percentage": 55.92, "elapsed_time": "10:01:47", "remaining_time": "7:54:20"}
{"current_steps": 1705, "total_steps": 3040, "loss": 0.446, "lr": 1.9265289874144554e-05, "epoch": 2.804717498628634, "percentage": 56.09, "elapsed_time": "10:03:20", "remaining_time": "7:52:24"}
{"current_steps": 1710, "total_steps": 3040, "loss": 0.4301, "lr": 1.9150555823878708e-05, "epoch": 2.8129456939111357, "percentage": 56.25, "elapsed_time": "10:05:01", "remaining_time": "7:50:34"}
{"current_steps": 1715, "total_steps": 3040, "loss": 0.4168, "lr": 1.9035849772528907e-05, "epoch": 2.821173889193637, "percentage": 56.41, "elapsed_time": "10:06:45", "remaining_time": "7:48:46"}
{"current_steps": 1720, "total_steps": 3040, "loss": 0.4388, "lr": 1.8921175500973496e-05, "epoch": 2.829402084476138, "percentage": 56.58, "elapsed_time": "10:08:34", "remaining_time": "7:47:02"}
{"current_steps": 1725, "total_steps": 3040, "loss": 0.4373, "lr": 1.8806536789043322e-05, "epoch": 2.8376302797586397, "percentage": 56.74, "elapsed_time": "10:10:19", "remaining_time": "7:45:15"}
{"current_steps": 1730, "total_steps": 3040, "loss": 0.3654, "lr": 1.869193741539714e-05, "epoch": 2.845858475041141, "percentage": 56.91, "elapsed_time": "10:12:23", "remaining_time": "7:43:43"}
{"current_steps": 1735, "total_steps": 3040, "loss": 0.2606, "lr": 1.8577381157397056e-05, "epoch": 2.8540866703236425, "percentage": 57.07, "elapsed_time": "10:14:50", "remaining_time": "7:42:27"}
{"current_steps": 1740, "total_steps": 3040, "loss": 0.2631, "lr": 1.8462871790984015e-05, "epoch": 2.8623148656061437, "percentage": 57.24, "elapsed_time": "10:17:17", "remaining_time": "7:41:11"}
{"current_steps": 1745, "total_steps": 3040, "loss": 0.2638, "lr": 1.8348413090553356e-05, "epoch": 2.870543060888645, "percentage": 57.4, "elapsed_time": "10:19:43", "remaining_time": "7:39:54"}
{"current_steps": 1750, "total_steps": 3040, "loss": 0.2677, "lr": 1.8234008828830386e-05, "epoch": 2.8787712561711465, "percentage": 57.57, "elapsed_time": "10:22:10", "remaining_time": "7:38:37"}
{"current_steps": 1755, "total_steps": 3040, "loss": 0.2703, "lr": 1.8119662776746043e-05, "epoch": 2.8869994514536477, "percentage": 57.73, "elapsed_time": "10:24:29", "remaining_time": "7:37:14"}
{"current_steps": 1760, "total_steps": 3040, "loss": 0.2531, "lr": 1.800537870331257e-05, "epoch": 2.8952276467361493, "percentage": 57.89, "elapsed_time": "10:26:56", "remaining_time": "7:35:57"}
{"current_steps": 1765, "total_steps": 3040, "loss": 0.2646, "lr": 1.789116037549933e-05, "epoch": 2.9034558420186505, "percentage": 58.06, "elapsed_time": "10:29:19", "remaining_time": "7:34:36"}
{"current_steps": 1770, "total_steps": 3040, "loss": 0.2545, "lr": 1.77770115581086e-05, "epoch": 2.911684037301152, "percentage": 58.22, "elapsed_time": "10:31:45", "remaining_time": "7:33:17"}
{"current_steps": 1775, "total_steps": 3040, "loss": 0.2663, "lr": 1.7662936013651493e-05, "epoch": 2.9199122325836533, "percentage": 58.39, "elapsed_time": "10:34:03", "remaining_time": "7:31:53"}
{"current_steps": 1780, "total_steps": 3040, "loss": 0.2602, "lr": 1.7548937502223932e-05, "epoch": 2.928140427866155, "percentage": 58.55, "elapsed_time": "10:36:30", "remaining_time": "7:30:33"}
{"current_steps": 1785, "total_steps": 3040, "loss": 0.2659, "lr": 1.7435019781382737e-05, "epoch": 2.936368623148656, "percentage": 58.72, "elapsed_time": "10:38:57", "remaining_time": "7:29:14"}
{"current_steps": 1790, "total_steps": 3040, "loss": 0.2784, "lr": 1.732118660602175e-05, "epoch": 2.9445968184311573, "percentage": 58.88, "elapsed_time": "10:41:17", "remaining_time": "7:27:49"}
{"current_steps": 1795, "total_steps": 3040, "loss": 0.2427, "lr": 1.7207441728248055e-05, "epoch": 2.952825013713659, "percentage": 59.05, "elapsed_time": "10:43:43", "remaining_time": "7:26:29"}
{"current_steps": 1800, "total_steps": 3040, "loss": 0.261, "lr": 1.7093788897258338e-05, "epoch": 2.96105320899616, "percentage": 59.21, "elapsed_time": "10:46:10", "remaining_time": "7:25:08"}
{"current_steps": 1805, "total_steps": 3040, "loss": 0.2532, "lr": 1.698023185921526e-05, "epoch": 2.9692814042786617, "percentage": 59.38, "elapsed_time": "10:48:45", "remaining_time": "7:23:53"}
{"current_steps": 1810, "total_steps": 3040, "loss": 0.2356, "lr": 1.6866774357124054e-05, "epoch": 2.977509599561163, "percentage": 59.54, "elapsed_time": "10:51:13", "remaining_time": "7:22:32"}
{"current_steps": 1815, "total_steps": 3040, "loss": 0.2582, "lr": 1.675342013070905e-05, "epoch": 2.985737794843664, "percentage": 59.7, "elapsed_time": "10:53:27", "remaining_time": "7:21:02"}
{"current_steps": 1820, "total_steps": 3040, "loss": 0.2508, "lr": 1.6640172916290515e-05, "epoch": 2.9939659901261657, "percentage": 59.87, "elapsed_time": "10:55:54", "remaining_time": "7:19:40"}
{"current_steps": 1825, "total_steps": 3040, "loss": 0.4101, "lr": 1.6527036446661396e-05, "epoch": 3.0016456390565, "percentage": 60.03, "elapsed_time": "10:57:47", "remaining_time": "7:17:55"}
{"current_steps": 1830, "total_steps": 3040, "loss": 0.5452, "lr": 1.641401445096436e-05, "epoch": 3.0098738343390017, "percentage": 60.2, "elapsed_time": "10:59:19", "remaining_time": "7:15:56"}
{"current_steps": 1835, "total_steps": 3040, "loss": 0.4481, "lr": 1.6301110654568833e-05, "epoch": 3.018102029621503, "percentage": 60.36, "elapsed_time": "11:00:51", "remaining_time": "7:13:57"}
{"current_steps": 1840, "total_steps": 3040, "loss": 0.4067, "lr": 1.6188328778948238e-05, "epoch": 3.0263302249040045, "percentage": 60.53, "elapsed_time": "11:02:16", "remaining_time": "7:11:55"}
{"current_steps": 1845, "total_steps": 3040, "loss": 0.3744, "lr": 1.6075672541557287e-05, "epoch": 3.0345584201865057, "percentage": 60.69, "elapsed_time": "11:03:39", "remaining_time": "7:09:50"}
{"current_steps": 1850, "total_steps": 3040, "loss": 0.366, "lr": 1.5963145655709495e-05, "epoch": 3.0427866154690073, "percentage": 60.86, "elapsed_time": "11:05:06", "remaining_time": "7:07:49"}
{"current_steps": 1855, "total_steps": 3040, "loss": 0.3588, "lr": 1.5850751830454747e-05, "epoch": 3.0510148107515085, "percentage": 61.02, "elapsed_time": "11:06:35", "remaining_time": "7:05:49"}
{"current_steps": 1860, "total_steps": 3040, "loss": 0.3524, "lr": 1.573849477045706e-05, "epoch": 3.0592430060340097, "percentage": 61.18, "elapsed_time": "11:07:57", "remaining_time": "7:03:45"}
{"current_steps": 1865, "total_steps": 3040, "loss": 0.3435, "lr": 1.5626378175872486e-05, "epoch": 3.0674712013165113, "percentage": 61.35, "elapsed_time": "11:09:32", "remaining_time": "7:01:49"}
{"current_steps": 1870, "total_steps": 3040, "loss": 0.3478, "lr": 1.5514405742227103e-05, "epoch": 3.0756993965990125, "percentage": 61.51, "elapsed_time": "11:10:57", "remaining_time": "6:59:47"}
{"current_steps": 1875, "total_steps": 3040, "loss": 0.3424, "lr": 1.5402581160295265e-05, "epoch": 3.083927591881514, "percentage": 61.68, "elapsed_time": "11:12:25", "remaining_time": "6:57:47"}
{"current_steps": 1880, "total_steps": 3040, "loss": 0.3233, "lr": 1.5290908115977884e-05, "epoch": 3.0921557871640153, "percentage": 61.84, "elapsed_time": "11:14:08", "remaining_time": "6:55:57"}
{"current_steps": 1885, "total_steps": 3040, "loss": 0.3113, "lr": 1.5179390290181013e-05, "epoch": 3.100383982446517, "percentage": 62.01, "elapsed_time": "11:15:42", "remaining_time": "6:54:01"}
{"current_steps": 1890, "total_steps": 3040, "loss": 0.3056, "lr": 1.5068031358694437e-05, "epoch": 3.108612177729018, "percentage": 62.17, "elapsed_time": "11:17:29", "remaining_time": "6:52:13"}
{"current_steps": 1895, "total_steps": 3040, "loss": 0.3058, "lr": 1.4956834992070589e-05, "epoch": 3.1168403730115193, "percentage": 62.34, "elapsed_time": "11:19:02", "remaining_time": "6:50:17"}
{"current_steps": 1900, "total_steps": 3040, "loss": 0.3039, "lr": 1.4845804855503494e-05, "epoch": 3.125068568294021, "percentage": 62.5, "elapsed_time": "11:20:41", "remaining_time": "6:48:24"}
{"current_steps": 1905, "total_steps": 3040, "loss": 0.3029, "lr": 1.4734944608708022e-05, "epoch": 3.133296763576522, "percentage": 62.66, "elapsed_time": "11:22:24", "remaining_time": "6:46:34"}
{"current_steps": 1910, "total_steps": 3040, "loss": 0.3124, "lr": 1.46242579057992e-05, "epoch": 3.1415249588590237, "percentage": 62.83, "elapsed_time": "11:24:05", "remaining_time": "6:44:43"}
{"current_steps": 1915, "total_steps": 3040, "loss": 0.3034, "lr": 1.451374839517183e-05, "epoch": 3.149753154141525, "percentage": 62.99, "elapsed_time": "11:25:45", "remaining_time": "6:42:51"}
{"current_steps": 1920, "total_steps": 3040, "loss": 0.306, "lr": 1.4403419719380161e-05, "epoch": 3.1579813494240265, "percentage": 63.16, "elapsed_time": "11:27:26", "remaining_time": "6:41:00"}
{"current_steps": 1925, "total_steps": 3040, "loss": 0.3011, "lr": 1.42932755150179e-05, "epoch": 3.1662095447065277, "percentage": 63.32, "elapsed_time": "11:29:09", "remaining_time": "6:39:10"}
{"current_steps": 1930, "total_steps": 3040, "loss": 0.3048, "lr": 1.4183319412598274e-05, "epoch": 3.174437739989029, "percentage": 63.49, "elapsed_time": "11:31:01", "remaining_time": "6:37:26"}
{"current_steps": 1935, "total_steps": 3040, "loss": 0.3028, "lr": 1.4073555036434423e-05, "epoch": 3.1826659352715305, "percentage": 63.65, "elapsed_time": "11:33:18", "remaining_time": "6:35:54"}
{"current_steps": 1940, "total_steps": 3040, "loss": 0.3016, "lr": 1.3963986004519885e-05, "epoch": 3.1908941305540317, "percentage": 63.82, "elapsed_time": "11:35:33", "remaining_time": "6:34:23"}
{"current_steps": 1945, "total_steps": 3040, "loss": 0.3036, "lr": 1.385461592840939e-05, "epoch": 3.1991223258365333, "percentage": 63.98, "elapsed_time": "11:37:41", "remaining_time": "6:32:47"}
{"current_steps": 1950, "total_steps": 3040, "loss": 0.3113, "lr": 1.3745448413099795e-05, "epoch": 3.2073505211190345, "percentage": 64.14, "elapsed_time": "11:39:47", "remaining_time": "6:31:09"}
{"current_steps": 1955, "total_steps": 3040, "loss": 0.3014, "lr": 1.3636487056911236e-05, "epoch": 3.215578716401536, "percentage": 64.31, "elapsed_time": "11:41:59", "remaining_time": "6:29:35"}
{"current_steps": 1960, "total_steps": 3040, "loss": 0.2971, "lr": 1.3527735451368567e-05, "epoch": 3.2238069116840373, "percentage": 64.47, "elapsed_time": "11:44:10", "remaining_time": "6:28:01"}
{"current_steps": 1965, "total_steps": 3040, "loss": 0.3001, "lr": 1.3419197181082937e-05, "epoch": 3.2320351069665385, "percentage": 64.64, "elapsed_time": "11:46:24", "remaining_time": "6:26:27"}
{"current_steps": 1970, "total_steps": 3040, "loss": 0.3029, "lr": 1.3310875823633675e-05, "epoch": 3.24026330224904, "percentage": 64.8, "elapsed_time": "11:48:40", "remaining_time": "6:24:54"}
{"current_steps": 1975, "total_steps": 3040, "loss": 0.3019, "lr": 1.3202774949450326e-05, "epoch": 3.2484914975315413, "percentage": 64.97, "elapsed_time": "11:50:50", "remaining_time": "6:23:19"}
{"current_steps": 1980, "total_steps": 3040, "loss": 0.3009, "lr": 1.3094898121695008e-05, "epoch": 3.256719692814043, "percentage": 65.13, "elapsed_time": "11:53:04", "remaining_time": "6:21:44"}
{"current_steps": 1805, "total_steps": 3040, "loss": 0.2532, "lr": 1.698023185921526e-05, "epoch": 2.9692814042786617, "percentage": 59.38, "elapsed_time": "0:02:36", "remaining_time": "0:01:47"}
{"current_steps": 1810, "total_steps": 3040, "loss": 0.2356, "lr": 1.6866774357124054e-05, "epoch": 2.977509599561163, "percentage": 59.54, "elapsed_time": "0:05:06", "remaining_time": "0:03:28"}
{"current_steps": 1815, "total_steps": 3040, "loss": 0.2582, "lr": 1.675342013070905e-05, "epoch": 2.985737794843664, "percentage": 59.7, "elapsed_time": "0:07:24", "remaining_time": "0:04:59"}
{"current_steps": 1820, "total_steps": 3040, "loss": 0.2508, "lr": 1.6640172916290515e-05, "epoch": 2.9939659901261657, "percentage": 59.87, "elapsed_time": "0:09:54", "remaining_time": "0:06:38"}
{"current_steps": 1825, "total_steps": 3040, "loss": 0.4102, "lr": 1.6527036446661396e-05, "epoch": 3.0016456390565, "percentage": 60.03, "elapsed_time": "0:11:50", "remaining_time": "0:07:53"}
{"current_steps": 1830, "total_steps": 3040, "loss": 0.5448, "lr": 1.641401445096436e-05, "epoch": 3.0098738343390017, "percentage": 60.2, "elapsed_time": "0:13:26", "remaining_time": "0:08:53"}
{"current_steps": 1805, "total_steps": 3040, "loss": 0.2532, "lr": 1.698023185921526e-05, "epoch": 2.9692814042786617, "percentage": 59.38, "elapsed_time": "0:02:38", "remaining_time": "0:01:48"}
{"current_steps": 1810, "total_steps": 3040, "loss": 0.2356, "lr": 1.6866774357124054e-05, "epoch": 2.977509599561163, "percentage": 59.54, "elapsed_time": "0:05:07", "remaining_time": "0:03:28"}
{"current_steps": 1815, "total_steps": 3040, "loss": 0.2582, "lr": 1.675342013070905e-05, "epoch": 2.985737794843664, "percentage": 59.7, "elapsed_time": "0:07:24", "remaining_time": "0:04:59"}
{"current_steps": 1820, "total_steps": 3040, "loss": 0.2508, "lr": 1.6640172916290515e-05, "epoch": 2.9939659901261657, "percentage": 59.87, "elapsed_time": "0:09:53", "remaining_time": "0:06:38"}
{"current_steps": 1825, "total_steps": 3040, "loss": 0.4103, "lr": 1.6527036446661396e-05, "epoch": 3.0016456390565, "percentage": 60.03, "elapsed_time": "0:11:51", "remaining_time": "0:07:53"}
{"current_steps": 1830, "total_steps": 3040, "loss": 0.5455, "lr": 1.641401445096436e-05, "epoch": 3.0098738343390017, "percentage": 60.2, "elapsed_time": "0:13:28", "remaining_time": "0:08:54"}
{"current_steps": 1835, "total_steps": 3040, "loss": 0.4481, "lr": 1.6301110654568833e-05, "epoch": 3.018102029621503, "percentage": 60.36, "elapsed_time": "0:15:04", "remaining_time": "0:09:53"}
{"current_steps": 1840, "total_steps": 3040, "loss": 0.4066, "lr": 1.6188328778948238e-05, "epoch": 3.0263302249040045, "percentage": 60.53, "elapsed_time": "0:16:33", "remaining_time": "0:10:48"}
{"current_steps": 1845, "total_steps": 3040, "loss": 0.3745, "lr": 1.6075672541557287e-05, "epoch": 3.0345584201865057, "percentage": 60.69, "elapsed_time": "0:18:01", "remaining_time": "0:11:40"}
{"current_steps": 1850, "total_steps": 3040, "loss": 0.366, "lr": 1.5963145655709495e-05, "epoch": 3.0427866154690073, "percentage": 60.86, "elapsed_time": "0:19:32", "remaining_time": "0:12:34"}
{"current_steps": 1855, "total_steps": 3040, "loss": 0.3588, "lr": 1.5850751830454747e-05, "epoch": 3.0510148107515085, "percentage": 61.02, "elapsed_time": "0:21:05", "remaining_time": "0:13:28"}
{"current_steps": 1860, "total_steps": 3040, "loss": 0.3523, "lr": 1.573849477045706e-05, "epoch": 3.0592430060340097, "percentage": 61.18, "elapsed_time": "0:22:32", "remaining_time": "0:14:18"}
{"current_steps": 1865, "total_steps": 3040, "loss": 0.3435, "lr": 1.5626378175872486e-05, "epoch": 3.0674712013165113, "percentage": 61.35, "elapsed_time": "0:24:14", "remaining_time": "0:15:16"}
{"current_steps": 1870, "total_steps": 3040, "loss": 0.3478, "lr": 1.5514405742227103e-05, "epoch": 3.0756993965990125, "percentage": 61.51, "elapsed_time": "0:25:46", "remaining_time": "0:16:07"}
{"current_steps": 1875, "total_steps": 3040, "loss": 0.3424, "lr": 1.5402581160295265e-05, "epoch": 3.083927591881514, "percentage": 61.68, "elapsed_time": "0:27:19", "remaining_time": "0:16:58"}
{"current_steps": 1880, "total_steps": 3040, "loss": 0.3233, "lr": 1.5290908115977884e-05, "epoch": 3.0921557871640153, "percentage": 61.84, "elapsed_time": "0:29:06", "remaining_time": "0:17:57"}
{"current_steps": 1885, "total_steps": 3040, "loss": 0.3113, "lr": 1.5179390290181013e-05, "epoch": 3.100383982446517, "percentage": 62.01, "elapsed_time": "0:30:45", "remaining_time": "0:18:50"}
{"current_steps": 1890, "total_steps": 3040, "loss": 0.3056, "lr": 1.5068031358694437e-05, "epoch": 3.108612177729018, "percentage": 62.17, "elapsed_time": "0:32:35", "remaining_time": "0:19:49"}
{"current_steps": 1895, "total_steps": 3040, "loss": 0.3058, "lr": 1.4956834992070589e-05, "epoch": 3.1168403730115193, "percentage": 62.34, "elapsed_time": "0:34:13", "remaining_time": "0:20:40"}
{"current_steps": 1900, "total_steps": 3040, "loss": 0.3039, "lr": 1.4845804855503494e-05, "epoch": 3.125068568294021, "percentage": 62.5, "elapsed_time": "0:35:55", "remaining_time": "0:21:33"}
{"current_steps": 1905, "total_steps": 3040, "loss": 0.3029, "lr": 1.4734944608708022e-05, "epoch": 3.133296763576522, "percentage": 62.66, "elapsed_time": "0:37:42", "remaining_time": "0:22:27"}
{"current_steps": 1910, "total_steps": 3040, "loss": 0.3124, "lr": 1.46242579057992e-05, "epoch": 3.1415249588590237, "percentage": 62.83, "elapsed_time": "0:39:27", "remaining_time": "0:23:20"}
{"current_steps": 1915, "total_steps": 3040, "loss": 0.3035, "lr": 1.451374839517183e-05, "epoch": 3.149753154141525, "percentage": 62.99, "elapsed_time": "0:41:11", "remaining_time": "0:24:12"}
{"current_steps": 1920, "total_steps": 3040, "loss": 0.306, "lr": 1.4403419719380161e-05, "epoch": 3.1579813494240265, "percentage": 63.16, "elapsed_time": "0:42:56", "remaining_time": "0:25:03"}
{"current_steps": 1925, "total_steps": 3040, "loss": 0.3011, "lr": 1.42932755150179e-05, "epoch": 3.1662095447065277, "percentage": 63.32, "elapsed_time": "0:44:41", "remaining_time": "0:25:53"}
{"current_steps": 1930, "total_steps": 3040, "loss": 0.3048, "lr": 1.4183319412598274e-05, "epoch": 3.174437739989029, "percentage": 63.49, "elapsed_time": "0:46:38", "remaining_time": "0:26:49"}
{"current_steps": 1935, "total_steps": 3040, "loss": 0.3028, "lr": 1.4073555036434423e-05, "epoch": 3.1826659352715305, "percentage": 63.65, "elapsed_time": "0:48:57", "remaining_time": "0:27:57"}
{"current_steps": 1940, "total_steps": 3040, "loss": 0.3015, "lr": 1.3963986004519885e-05, "epoch": 3.1908941305540317, "percentage": 63.82, "elapsed_time": "0:51:16", "remaining_time": "0:29:04"}
{"current_steps": 1945, "total_steps": 3040, "loss": 0.3036, "lr": 1.385461592840939e-05, "epoch": 3.1991223258365333, "percentage": 63.98, "elapsed_time": "0:53:26", "remaining_time": "0:30:05"}
{"current_steps": 1950, "total_steps": 3040, "loss": 0.3113, "lr": 1.3745448413099795e-05, "epoch": 3.2073505211190345, "percentage": 64.14, "elapsed_time": "0:55:36", "remaining_time": "0:31:05"}
{"current_steps": 1955, "total_steps": 3040, "loss": 0.3013, "lr": 1.3636487056911236e-05, "epoch": 3.215578716401536, "percentage": 64.31, "elapsed_time": "0:57:53", "remaining_time": "0:32:07"}
{"current_steps": 1960, "total_steps": 3040, "loss": 0.2971, "lr": 1.3527735451368567e-05, "epoch": 3.2238069116840373, "percentage": 64.47, "elapsed_time": "1:00:08", "remaining_time": "0:33:08"}
{"current_steps": 1965, "total_steps": 3040, "loss": 0.3001, "lr": 1.3419197181082937e-05, "epoch": 3.2320351069665385, "percentage": 64.64, "elapsed_time": "1:02:25", "remaining_time": "0:34:09"}
{"current_steps": 1970, "total_steps": 3040, "loss": 0.3029, "lr": 1.3310875823633675e-05, "epoch": 3.24026330224904, "percentage": 64.8, "elapsed_time": "1:04:43", "remaining_time": "0:35:09"}
{"current_steps": 1975, "total_steps": 3040, "loss": 0.3018, "lr": 1.3202774949450326e-05, "epoch": 3.2484914975315413, "percentage": 64.97, "elapsed_time": "1:06:57", "remaining_time": "0:36:06"}
{"current_steps": 1980, "total_steps": 3040, "loss": 0.3008, "lr": 1.3094898121695008e-05, "epoch": 3.256719692814043, "percentage": 65.13, "elapsed_time": "1:09:14", "remaining_time": "0:37:03"}
{"current_steps": 1985, "total_steps": 3040, "loss": 0.299, "lr": 1.2987248896144915e-05, "epoch": 3.264947888096544, "percentage": 65.3, "elapsed_time": "1:11:28", "remaining_time": "0:37:59"}
{"current_steps": 1990, "total_steps": 3040, "loss": 0.3009, "lr": 1.2879830821075174e-05, "epoch": 3.2731760833790453, "percentage": 65.46, "elapsed_time": "1:13:42", "remaining_time": "0:38:53"}
{"current_steps": 1995, "total_steps": 3040, "loss": 0.2964, "lr": 1.277264743714182e-05, "epoch": 3.281404278661547, "percentage": 65.62, "elapsed_time": "1:15:56", "remaining_time": "0:39:46"}
{"current_steps": 2000, "total_steps": 3040, "loss": 0.2996, "lr": 1.2665702277265168e-05, "epoch": 3.289632473944048, "percentage": 65.79, "elapsed_time": "1:18:09", "remaining_time": "0:40:38"}
{"current_steps": 2005, "total_steps": 3040, "loss": 0.2958, "lr": 1.2558998866513283e-05, "epoch": 3.2978606692265497, "percentage": 65.95, "elapsed_time": "1:20:24", "remaining_time": "0:41:30"}
{"current_steps": 2010, "total_steps": 3040, "loss": 0.2922, "lr": 1.245254072198585e-05, "epoch": 3.306088864509051, "percentage": 66.12, "elapsed_time": "1:22:37", "remaining_time": "0:42:20"}
{"current_steps": 2015, "total_steps": 3040, "loss": 0.3002, "lr": 1.2346331352698206e-05, "epoch": 3.3143170597915526, "percentage": 66.28, "elapsed_time": "1:24:50", "remaining_time": "0:43:09"}
{"current_steps": 2020, "total_steps": 3040, "loss": 0.3058, "lr": 1.224037425946571e-05, "epoch": 3.3225452550740537, "percentage": 66.45, "elapsed_time": "1:26:21", "remaining_time": "0:43:36"}
{"current_steps": 2025, "total_steps": 3040, "loss": 0.3072, "lr": 1.2134672934788338e-05, "epoch": 3.3307734503565554, "percentage": 66.61, "elapsed_time": "1:27:31", "remaining_time": "0:43:52"}
{"current_steps": 2030, "total_steps": 3040, "loss": 0.5715, "lr": 1.202923086273554e-05, "epoch": 3.3390016456390565, "percentage": 66.78, "elapsed_time": "1:29:05", "remaining_time": "0:44:19"}
{"current_steps": 2035, "total_steps": 3040, "loss": 0.6695, "lr": 1.1924051518831444e-05, "epoch": 3.3472298409215577, "percentage": 66.94, "elapsed_time": "1:30:54", "remaining_time": "0:44:53"}
{"current_steps": 2040, "total_steps": 3040, "loss": 0.5868, "lr": 1.1819138369940251e-05, "epoch": 3.3554580362040594, "percentage": 67.11, "elapsed_time": "1:32:43", "remaining_time": "0:45:27"}
{"current_steps": 2045, "total_steps": 3040, "loss": 0.5628, "lr": 1.1714494874152025e-05, "epoch": 3.3636862314865605, "percentage": 67.27, "elapsed_time": "1:34:29", "remaining_time": "0:45:58"}
{"current_steps": 2050, "total_steps": 3040, "loss": 0.5438, "lr": 1.1610124480668636e-05, "epoch": 3.371914426769062, "percentage": 67.43, "elapsed_time": "1:36:19", "remaining_time": "0:46:30"}
{"current_steps": 2055, "total_steps": 3040, "loss": 0.513, "lr": 1.1506030629690124e-05, "epoch": 3.3801426220515634, "percentage": 67.6, "elapsed_time": "1:38:07", "remaining_time": "0:47:02"}
{"current_steps": 2060, "total_steps": 3040, "loss": 0.5226, "lr": 1.140221675230127e-05, "epoch": 3.3883708173340645, "percentage": 67.76, "elapsed_time": "1:39:53", "remaining_time": "0:47:31"}
{"current_steps": 2065, "total_steps": 3040, "loss": 0.5053, "lr": 1.1298686270358542e-05, "epoch": 3.396599012616566, "percentage": 67.93, "elapsed_time": "1:41:32", "remaining_time": "0:47:56"}
{"current_steps": 2070, "total_steps": 3040, "loss": 0.4982, "lr": 1.1195442596377253e-05, "epoch": 3.4048272078990673, "percentage": 68.09, "elapsed_time": "1:43:19", "remaining_time": "0:48:25"}
{"current_steps": 2075, "total_steps": 3040, "loss": 0.4989, "lr": 1.1092489133419137e-05, "epoch": 3.413055403181569, "percentage": 68.26, "elapsed_time": "1:45:07", "remaining_time": "0:48:53"}
{"current_steps": 2080, "total_steps": 3040, "loss": 0.4889, "lr": 1.0989829274980126e-05, "epoch": 3.42128359846407, "percentage": 68.42, "elapsed_time": "1:46:47", "remaining_time": "0:49:17"}
{"current_steps": 2085, "total_steps": 3040, "loss": 0.505, "lr": 1.088746640487854e-05, "epoch": 3.429511793746572, "percentage": 68.59, "elapsed_time": "1:48:39", "remaining_time": "0:49:45"}
{"current_steps": 2090, "total_steps": 3040, "loss": 0.4832, "lr": 1.078540389714351e-05, "epoch": 3.437739989029073, "percentage": 68.75, "elapsed_time": "1:50:25", "remaining_time": "0:50:11"}
{"current_steps": 2095, "total_steps": 3040, "loss": 0.4899, "lr": 1.0683645115903811e-05, "epoch": 3.445968184311574, "percentage": 68.91, "elapsed_time": "1:52:14", "remaining_time": "0:50:37"}
{"current_steps": 2100, "total_steps": 3040, "loss": 0.5025, "lr": 1.0582193415276931e-05, "epoch": 3.4541963795940758, "percentage": 69.08, "elapsed_time": "1:54:02", "remaining_time": "0:51:02"}
{"current_steps": 2105, "total_steps": 3040, "loss": 0.5072, "lr": 1.048105213925853e-05, "epoch": 3.462424574876577, "percentage": 69.24, "elapsed_time": "1:55:55", "remaining_time": "0:51:29"}
{"current_steps": 2110, "total_steps": 3040, "loss": 0.5125, "lr": 1.0380224621612252e-05, "epoch": 3.4706527701590786, "percentage": 69.41, "elapsed_time": "1:57:44", "remaining_time": "0:51:53"}
{"current_steps": 2115, "total_steps": 3040, "loss": 0.4921, "lr": 1.0279714185759771e-05, "epoch": 3.4788809654415798, "percentage": 69.57, "elapsed_time": "1:59:31", "remaining_time": "0:52:16"}
{"current_steps": 2120, "total_steps": 3040, "loss": 0.4981, "lr": 1.0179524144671315e-05, "epoch": 3.4871091607240814, "percentage": 69.74, "elapsed_time": "2:01:20", "remaining_time": "0:52:39"}
{"current_steps": 2125, "total_steps": 3040, "loss": 0.5039, "lr": 1.0079657800756409e-05, "epoch": 3.4953373560065826, "percentage": 69.9, "elapsed_time": "2:03:06", "remaining_time": "0:53:00"}
{"current_steps": 2130, "total_steps": 3040, "loss": 0.2588, "lr": 9.980118445755072e-06, "epoch": 3.5035655512890838, "percentage": 70.07, "elapsed_time": "2:04:23", "remaining_time": "0:53:08"}
{"current_steps": 2135, "total_steps": 3040, "loss": 0.1504, "lr": 9.880909360629265e-06, "epoch": 3.5117937465715854, "percentage": 70.23, "elapsed_time": "2:05:31", "remaining_time": "0:53:12"}
{"current_steps": 2140, "total_steps": 3040, "loss": 0.1382, "lr": 9.782033815454806e-06, "epoch": 3.5200219418540866, "percentage": 70.39, "elapsed_time": "2:06:42", "remaining_time": "0:53:17"}
{"current_steps": 2145, "total_steps": 3040, "loss": 0.1371, "lr": 9.683495069313527e-06, "epoch": 3.528250137136588, "percentage": 70.56, "elapsed_time": "2:07:54", "remaining_time": "0:53:22"}
{"current_steps": 2150, "total_steps": 3040, "loss": 0.1309, "lr": 9.585296370185875e-06, "epoch": 3.5364783324190894, "percentage": 70.72, "elapsed_time": "2:09:09", "remaining_time": "0:53:27"}
{"current_steps": 2155, "total_steps": 3040, "loss": 0.1299, "lr": 9.487440954843856e-06, "epoch": 3.5447065277015906, "percentage": 70.89, "elapsed_time": "2:10:20", "remaining_time": "0:53:31"}
{"current_steps": 2160, "total_steps": 3040, "loss": 0.1282, "lr": 9.38993204874436e-06, "epoch": 3.552934722984092, "percentage": 71.05, "elapsed_time": "2:11:32", "remaining_time": "0:53:35"}
{"current_steps": 2165, "total_steps": 3040, "loss": 0.1314, "lr": 9.292772865922792e-06, "epoch": 3.5611629182665934, "percentage": 71.22, "elapsed_time": "2:12:52", "remaining_time": "0:53:42"}
{"current_steps": 2170, "total_steps": 3040, "loss": 0.1306, "lr": 9.195966608887212e-06, "epoch": 3.569391113549095, "percentage": 71.38, "elapsed_time": "2:14:13", "remaining_time": "0:53:48"}
{"current_steps": 2175, "total_steps": 3040, "loss": 0.1241, "lr": 9.099516468512692e-06, "epoch": 3.577619308831596, "percentage": 71.55, "elapsed_time": "2:15:29", "remaining_time": "0:53:53"}
{"current_steps": 2180, "total_steps": 3040, "loss": 0.1262, "lr": 9.003425623936208e-06, "epoch": 3.585847504114098, "percentage": 71.71, "elapsed_time": "2:18:02", "remaining_time": "0:54:27"}
{"current_steps": 2185, "total_steps": 3040, "loss": 0.1277, "lr": 8.907697242451825e-06, "epoch": 3.594075699396599, "percentage": 71.88, "elapsed_time": "2:19:23", "remaining_time": "0:54:32"}
{"current_steps": 2190, "total_steps": 3040, "loss": 0.128, "lr": 8.812334479406266e-06, "epoch": 3.6023038946791006, "percentage": 72.04, "elapsed_time": "2:20:38", "remaining_time": "0:54:35"}
{"current_steps": 2195, "total_steps": 3040, "loss": 0.1242, "lr": 8.71734047809498e-06, "epoch": 3.610532089961602, "percentage": 72.2, "elapsed_time": "2:21:59", "remaining_time": "0:54:39"}
{"current_steps": 2200, "total_steps": 3040, "loss": 0.1258, "lr": 8.62271836965846e-06, "epoch": 3.618760285244103, "percentage": 72.37, "elapsed_time": "2:23:19", "remaining_time": "0:54:43"}
{"current_steps": 2205, "total_steps": 3040, "loss": 0.128, "lr": 8.528471272979083e-06, "epoch": 3.6269884805266046, "percentage": 72.53, "elapsed_time": "2:24:31", "remaining_time": "0:54:43"}
{"current_steps": 2210, "total_steps": 3040, "loss": 0.123, "lr": 8.434602294578285e-06, "epoch": 3.635216675809106, "percentage": 72.7, "elapsed_time": "2:25:43", "remaining_time": "0:54:43"}
{"current_steps": 2215, "total_steps": 3040, "loss": 0.1262, "lr": 8.341114528514192e-06, "epoch": 3.6434448710916074, "percentage": 72.86, "elapsed_time": "2:26:56", "remaining_time": "0:54:43"}
{"current_steps": 2220, "total_steps": 3040, "loss": 0.1269, "lr": 8.248011056279588e-06, "epoch": 3.6516730663741086, "percentage": 73.03, "elapsed_time": "2:28:10", "remaining_time": "0:54:43"}
{"current_steps": 2225, "total_steps": 3040, "loss": 0.123, "lr": 8.155294946700402e-06, "epoch": 3.65990126165661, "percentage": 73.19, "elapsed_time": "2:29:18", "remaining_time": "0:54:41"}
{"current_steps": 2230, "total_steps": 3040, "loss": 0.1339, "lr": 8.062969255834505e-06, "epoch": 3.6681294569391114, "percentage": 73.36, "elapsed_time": "2:30:30", "remaining_time": "0:54:39"}
{"current_steps": 2235, "total_steps": 3040, "loss": 0.4519, "lr": 7.971037026871016e-06, "epoch": 3.6763576522216126, "percentage": 73.52, "elapsed_time": "2:32:19", "remaining_time": "0:54:51"}
{"current_steps": 2240, "total_steps": 3040, "loss": 0.4387, "lr": 7.879501290029954e-06, "epoch": 3.684585847504114, "percentage": 73.68, "elapsed_time": "2:34:03", "remaining_time": "0:55:01"}
{"current_steps": 2245, "total_steps": 3040, "loss": 0.4199, "lr": 7.788365062462411e-06, "epoch": 3.6928140427866154, "percentage": 73.85, "elapsed_time": "2:35:46", "remaining_time": "0:55:09"}
{"current_steps": 2250, "total_steps": 3040, "loss": 0.419, "lr": 7.697631348151048e-06, "epoch": 3.7010422380691166, "percentage": 74.01, "elapsed_time": "2:37:37", "remaining_time": "0:55:20"}
{"current_steps": 2255, "total_steps": 3040, "loss": 0.3877, "lr": 7.607303137811108e-06, "epoch": 3.709270433351618, "percentage": 74.18, "elapsed_time": "2:39:30", "remaining_time": "0:55:31"}
{"current_steps": 2260, "total_steps": 3040, "loss": 0.4175, "lr": 7.517383408791847e-06, "epoch": 3.71749862863412, "percentage": 74.34, "elapsed_time": "2:41:07", "remaining_time": "0:55:36"}
{"current_steps": 2265, "total_steps": 3040, "loss": 0.3918, "lr": 7.427875124978359e-06, "epoch": 3.725726823916621, "percentage": 74.51, "elapsed_time": "2:42:54", "remaining_time": "0:55:44"}
{"current_steps": 2270, "total_steps": 3040, "loss": 0.3865, "lr": 7.33878123669393e-06, "epoch": 3.733955019199122, "percentage": 74.67, "elapsed_time": "2:44:42", "remaining_time": "0:55:52"}
{"current_steps": 2275, "total_steps": 3040, "loss": 0.41, "lr": 7.2501046806027456e-06, "epoch": 3.742183214481624, "percentage": 74.84, "elapsed_time": "2:46:29", "remaining_time": "0:55:59"}
{"current_steps": 2280, "total_steps": 3040, "loss": 0.394, "lr": 7.161848379613134e-06, "epoch": 3.750411409764125, "percentage": 75.0, "elapsed_time": "2:48:16", "remaining_time": "0:56:05"}
{"current_steps": 2285, "total_steps": 3040, "loss": 0.4103, "lr": 7.074015242781181e-06, "epoch": 3.7586396050466266, "percentage": 75.16, "elapsed_time": "2:49:59", "remaining_time": "0:56:10"}
{"current_steps": 2290, "total_steps": 3040, "loss": 0.3874, "lr": 6.986608165214892e-06, "epoch": 3.766867800329128, "percentage": 75.33, "elapsed_time": "2:51:53", "remaining_time": "0:56:17"}
{"current_steps": 2295, "total_steps": 3040, "loss": 0.3862, "lr": 6.899630027978717e-06, "epoch": 3.775095995611629, "percentage": 75.49, "elapsed_time": "2:53:46", "remaining_time": "0:56:24"}
{"current_steps": 2300, "total_steps": 3040, "loss": 0.4288, "lr": 6.8130836979986236e-06, "epoch": 3.7833241908941306, "percentage": 75.66, "elapsed_time": "2:55:24", "remaining_time": "0:56:26"}
{"current_steps": 2305, "total_steps": 3040, "loss": 0.3908, "lr": 6.7269720279675755e-06, "epoch": 3.791552386176632, "percentage": 75.82, "elapsed_time": "2:57:18", "remaining_time": "0:56:32"}
{"current_steps": 2310, "total_steps": 3040, "loss": 0.4191, "lr": 6.641297856251514e-06, "epoch": 3.7997805814591334, "percentage": 75.99, "elapsed_time": "2:59:01", "remaining_time": "0:56:34"}
{"current_steps": 2315, "total_steps": 3040, "loss": 0.3918, "lr": 6.556064006795795e-06, "epoch": 3.8080087767416346, "percentage": 76.15, "elapsed_time": "3:00:48", "remaining_time": "0:56:37"}
{"current_steps": 2320, "total_steps": 3040, "loss": 0.4029, "lr": 6.471273289032125e-06, "epoch": 3.816236972024136, "percentage": 76.32, "elapsed_time": "3:02:31", "remaining_time": "0:56:38"}
{"current_steps": 2325, "total_steps": 3040, "loss": 0.3892, "lr": 6.386928497785929e-06, "epoch": 3.8244651673066374, "percentage": 76.48, "elapsed_time": "3:04:17", "remaining_time": "0:56:40"}
{"current_steps": 2330, "total_steps": 3040, "loss": 0.4142, "lr": 6.303032413184256e-06, "epoch": 3.8326933625891386, "percentage": 76.64, "elapsed_time": "3:06:05", "remaining_time": "0:56:42"}
{"current_steps": 2335, "total_steps": 3040, "loss": 0.3908, "lr": 6.219587800564135e-06, "epoch": 3.8409215578716402, "percentage": 76.81, "elapsed_time": "3:07:52", "remaining_time": "0:56:43"}
{"current_steps": 2340, "total_steps": 3040, "loss": 0.2928, "lr": 6.136597410381404e-06, "epoch": 3.8491497531541414, "percentage": 76.97, "elapsed_time": "3:10:17", "remaining_time": "0:56:55"}
{"current_steps": 2345, "total_steps": 3040, "loss": 0.2429, "lr": 6.054063978120093e-06, "epoch": 3.857377948436643, "percentage": 77.14, "elapsed_time": "3:12:47", "remaining_time": "0:57:08"}
{"current_steps": 2350, "total_steps": 3040, "loss": 0.2501, "lr": 5.971990224202209e-06, "epoch": 3.8656061437191442, "percentage": 77.3, "elapsed_time": "3:15:14", "remaining_time": "0:57:19"}
{"current_steps": 2355, "total_steps": 3040, "loss": 0.2559, "lr": 5.890378853898106e-06, "epoch": 3.873834339001646, "percentage": 77.47, "elapsed_time": "3:17:45", "remaining_time": "0:57:31"}
{"current_steps": 2360, "total_steps": 3040, "loss": 0.2583, "lr": 5.809232557237292e-06, "epoch": 3.882062534284147, "percentage": 77.63, "elapsed_time": "3:20:05", "remaining_time": "0:57:39"}
{"current_steps": 2365, "total_steps": 3040, "loss": 0.2444, "lr": 5.728554008919794e-06, "epoch": 3.8902907295666482, "percentage": 77.8, "elapsed_time": "3:22:34", "remaining_time": "0:57:49"}
{"current_steps": 2370, "total_steps": 3040, "loss": 0.2398, "lr": 5.6483458682279354e-06, "epoch": 3.89851892484915, "percentage": 77.96, "elapsed_time": "3:25:04", "remaining_time": "0:57:58"}
{"current_steps": 2375, "total_steps": 3040, "loss": 0.2386, "lr": 5.568610778938761e-06, "epoch": 3.906747120131651, "percentage": 78.12, "elapsed_time": "3:27:28", "remaining_time": "0:58:05"}
{"current_steps": 2380, "total_steps": 3040, "loss": 0.241, "lr": 5.489351369236817e-06, "epoch": 3.9149753154141527, "percentage": 78.29, "elapsed_time": "3:29:58", "remaining_time": "0:58:13"}
{"current_steps": 2385, "total_steps": 3040, "loss": 0.255, "lr": 5.410570251627587e-06, "epoch": 3.923203510696654, "percentage": 78.45, "elapsed_time": "3:32:16", "remaining_time": "0:58:17"}
{"current_steps": 2390, "total_steps": 3040, "loss": 0.2462, "lr": 5.332270022851327e-06, "epoch": 3.931431705979155, "percentage": 78.62, "elapsed_time": "3:34:45", "remaining_time": "0:58:24"}
{"current_steps": 2395, "total_steps": 3040, "loss": 0.2531, "lr": 5.254453263797521e-06, "epoch": 3.9396599012616567, "percentage": 78.78, "elapsed_time": "3:37:06", "remaining_time": "0:58:28"}
{"current_steps": 2400, "total_steps": 3040, "loss": 0.2445, "lr": 5.177122539419763e-06, "epoch": 3.947888096544158, "percentage": 78.95, "elapsed_time": "3:39:33", "remaining_time": "0:58:33"}
{"current_steps": 2405, "total_steps": 3040, "loss": 0.2243, "lr": 5.10028039865126e-06, "epoch": 3.9561162918266595, "percentage": 79.11, "elapsed_time": "3:42:09", "remaining_time": "0:58:39"}
{"current_steps": 2410, "total_steps": 3040, "loss": 0.2485, "lr": 5.023929374320779e-06, "epoch": 3.9643444871091607, "percentage": 79.28, "elapsed_time": "3:44:37", "remaining_time": "0:58:43"}
{"current_steps": 2415, "total_steps": 3040, "loss": 0.2359, "lr": 4.948071983069167e-06, "epoch": 3.972572682391662, "percentage": 79.44, "elapsed_time": "3:47:03", "remaining_time": "0:58:45"}
{"current_steps": 2420, "total_steps": 3040, "loss": 0.2271, "lr": 4.8727107252664315e-06, "epoch": 3.9808008776741635, "percentage": 79.61, "elapsed_time": "3:49:16", "remaining_time": "0:58:44"}
{"current_steps": 2425, "total_steps": 3040, "loss": 0.2384, "lr": 4.797848084929271e-06, "epoch": 3.989029072956665, "percentage": 79.77, "elapsed_time": "3:51:43", "remaining_time": "0:58:45"}
{"current_steps": 2430, "total_steps": 3040, "loss": 0.2387, "lr": 4.723486529639252e-06, "epoch": 3.9972572682391663, "percentage": 79.93, "elapsed_time": "3:54:10", "remaining_time": "0:58:47"}
{"current_steps": 2435, "total_steps": 3040, "loss": 0.514, "lr": 4.649628510461428e-06, "epoch": 4.004936917169501, "percentage": 80.1, "elapsed_time": "3:55:42", "remaining_time": "0:58:33"}
{"current_steps": 2440, "total_steps": 3040, "loss": 0.5277, "lr": 4.576276461863589e-06, "epoch": 4.013165112452002, "percentage": 80.26, "elapsed_time": "3:57:14", "remaining_time": "0:58:20"}
{"current_steps": 2445, "total_steps": 3040, "loss": 0.46, "lr": 4.503432801635976e-06, "epoch": 4.021393307734503, "percentage": 80.43, "elapsed_time": "3:58:42", "remaining_time": "0:58:05"}
{"current_steps": 2450, "total_steps": 3040, "loss": 0.4076, "lr": 4.431099930811633e-06, "epoch": 4.029621503017005, "percentage": 80.59, "elapsed_time": "4:00:09", "remaining_time": "0:57:49"}
{"current_steps": 2455, "total_steps": 3040, "loss": 0.3746, "lr": 4.359280233587229e-06, "epoch": 4.037849698299507, "percentage": 80.76, "elapsed_time": "4:01:34", "remaining_time": "0:57:33"}
{"current_steps": 2460, "total_steps": 3040, "loss": 0.3607, "lr": 4.28797607724448e-06, "epoch": 4.046077893582008, "percentage": 80.92, "elapsed_time": "4:02:57", "remaining_time": "0:57:17"}
{"current_steps": 2465, "total_steps": 3040, "loss": 0.3497, "lr": 4.217189812072131e-06, "epoch": 4.054306088864509, "percentage": 81.09, "elapsed_time": "4:04:23", "remaining_time": "0:57:00"}
{"current_steps": 2470, "total_steps": 3040, "loss": 0.336, "lr": 4.146923771288489e-06, "epoch": 4.06253428414701, "percentage": 81.25, "elapsed_time": "4:05:50", "remaining_time": "0:56:44"}
{"current_steps": 2475, "total_steps": 3040, "loss": 0.3354, "lr": 4.077180270964487e-06, "epoch": 4.070762479429511, "percentage": 81.41, "elapsed_time": "4:07:22", "remaining_time": "0:56:28"}
{"current_steps": 2480, "total_steps": 3040, "loss": 0.3356, "lr": 4.007961609947391e-06, "epoch": 4.0789906747120135, "percentage": 81.58, "elapsed_time": "4:08:49", "remaining_time": "0:56:11"}
{"current_steps": 2485, "total_steps": 3040, "loss": 0.3246, "lr": 3.93927006978497e-06, "epoch": 4.087218869994515, "percentage": 81.74, "elapsed_time": "4:10:25", "remaining_time": "0:55:55"}
{"current_steps": 2490, "total_steps": 3040, "loss": 0.3023, "lr": 3.8711079146503474e-06, "epoch": 4.095447065277016, "percentage": 81.91, "elapsed_time": "4:12:02", "remaining_time": "0:55:40"}
{"current_steps": 2495, "total_steps": 3040, "loss": 0.2955, "lr": 3.8034773912673383e-06, "epoch": 4.103675260559517, "percentage": 82.07, "elapsed_time": "4:13:43", "remaining_time": "0:55:25"}
{"current_steps": 2500, "total_steps": 3040, "loss": 0.2899, "lr": 3.736380728836393e-06, "epoch": 4.111903455842018, "percentage": 82.24, "elapsed_time": "4:15:22", "remaining_time": "0:55:09"}
{"current_steps": 2505, "total_steps": 3040, "loss": 0.2945, "lr": 3.6698201389611423e-06, "epoch": 4.12013165112452, "percentage": 82.4, "elapsed_time": "4:17:01", "remaining_time": "0:54:53"}
{"current_steps": 2510, "total_steps": 3040, "loss": 0.2894, "lr": 3.6037978155754737e-06, "epoch": 4.1283598464070215, "percentage": 82.57, "elapsed_time": "4:18:38", "remaining_time": "0:54:36"}
{"current_steps": 2515, "total_steps": 3040, "loss": 0.2923, "lr": 3.53831593487123e-06, "epoch": 4.136588041689523, "percentage": 82.73, "elapsed_time": "4:20:24", "remaining_time": "0:54:21"}
{"current_steps": 2520, "total_steps": 3040, "loss": 0.2952, "lr": 3.473376655226479e-06, "epoch": 4.144816236972024, "percentage": 82.89, "elapsed_time": "4:22:02", "remaining_time": "0:54:04"}
{"current_steps": 2525, "total_steps": 3040, "loss": 0.2892, "lr": 3.408982117134374e-06, "epoch": 4.153044432254526, "percentage": 83.06, "elapsed_time": "4:23:45", "remaining_time": "0:53:47"}
{"current_steps": 2530, "total_steps": 3040, "loss": 0.292, "lr": 3.3451344431325806e-06, "epoch": 4.161272627537027, "percentage": 83.22, "elapsed_time": "4:25:30", "remaining_time": "0:53:31"}
{"current_steps": 2535, "total_steps": 3040, "loss": 0.291, "lr": 3.2818357377333455e-06, "epoch": 4.169500822819528, "percentage": 83.39, "elapsed_time": "4:27:07", "remaining_time": "0:53:12"}
{"current_steps": 2540, "total_steps": 3040, "loss": 0.2892, "lr": 3.219088087354092e-06, "epoch": 4.1777290181020295, "percentage": 83.55, "elapsed_time": "4:29:13", "remaining_time": "0:52:59"}
{"current_steps": 2545, "total_steps": 3040, "loss": 0.2885, "lr": 3.156893560248688e-06, "epoch": 4.185957213384531, "percentage": 83.72, "elapsed_time": "4:31:29", "remaining_time": "0:52:48"}
{"current_steps": 2550, "total_steps": 3040, "loss": 0.2904, "lr": 3.095254206439233e-06, "epoch": 4.194185408667033, "percentage": 83.88, "elapsed_time": "4:33:40", "remaining_time": "0:52:35"}
{"current_steps": 2555, "total_steps": 3040, "loss": 0.2959, "lr": 3.0341720576485277e-06, "epoch": 4.202413603949534, "percentage": 84.05, "elapsed_time": "4:35:50", "remaining_time": "0:52:21"}
{"current_steps": 2560, "total_steps": 3040, "loss": 0.2955, "lr": 2.9736491272330694e-06, "epoch": 4.210641799232035, "percentage": 84.21, "elapsed_time": "4:37:56", "remaining_time": "0:52:06"}
{"current_steps": 2565, "total_steps": 3040, "loss": 0.2842, "lr": 2.9136874101167034e-06, "epoch": 4.218869994514536, "percentage": 84.38, "elapsed_time": "4:40:12", "remaining_time": "0:51:53"}
{"current_steps": 2570, "total_steps": 3040, "loss": 0.287, "lr": 2.854288882724885e-06, "epoch": 4.227098189797037, "percentage": 84.54, "elapsed_time": "4:42:25", "remaining_time": "0:51:38"}
{"current_steps": 2575, "total_steps": 3040, "loss": 0.2877, "lr": 2.795455502919493e-06, "epoch": 4.2353263850795395, "percentage": 84.7, "elapsed_time": "4:44:33", "remaining_time": "0:51:23"}
{"current_steps": 2580, "total_steps": 3040, "loss": 0.2942, "lr": 2.7371892099343455e-06, "epoch": 4.243554580362041, "percentage": 84.87, "elapsed_time": "4:46:48", "remaining_time": "0:51:08"}
{"current_steps": 2585, "total_steps": 3040, "loss": 0.2837, "lr": 2.679491924311226e-06, "epoch": 4.251782775644542, "percentage": 85.03, "elapsed_time": "4:49:02", "remaining_time": "0:50:52"}
{"current_steps": 2590, "total_steps": 3040, "loss": 0.291, "lr": 2.622365547836636e-06, "epoch": 4.260010970927043, "percentage": 85.2, "elapsed_time": "4:51:13", "remaining_time": "0:50:35"}
{"current_steps": 2595, "total_steps": 3040, "loss": 0.285, "lr": 2.5658119634790526e-06, "epoch": 4.268239166209545, "percentage": 85.36, "elapsed_time": "4:53:24", "remaining_time": "0:50:18"}
{"current_steps": 2600, "total_steps": 3040, "loss": 0.2898, "lr": 2.5098330353269164e-06, "epoch": 4.276467361492046, "percentage": 85.53, "elapsed_time": "4:55:38", "remaining_time": "0:50:01"}
{"current_steps": 2605, "total_steps": 3040, "loss": 0.2843, "lr": 2.4544306085271406e-06, "epoch": 4.2846955567745475, "percentage": 85.69, "elapsed_time": "4:57:43", "remaining_time": "0:49:43"}
{"current_steps": 2610, "total_steps": 3040, "loss": 0.2875, "lr": 2.399606509224337e-06, "epoch": 4.292923752057049, "percentage": 85.86, "elapsed_time": "5:00:00", "remaining_time": "0:49:25"}
{"current_steps": 2615, "total_steps": 3040, "loss": 0.2802, "lr": 2.345362544500589e-06, "epoch": 4.30115194733955, "percentage": 86.02, "elapsed_time": "5:02:11", "remaining_time": "0:49:06"}
{"current_steps": 2620, "total_steps": 3040, "loss": 0.285, "lr": 2.2917005023158966e-06, "epoch": 4.309380142622052, "percentage": 86.18, "elapsed_time": "5:04:15", "remaining_time": "0:48:46"}
{"current_steps": 2625, "total_steps": 3040, "loss": 0.2862, "lr": 2.2386221514492502e-06, "epoch": 4.317608337904553, "percentage": 86.35, "elapsed_time": "5:06:26", "remaining_time": "0:48:26"}
{"current_steps": 2630, "total_steps": 3040, "loss": 0.2915, "lr": 2.186129241440336e-06, "epoch": 4.325836533187054, "percentage": 86.51, "elapsed_time": "5:07:29", "remaining_time": "0:47:56"}
{"current_steps": 2635, "total_steps": 3040, "loss": 0.2902, "lr": 2.134223502531838e-06, "epoch": 4.3340647284695555, "percentage": 86.68, "elapsed_time": "5:08:38", "remaining_time": "0:47:26"}
{"current_steps": 2640, "total_steps": 3040, "loss": 0.6808, "lr": 2.0829066456124415e-06, "epoch": 4.342292923752057, "percentage": 86.84, "elapsed_time": "5:10:25", "remaining_time": "0:47:01"}
{"current_steps": 2645, "total_steps": 3040, "loss": 0.6324, "lr": 2.032180362160423e-06, "epoch": 4.350521119034559, "percentage": 87.01, "elapsed_time": "5:12:10", "remaining_time": "0:46:37"}
{"current_steps": 2650, "total_steps": 3040, "loss": 0.62, "lr": 1.9820463241878873e-06, "epoch": 4.35874931431706, "percentage": 87.17, "elapsed_time": "5:13:53", "remaining_time": "0:46:11"}
{"current_steps": 2655, "total_steps": 3040, "loss": 0.5603, "lr": 1.9325061841856808e-06, "epoch": 4.366977509599561, "percentage": 87.34, "elapsed_time": "5:15:40", "remaining_time": "0:45:46"}
{"current_steps": 2660, "total_steps": 3040, "loss": 0.5239, "lr": 1.8835615750688997e-06, "epoch": 4.375205704882062, "percentage": 87.5, "elapsed_time": "5:17:30", "remaining_time": "0:45:21"}
{"current_steps": 2665, "total_steps": 3040, "loss": 0.5108, "lr": 1.8352141101230758e-06, "epoch": 4.3834339001645635, "percentage": 87.66, "elapsed_time": "5:19:13", "remaining_time": "0:44:55"}
{"current_steps": 2670, "total_steps": 3040, "loss": 0.4972, "lr": 1.787465382950999e-06, "epoch": 4.3916620954470655, "percentage": 87.83, "elapsed_time": "5:20:59", "remaining_time": "0:44:28"}
{"current_steps": 2675, "total_steps": 3040, "loss": 0.5039, "lr": 1.7403169674202036e-06, "epoch": 4.399890290729567, "percentage": 87.99, "elapsed_time": "5:22:33", "remaining_time": "0:44:00"}
{"current_steps": 2680, "total_steps": 3040, "loss": 0.4889, "lr": 1.6937704176110582e-06, "epoch": 4.408118486012068, "percentage": 88.16, "elapsed_time": "5:24:24", "remaining_time": "0:43:34"}
{"current_steps": 2685, "total_steps": 3040, "loss": 0.4648, "lr": 1.6478272677655804e-06, "epoch": 4.416346681294569, "percentage": 88.32, "elapsed_time": "5:26:02", "remaining_time": "0:43:06"}
{"current_steps": 2690, "total_steps": 3040, "loss": 0.4778, "lr": 1.6024890322368358e-06, "epoch": 4.424574876577071, "percentage": 88.49, "elapsed_time": "5:27:46", "remaining_time": "0:42:38"}
{"current_steps": 2695, "total_steps": 3040, "loss": 0.4799, "lr": 1.5577572054390388e-06, "epoch": 4.432803071859572, "percentage": 88.65, "elapsed_time": "5:29:35", "remaining_time": "0:42:11"}
{"current_steps": 2700, "total_steps": 3040, "loss": 0.4633, "lr": 1.5136332617982863e-06, "epoch": 4.4410312671420735, "percentage": 88.82, "elapsed_time": "5:31:20", "remaining_time": "0:41:43"}
{"current_steps": 2705, "total_steps": 3040, "loss": 0.4941, "lr": 1.4701186557039648e-06, "epoch": 4.449259462424575, "percentage": 88.98, "elapsed_time": "5:33:18", "remaining_time": "0:41:16"}
{"current_steps": 2710, "total_steps": 3040, "loss": 0.4766, "lr": 1.4272148214608073e-06, "epoch": 4.457487657707076, "percentage": 89.14, "elapsed_time": "5:34:58", "remaining_time": "0:40:47"}
{"current_steps": 2715, "total_steps": 3040, "loss": 0.4828, "lr": 1.384923173241619e-06, "epoch": 4.465715852989578, "percentage": 89.31, "elapsed_time": "5:36:51", "remaining_time": "0:40:19"}
{"current_steps": 2720, "total_steps": 3040, "loss": 0.4872, "lr": 1.3432451050406603e-06, "epoch": 4.473944048272079, "percentage": 89.47, "elapsed_time": "5:38:36", "remaining_time": "0:39:50"}
{"current_steps": 2725, "total_steps": 3040, "loss": 0.4717, "lr": 1.3021819906277021e-06, "epoch": 4.48217224355458, "percentage": 89.64, "elapsed_time": "5:40:18", "remaining_time": "0:39:20"}
{"current_steps": 2730, "total_steps": 3040, "loss": 0.5015, "lr": 1.2617351835027481e-06, "epoch": 4.4904004388370815, "percentage": 89.8, "elapsed_time": "5:42:04", "remaining_time": "0:38:50"}
{"current_steps": 2735, "total_steps": 3040, "loss": 0.4304, "lr": 1.2219060168514086e-06, "epoch": 4.498628634119583, "percentage": 89.97, "elapsed_time": "5:43:44", "remaining_time": "0:38:19"}
{"current_steps": 2740, "total_steps": 3040, "loss": 0.1567, "lr": 1.1826958035009773e-06, "epoch": 4.506856829402085, "percentage": 90.13, "elapsed_time": "5:44:57", "remaining_time": "0:37:46"}
{"current_steps": 2745, "total_steps": 3040, "loss": 0.1415, "lr": 1.1441058358771317e-06, "epoch": 4.515085024684586, "percentage": 90.3, "elapsed_time": "5:46:05", "remaining_time": "0:37:11"}
{"current_steps": 2750, "total_steps": 3040, "loss": 0.1359, "lr": 1.1061373859613634e-06, "epoch": 4.523313219967087, "percentage": 90.46, "elapsed_time": "5:47:19", "remaining_time": "0:36:37"}
{"current_steps": 2755, "total_steps": 3040, "loss": 0.1283, "lr": 1.0687917052490193e-06, "epoch": 4.531541415249588, "percentage": 90.62, "elapsed_time": "5:48:32", "remaining_time": "0:36:03"}
{"current_steps": 2760, "total_steps": 3040, "loss": 0.1269, "lr": 1.032070024708085e-06, "epoch": 4.53976961053209, "percentage": 90.79, "elapsed_time": "5:49:43", "remaining_time": "0:35:28"}
{"current_steps": 2765, "total_steps": 3040, "loss": 0.1223, "lr": 9.959735547385762e-07, "epoch": 4.547997805814592, "percentage": 90.95, "elapsed_time": "5:50:56", "remaining_time": "0:34:54"}
{"current_steps": 2770, "total_steps": 3040, "loss": 0.124, "lr": 9.605034851326644e-07, "epoch": 4.556226001097093, "percentage": 91.12, "elapsed_time": "5:52:13", "remaining_time": "0:34:19"}
{"current_steps": 2775, "total_steps": 3040, "loss": 0.1264, "lr": 9.256609850354636e-07, "epoch": 4.564454196379594, "percentage": 91.28, "elapsed_time": "5:53:32", "remaining_time": "0:33:45"}
{"current_steps": 2780, "total_steps": 3040, "loss": 0.1185, "lr": 8.91447202906468e-07, "epoch": 4.572682391662095, "percentage": 91.45, "elapsed_time": "5:54:47", "remaining_time": "0:33:10"}
{"current_steps": 2785, "total_steps": 3040, "loss": 0.1181, "lr": 8.578632664817177e-07, "epoch": 4.580910586944597, "percentage": 91.61, "elapsed_time": "5:56:09", "remaining_time": "0:32:36"}
{"current_steps": 2790, "total_steps": 3040, "loss": 0.1215, "lr": 8.249102827366306e-07, "epoch": 4.589138782227098, "percentage": 91.78, "elapsed_time": "5:57:32", "remaining_time": "0:32:02"}
{"current_steps": 2795, "total_steps": 3040, "loss": 0.1181, "lr": 7.925893378494942e-07, "epoch": 4.5973669775095996, "percentage": 91.94, "elapsed_time": "5:58:46", "remaining_time": "0:31:26"}
{"current_steps": 2800, "total_steps": 3040, "loss": 0.1187, "lr": 7.609014971656803e-07, "epoch": 4.605595172792101, "percentage": 92.11, "elapsed_time": "6:00:06", "remaining_time": "0:30:51"}
{"current_steps": 2805, "total_steps": 3040, "loss": 0.1187, "lr": 7.298478051625335e-07, "epoch": 4.613823368074602, "percentage": 92.27, "elapsed_time": "6:01:26", "remaining_time": "0:30:16"}
{"current_steps": 2810, "total_steps": 3040, "loss": 0.117, "lr": 6.994292854149165e-07, "epoch": 4.622051563357104, "percentage": 92.43, "elapsed_time": "6:02:42", "remaining_time": "0:29:41"}
{"current_steps": 2815, "total_steps": 3040, "loss": 0.1159, "lr": 6.696469405615102e-07, "epoch": 4.630279758639605, "percentage": 92.6, "elapsed_time": "6:03:50", "remaining_time": "0:29:04"}
{"current_steps": 2820, "total_steps": 3040, "loss": 0.1168, "lr": 6.405017522717316e-07, "epoch": 4.638507953922106, "percentage": 92.76, "elapsed_time": "6:05:07", "remaining_time": "0:28:29"}
{"current_steps": 2825, "total_steps": 3040, "loss": 0.1176, "lr": 6.119946812133926e-07, "epoch": 4.6467361492046075, "percentage": 92.93, "elapsed_time": "6:06:17", "remaining_time": "0:27:52"}
{"current_steps": 2830, "total_steps": 3040, "loss": 0.1163, "lr": 5.841266670210366e-07, "epoch": 4.654964344487109, "percentage": 93.09, "elapsed_time": "6:07:31", "remaining_time": "0:27:16"}
{"current_steps": 2835, "total_steps": 3040, "loss": 0.1144, "lr": 5.568986282649636e-07, "epoch": 4.663192539769611, "percentage": 93.26, "elapsed_time": "6:08:41", "remaining_time": "0:26:39"}
{"current_steps": 2840, "total_steps": 3040, "loss": 0.261, "lr": 5.303114624209449e-07, "epoch": 4.671420735052112, "percentage": 93.42, "elapsed_time": "6:10:04", "remaining_time": "0:26:03"}
{"current_steps": 2845, "total_steps": 3040, "loss": 0.4345, "lr": 5.043660458406563e-07, "epoch": 4.679648930334613, "percentage": 93.59, "elapsed_time": "6:11:54", "remaining_time": "0:25:29"}
{"current_steps": 2850, "total_steps": 3040, "loss": 0.4006, "lr": 4.790632337227785e-07, "epoch": 4.687877125617114, "percentage": 93.75, "elapsed_time": "6:13:40", "remaining_time": "0:24:54"}
{"current_steps": 2855, "total_steps": 3040, "loss": 0.4252, "lr": 4.544038600848155e-07, "epoch": 4.6961053208996155, "percentage": 93.91, "elapsed_time": "6:15:21", "remaining_time": "0:24:19"}
{"current_steps": 2860, "total_steps": 3040, "loss": 0.3976, "lr": 4.303887377356053e-07, "epoch": 4.704333516182118, "percentage": 94.08, "elapsed_time": "6:17:09", "remaining_time": "0:23:44"}
{"current_steps": 2865, "total_steps": 3040, "loss": 0.396, "lr": 4.070186582485214e-07, "epoch": 4.712561711464619, "percentage": 94.24, "elapsed_time": "6:18:50", "remaining_time": "0:23:08"}
{"current_steps": 2870, "total_steps": 3040, "loss": 0.3833, "lr": 3.842943919353914e-07, "epoch": 4.72078990674712, "percentage": 94.41, "elapsed_time": "6:20:36", "remaining_time": "0:22:32"}
{"current_steps": 2875, "total_steps": 3040, "loss": 0.375, "lr": 3.6221668782109534e-07, "epoch": 4.729018102029621, "percentage": 94.57, "elapsed_time": "6:22:24", "remaining_time": "0:21:56"}
{"current_steps": 2880, "total_steps": 3040, "loss": 0.3942, "lr": 3.4078627361888717e-07, "epoch": 4.737246297312123, "percentage": 94.74, "elapsed_time": "6:24:08", "remaining_time": "0:21:20"}
{"current_steps": 2885, "total_steps": 3040, "loss": 0.3869, "lr": 3.2000385570640114e-07, "epoch": 4.745474492594624, "percentage": 94.9, "elapsed_time": "6:25:51", "remaining_time": "0:20:43"}
{"current_steps": 2890, "total_steps": 3040, "loss": 0.3684, "lr": 2.998701191023701e-07, "epoch": 4.753702687877126, "percentage": 95.07, "elapsed_time": "6:27:43", "remaining_time": "0:20:07"}
{"current_steps": 2895, "total_steps": 3040, "loss": 0.3995, "lr": 2.80385727444048e-07, "epoch": 4.761930883159627, "percentage": 95.23, "elapsed_time": "6:29:28", "remaining_time": "0:19:30"}
{"current_steps": 2900, "total_steps": 3040, "loss": 0.3732, "lr": 2.615513229653366e-07, "epoch": 4.770159078442129, "percentage": 95.39, "elapsed_time": "6:31:16", "remaining_time": "0:18:53"}
{"current_steps": 2905, "total_steps": 3040, "loss": 0.397, "lr": 2.4336752647561304e-07, "epoch": 4.77838727372463, "percentage": 95.56, "elapsed_time": "6:32:56", "remaining_time": "0:18:15"}
{"current_steps": 2910, "total_steps": 3040, "loss": 0.3876, "lr": 2.2583493733926655e-07, "epoch": 4.786615469007131, "percentage": 95.72, "elapsed_time": "6:34:46", "remaining_time": "0:17:38"}
{"current_steps": 2915, "total_steps": 3040, "loss": 0.3908, "lr": 2.0895413345594527e-07, "epoch": 4.794843664289632, "percentage": 95.89, "elapsed_time": "6:36:36", "remaining_time": "0:17:00"}
{"current_steps": 2920, "total_steps": 3040, "loss": 0.3916, "lr": 1.9272567124150932e-07, "epoch": 4.803071859572134, "percentage": 96.05, "elapsed_time": "6:38:14", "remaining_time": "0:16:21"}
{"current_steps": 2925, "total_steps": 3040, "loss": 0.3932, "lr": 1.771500856096875e-07, "epoch": 4.811300054854636, "percentage": 96.22, "elapsed_time": "6:39:56", "remaining_time": "0:15:43"}
{"current_steps": 2930, "total_steps": 3040, "loss": 0.369, "lr": 1.6222788995444272e-07, "epoch": 4.819528250137137, "percentage": 96.38, "elapsed_time": "6:41:39", "remaining_time": "0:15:04"}
{"current_steps": 2935, "total_steps": 3040, "loss": 0.3937, "lr": 1.4795957613305877e-07, "epoch": 4.827756445419638, "percentage": 96.55, "elapsed_time": "6:43:27", "remaining_time": "0:14:26"}
{"current_steps": 2940, "total_steps": 3040, "loss": 0.3936, "lr": 1.3434561444992e-07, "epoch": 4.835984640702139, "percentage": 96.71, "elapsed_time": "6:45:13", "remaining_time": "0:13:46"}
{"current_steps": 2945, "total_steps": 3040, "loss": 0.3477, "lr": 1.2138645364101032e-07, "epoch": 4.84421283598464, "percentage": 96.88, "elapsed_time": "6:47:10", "remaining_time": "0:13:08"}
{"current_steps": 2950, "total_steps": 3040, "loss": 0.2616, "lr": 1.0908252085912952e-07, "epoch": 4.852441031267142, "percentage": 97.04, "elapsed_time": "6:49:37", "remaining_time": "0:12:29"}
{"current_steps": 2955, "total_steps": 3040, "loss": 0.2401, "lr": 9.743422165980454e-08, "epoch": 4.860669226549644, "percentage": 97.2, "elapsed_time": "6:52:05", "remaining_time": "0:11:51"}
{"current_steps": 2960, "total_steps": 3040, "loss": 0.2575, "lr": 8.64419399879246e-08, "epoch": 4.868897421832145, "percentage": 97.37, "elapsed_time": "6:54:32", "remaining_time": "0:11:12"}
{"current_steps": 2965, "total_steps": 3040, "loss": 0.2476, "lr": 7.61060381650891e-08, "epoch": 4.877125617114646, "percentage": 97.53, "elapsed_time": "6:57:00", "remaining_time": "0:10:32"}
{"current_steps": 2970, "total_steps": 3040, "loss": 0.264, "lr": 6.642685687766159e-08, "epoch": 4.885353812397147, "percentage": 97.7, "elapsed_time": "6:59:19", "remaining_time": "0:09:52"}
{"current_steps": 2975, "total_steps": 3040, "loss": 0.2368, "lr": 5.740471516553881e-08, "epoch": 4.893582007679649, "percentage": 97.86, "elapsed_time": "7:01:48", "remaining_time": "0:09:12"}
{"current_steps": 2980, "total_steps": 3040, "loss": 0.2492, "lr": 4.9039910411643466e-08, "epoch": 4.90181020296215, "percentage": 98.03, "elapsed_time": "7:04:11", "remaining_time": "0:08:32"}
{"current_steps": 2985, "total_steps": 3040, "loss": 0.234, "lr": 4.133271833210772e-08, "epoch": 4.910038398244652, "percentage": 98.19, "elapsed_time": "7:06:39", "remaining_time": "0:07:51"}
{"current_steps": 2990, "total_steps": 3040, "loss": 0.2502, "lr": 3.428339296719596e-08, "epoch": 4.918266593527153, "percentage": 98.36, "elapsed_time": "7:09:05", "remaining_time": "0:07:10"}
{"current_steps": 2995, "total_steps": 3040, "loss": 0.2527, "lr": 2.789216667293593e-08, "epoch": 4.926494788809654, "percentage": 98.52, "elapsed_time": "7:11:23", "remaining_time": "0:06:28"}
{"current_steps": 3000, "total_steps": 3040, "loss": 0.2546, "lr": 2.2159250113438223e-08, "epoch": 4.934722984092156, "percentage": 98.68, "elapsed_time": "7:13:51", "remaining_time": "0:05:47"}
{"current_steps": 3005, "total_steps": 3040, "loss": 0.2639, "lr": 1.708483225397961e-08, "epoch": 4.942951179374657, "percentage": 98.85, "elapsed_time": "7:16:19", "remaining_time": "0:05:04"}
{"current_steps": 3010, "total_steps": 3040, "loss": 0.2289, "lr": 1.266908035475467e-08, "epoch": 4.951179374657158, "percentage": 99.01, "elapsed_time": "7:18:45", "remaining_time": "0:04:22"}
{"current_steps": 3015, "total_steps": 3040, "loss": 0.2386, "lr": 8.912139965369105e-09, "epoch": 4.95940756993966, "percentage": 99.18, "elapsed_time": "7:21:12", "remaining_time": "0:03:39"}
{"current_steps": 3020, "total_steps": 3040, "loss": 0.2425, "lr": 5.814134920048009e-09, "epoch": 4.967635765222162, "percentage": 99.34, "elapsed_time": "7:23:39", "remaining_time": "0:02:56"}
{"current_steps": 3025, "total_steps": 3040, "loss": 0.2281, "lr": 3.3751673335458147e-09, "epoch": 4.975863960504663, "percentage": 99.51, "elapsed_time": "7:26:05", "remaining_time": "0:02:12"}
{"current_steps": 3030, "total_steps": 3040, "loss": 0.2468, "lr": 1.5953175977778679e-09, "epoch": 4.984092155787164, "percentage": 99.67, "elapsed_time": "7:28:19", "remaining_time": "0:01:28"}
{"current_steps": 3035, "total_steps": 3040, "loss": 0.2333, "lr": 4.746443791869837e-10, "epoch": 4.992320351069665, "percentage": 99.84, "elapsed_time": "7:30:44", "remaining_time": "0:00:44"}
{"current_steps": 3040, "total_steps": 3040, "loss": 0.2833, "lr": 1.3184616789452264e-11, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "7:32:50", "remaining_time": "0:00:00"}
{"current_steps": 3040, "total_steps": 3040, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "7:32:59", "remaining_time": "0:00:00"}
{"current_steps": 3040, "total_steps": 3040, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:00:00", "remaining_time": "0:00:00"}
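The entries above are JSONL progress records, one object per logging step, with a final summary record that carries no `loss` field. As a minimal sketch (not part of the original training code; the file name `trainer_log.jsonl` and the helper `mean_loss_by_epoch` are illustrative assumptions), such a log could be parsed and summarized like this:

```python
import json
from collections import defaultdict

# A few records in the same shape as the log above (values abbreviated).
SAMPLE = """\
{"current_steps": 2285, "total_steps": 3040, "loss": 0.4103, "lr": 7.07e-06, "epoch": 3.7586, "percentage": 75.16}
{"current_steps": 2290, "total_steps": 3040, "loss": 0.3874, "lr": 6.99e-06, "epoch": 3.7669, "percentage": 75.33}
{"current_steps": 3040, "total_steps": 3040, "epoch": 5.0, "percentage": 100.0}
"""

def mean_loss_by_epoch(lines):
    """Group records by integer epoch and average the reported loss."""
    sums, counts = defaultdict(float), defaultdict(int)
    for line in lines:
        rec = json.loads(line)
        if "loss" not in rec:  # skip the loss-free summary record
            continue
        epoch = int(rec["epoch"])
        sums[epoch] += rec["loss"]
        counts[epoch] += 1
    return {e: sums[e] / counts[e] for e in sums}

print(mean_loss_by_epoch(SAMPLE.splitlines()))
# To run on the real log: mean_loss_by_epoch(open("trainer_log.jsonl"))
```

The same per-step `loss` values are what a plot such as training_loss.png is typically drawn from.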
6731
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c8765fec5b49e9a94eae85e43da6e3aa7cc892f9851b3f18d759e957a620f806
size 8785
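The three lines above are the Git LFS pointer that stands in for training_args.bin: a `version` URL, the `oid` (SHA-256 of the real file), and its `size` in bytes. A minimal sketch of reading such a pointer (the helper name `parse_lfs_pointer` is illustrative, not part of this repository):

```python
def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c8765fec5b49e9a94eae85e43da6e3aa7cc892f9851b3f18d759e957a620f806
size 8785
"""

info = parse_lfs_pointer(POINTER)
print(info["oid"], info["size"])  # the real blob is fetched by its sha256 oid
```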
BIN
training_loss.png
Normal file
Binary file not shown.
After | Size: 60 KiB
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long