Initialize the project; model provided by the ModelHub XC community.
Model: DCAgent/a1-codeelo Source: Original Platform
This commit is contained in:
.gitattributes (vendored, new file, 36 lines)
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
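The attributes above route every matching file through the Git LFS clean/smudge filters, so large binaries are stored as small pointer files. As a rough sketch of which files in this commit that covers (glob subset only; gitignore-style path patterns like saved_model/**/* would need a fuller matcher, and the pattern list here is an abridged assumption):

```python
import fnmatch

# Abridged subset of the LFS patterns from the .gitattributes above.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.pt", "*.gz", "*tfevents*", "tokenizer.json"]

def is_lfs_tracked(filename):
    """Return True if the filename matches any LFS-tracked glob."""
    return any(fnmatch.fnmatch(filename, pat) for pat in LFS_PATTERNS)
```

Under these rules the sharded weights and tokenizer.json become LFS pointers, while small text files such as config.json stay as ordinary blobs.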
README.md (new file, 60 lines)
@@ -0,0 +1,60 @@
---
library_name: transformers
license: other
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: sft_a1_codeelo__Qwen3-8B
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# sft_a1_codeelo__Qwen3-8B

This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--exp_rpt_codeelo-v2_10k_glm_4.7_traces_jupiter/snapshots/82252f3ec14c532dcb0a1154c26432b8bcd8b10e_thinking_preprocessed dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- total_train_batch_size: 16
- total_eval_batch_size: 128
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9,0.98), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 7.0

### Training results

### Framework versions

- Transformers 4.57.6
- Pytorch 2.9.1+cu130
- Datasets 4.7.0
- Tokenizers 0.22.2
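The schedule named in the hyperparameters (cosine with lr_scheduler_warmup_ratio 0.1) can be sketched as follows. This is a minimal illustration, not the Trainer's exact implementation: the function name is invented, and decaying to zero at the final step is an assumption matching the usual Transformers cosine schedule.

```python
import math

def lr_at(step, total_steps, base_lr=4e-5, warmup_ratio=0.1):
    """Sketch: linear warmup over the first 10% of steps, then cosine decay to 0."""
    warmup = int(total_steps * warmup_ratio)
    if step < warmup:
        return base_lr * step / max(warmup, 1)
    progress = (step - warmup) / max(total_steps - warmup, 1)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

The learning rate rises linearly to 4e-05 by step total_steps/10, then follows a half-cosine down toward zero.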
added_tokens.json (new file, 28 lines)
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
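A quick sanity check on these ids, using a few entries copied from the JSON above: closing markers sit directly after their opening counterparts, and every added id fits inside the model's 151936-entry vocabulary (the vocab_size listed in config.json).

```python
# Subset of ids copied verbatim from added_tokens.json.
added_tokens = {
    "<|endoftext|>": 151643,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
    "<tool_call>": 151657,
    "</tool_call>": 151658,
    "<think>": 151667,
    "</think>": 151668,
}
```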
all_results.json (new file, 16 lines)
@@ -0,0 +1,16 @@
{
  "achieved_tflops_per_gpu": 0.0022911149705146998,
  "achieved_tflops_per_gpu_theoretical": 423.488070352896,
  "epoch": 7.0,
  "loss_nan_ranks": 0,
  "loss_rank_avg": 0.4637048840522766,
  "mfu_percent": 0.0001619162523331943,
  "mfu_percent_theoretical": 29.928485537307136,
  "total_flos": 1105775565668352.0,
  "train_loss": 0.48887504853286395,
  "train_runtime": 30164.7773,
  "train_samples_per_second": 1.994,
  "train_steps_per_second": 0.125,
  "valid_targets_mean": 7010.8,
  "valid_targets_min": 805
}
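The throughput figures here are internally consistent with the README's total_train_batch_size of 16: samples per second divided by optimizer steps per second gives the samples consumed per step. A minimal check using the values above:

```python
# Values copied from all_results.json.
train_runtime = 30164.7773        # seconds
samples_per_second = 1.994
steps_per_second = 0.125

total_samples = train_runtime * samples_per_second   # ~60,149 samples over 7 epochs
total_steps = train_runtime * steps_per_second       # ~3,771 optimizer steps
samples_per_step = samples_per_second / steps_per_second
```

samples_per_step comes out near 16, matching per-device batch 1 across 16 GPUs.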
chat_template.jinja (new file, 89 lines)
@@ -0,0 +1,89 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
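For the simplest case this template handles (no tools, no prior assistant or tool turns), its output is plain ChatML. The sketch below reproduces just that path in Python so the expected string is easy to see; it deliberately ignores the tool-call and reasoning-content branches:

```python
def render_chatml(messages, add_generation_prompt=True, enable_thinking=True):
    """Sketch of what the Jinja template emits for a plain system+user conversation."""
    out = ""
    for i, m in enumerate(messages):
        if m["role"] == "system" and i == 0:
            # Leading system message becomes the system block.
            out += "<|im_start|>system\n" + m["content"] + "<|im_end|>\n"
        elif m["role"] == "user":
            out += "<|im_start|>user\n" + m["content"] + "<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
        if not enable_thinking:
            # Template emits an empty think block when enable_thinking is false.
            out += "<think>\n\n</think>\n\n"
    return out
```

In practice one would call tokenizer.apply_chat_template, which renders the real template.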
config.json (new file, 68 lines)
@@ -0,0 +1,68 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "dtype": "bfloat16",
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "transformers_version": "4.57.6",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}
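The config implies the KV-cache cost at inference time: with grouped-query attention (32 query heads sharing 8 key/value heads), 36 layers, head_dim 128, and bfloat16 values, each cached token needs K and V tensors of 8 × 128 values per layer. A quick back-of-the-envelope, using only numbers from config.json:

```python
# KV-cache footprint implied by config.json (bfloat16 = 2 bytes per value).
num_hidden_layers = 36
num_key_value_heads = 8       # GQA: 32 query heads share 8 KV heads
head_dim = 128
max_position_embeddings = 40960

# Factor of 2 for K and V; final factor of 2 for bytes per bf16 value.
kv_bytes_per_token = 2 * num_hidden_layers * num_key_value_heads * head_dim * 2
kv_bytes_full_context = kv_bytes_per_token * max_position_embeddings
```

That is 144 KiB per token, or 5.625 GiB for a single sequence at the full 40960-token context.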
generation_config.json (new file, 12 lines)
@@ -0,0 +1,12 @@
{
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.57.6"
}
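These defaults mean sampled decoding: logits are scaled by temperature 0.6, then restricted to the 20 highest-probability tokens (top_k) and further to the smallest set covering 95% of probability mass (top_p). The sketch below shows the filtering step on a toy distribution; it is an illustration of the idea, not the library's exact kernel:

```python
import math

def filter_probs(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Sketch of temperature + top-k + nucleus (top-p) filtering.

    Returns a renormalized distribution; real decoding samples from it.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for rank, i in enumerate(order):
        if rank >= top_k:
            break          # top-k cutoff
        keep.add(i)
        cum += probs[i]
        if cum >= top_p:
            break          # nucleus cutoff
    z2 = sum(probs[i] for i in keep)
    return [probs[i] / z2 if i in keep else 0.0 for i in range(len(probs))]
```

Note that at temperature 0.6 a strongly peaked distribution can already exceed the 0.95 nucleus with its single top token, in which case decoding becomes effectively greedy for that step.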
merges.txt (new file, 151388 lines)
File diff suppressed because it is too large
model-00001-of-00004.safetensors (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7aa7ce43de8f80cf45d174906fd9665c4cc3d4bd9f5710104a9b6a2b5d41de3f
size 4902257696
model-00002-of-00004.safetensors (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5b794036ca219abc636ea3301feea6a4b17921859b3e4a2d3a4a7f9997d8f28f
size 4915960368
model-00003-of-00004.safetensors (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8bcb76294a475a9ce0009b57282da54a052eaeba6643247f28bb83e130c7595
size 4983068496
model-00004-of-00004.safetensors (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dddd527fc9b9c187160b3974b35654b79c0785379ff59b3e8d14be862e217afd
size 1580230264
model.safetensors.index.json (new file, 407 lines)
@@ -0,0 +1,407 @@
{
  "metadata": {
    "total_parameters": 308224,
    "total_size": 16381470720
  },
  "weight_map": {
    "lm_head.weight": "model-00004-of-00004.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.norm.weight": "model-00004-of-00004.safetensors"
|
||||||
|
}
|
||||||
|
}
|
||||||
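The `weight_map` entries above (from `model.safetensors.index.json`) map each tensor name to the shard file that stores it, so a loader only has to open the shards it actually needs. A minimal sketch of how such an index is consumed (the helper name is ours; the two sample entries are copied from the map above):

```python
# Minimal sketch: resolve which shard file holds a given tensor,
# as a loader would when reading a sharded safetensors checkpoint.
def shard_for(index: dict, tensor_name: str) -> str:
    return index["weight_map"][tensor_name]

# Two entries copied from the index above.
index = {
    "weight_map": {
        "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
        "model.norm.weight": "model-00004-of-00004.safetensors",
    }
}
print(shard_for(index, "model.norm.weight"))  # model-00004-of-00004.safetensors
```

In practice a loader groups the requested tensor names by shard and opens each shard file once.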
12
run_summary.json
Normal file
@@ -0,0 +1,12 @@
{
"agent_name": "82252f3ec14c532dcb0a1154c26432b8bcd8b10e_thinking_preprocessed",
"training_start": null,
"training_end": null,
"created_by": "raoof1",
"base_model_name": "Qwen/Qwen3-8B",
"dataset_name": "/e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--exp_rpt_codeelo-v2_10k_glm_4.7_traces_jupiter/snapshots/82252f3ec14c532dcb0a1154c26432b8bcd8b10e_thinking_preprocessed",
"training_type": "SFT",
"training_parameters": "https://huggingface.co/DCAgent/a1-codeelo/blob/main/config.json",
"wandb_link": null,
"traces_location_s3": null
}
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
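`tokenizer.json` is committed as a Git LFS pointer rather than the file itself: the `oid` line carries the SHA-256 of the real content and `size` its byte length. A hedged sketch of verifying a downloaded file against such a pointer (the function name is ours, not part of git-lfs):

```python
import hashlib

# Sketch: check a downloaded file against a git-lfs pointer's
# "oid sha256:<hex>" and "size <bytes>" fields.
def matches_lfs_pointer(path: str, oid_hex: str, size: int) -> bool:
    h = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            total += len(chunk)
    return total == size and h.hexdigest() == oid_hex
```

Comparing the size first is a cheap pre-check; the SHA-256 match is what actually guarantees the content is the one the pointer refers to.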
240
tokenizer_config.json
Normal file
@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 32768,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
16
train_results.json
Normal file
@@ -0,0 +1,16 @@
{
"achieved_tflops_per_gpu": 0.0022911149705146998,
"achieved_tflops_per_gpu_theoretical": 423.488070352896,
"epoch": 7.0,
"loss_nan_ranks": 0,
"loss_rank_avg": 0.4637048840522766,
"mfu_percent": 0.0001619162523331943,
"mfu_percent_theoretical": 29.928485537307136,
"total_flos": 1105775565668352.0,
"train_loss": 0.48887504853286395,
"train_runtime": 30164.7773,
"train_samples_per_second": 1.994,
"train_steps_per_second": 0.125,
"valid_targets_mean": 7010.8,
"valid_targets_min": 805
}
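The throughput fields in `train_results.json` are internally consistent: both MFU entries imply the same per-GPU peak (about 1415 TFLOP/s), and if one assumes `achieved_tflops_per_gpu = total_flos / (train_runtime * n_gpus * 1e12)`, the run's GPU count can be backed out. A sketch of that arithmetic (the relation is our assumption; it is not stated in the file):

```python
# Values copied from train_results.json above.
total_flos = 1105775565668352.0
train_runtime = 30164.7773                 # seconds
tflops_per_gpu = 0.0022911149705146998
mfu_percent = 0.0001619162523331943

# Assumed relation: achieved_tflops_per_gpu = total_flos / (runtime * n_gpus * 1e12)
n_gpus = total_flos / (train_runtime * tflops_per_gpu * 1e12)

# Per-GPU peak implied by mfu = achieved / peak * 100, in TFLOP/s.
implied_peak = tflops_per_gpu / mfu_percent * 100

print(round(n_gpus), round(implied_peak))  # 16 1415 under the stated assumption
```

The `_theoretical` pair gives the same implied peak (423.488 / 29.928 * 100), which is what makes the two MFU figures mutually consistent.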
760
trainer_log.jsonl
Normal file
760
trainer_log.jsonl
Normal file
@@ -0,0 +1,760 @@
|
|||||||
|
{"current_steps": 5, "total_steps": 3759, "loss": 0.9813, "lr": 4.2553191489361704e-07, "epoch": 0.00931098696461825, "percentage": 0.13, "elapsed_time": "0:00:42", "remaining_time": "8:50:09"}
|
||||||
|
{"current_steps": 10, "total_steps": 3759, "loss": 1.0315, "lr": 9.574468085106384e-07, "epoch": 0.0186219739292365, "percentage": 0.27, "elapsed_time": "0:01:23", "remaining_time": "8:38:59"}
|
||||||
|
{"current_steps": 15, "total_steps": 3759, "loss": 0.9588, "lr": 1.4893617021276596e-06, "epoch": 0.027932960893854747, "percentage": 0.4, "elapsed_time": "0:02:10", "remaining_time": "9:02:22"}
|
||||||
|
{"current_steps": 20, "total_steps": 3759, "loss": 0.9496, "lr": 2.021276595744681e-06, "epoch": 0.037243947858473, "percentage": 0.53, "elapsed_time": "0:02:55", "remaining_time": "9:07:45"}
|
||||||
|
{"current_steps": 25, "total_steps": 3759, "loss": 0.9418, "lr": 2.553191489361702e-06, "epoch": 0.04655493482309125, "percentage": 0.67, "elapsed_time": "0:03:36", "remaining_time": "8:59:47"}
|
||||||
|
{"current_steps": 30, "total_steps": 3759, "loss": 0.8619, "lr": 3.0851063829787237e-06, "epoch": 0.055865921787709494, "percentage": 0.8, "elapsed_time": "0:04:25", "remaining_time": "9:09:43"}
|
||||||
|
{"current_steps": 35, "total_steps": 3759, "loss": 0.8767, "lr": 3.6170212765957453e-06, "epoch": 0.06517690875232775, "percentage": 0.93, "elapsed_time": "0:05:04", "remaining_time": "9:00:09"}
|
||||||
|
{"current_steps": 40, "total_steps": 3759, "loss": 0.8136, "lr": 4.148936170212766e-06, "epoch": 0.074487895716946, "percentage": 1.06, "elapsed_time": "0:05:41", "remaining_time": "8:49:19"}
|
||||||
|
{"current_steps": 5, "total_steps": 3759, "loss": 0.9813, "lr": 4.2553191489361704e-07, "epoch": 0.00931098696461825, "percentage": 0.13, "elapsed_time": "0:00:42", "remaining_time": "8:54:03"}
|
||||||
|
{"current_steps": 10, "total_steps": 3759, "loss": 1.0316, "lr": 9.574468085106384e-07, "epoch": 0.0186219739292365, "percentage": 0.27, "elapsed_time": "0:01:22", "remaining_time": "8:38:20"}
|
||||||
|
{"current_steps": 15, "total_steps": 3759, "loss": 0.9587, "lr": 1.4893617021276596e-06, "epoch": 0.027932960893854747, "percentage": 0.4, "elapsed_time": "0:02:10", "remaining_time": "9:03:17"}
|
||||||
|
{"current_steps": 20, "total_steps": 3759, "loss": 0.9496, "lr": 2.021276595744681e-06, "epoch": 0.037243947858473, "percentage": 0.53, "elapsed_time": "0:02:54", "remaining_time": "9:04:43"}
|
||||||
|
{"current_steps": 25, "total_steps": 3759, "loss": 0.942, "lr": 2.553191489361702e-06, "epoch": 0.04655493482309125, "percentage": 0.67, "elapsed_time": "0:03:35", "remaining_time": "8:56:49"}
|
||||||
|
{"current_steps": 30, "total_steps": 3759, "loss": 0.8621, "lr": 3.0851063829787237e-06, "epoch": 0.055865921787709494, "percentage": 0.8, "elapsed_time": "0:04:23", "remaining_time": "9:05:14"}
|
||||||
|
{"current_steps": 35, "total_steps": 3759, "loss": 0.8771, "lr": 3.6170212765957453e-06, "epoch": 0.06517690875232775, "percentage": 0.93, "elapsed_time": "0:05:02", "remaining_time": "8:55:51"}
|
||||||
|
{"current_steps": 40, "total_steps": 3759, "loss": 0.8139, "lr": 4.148936170212766e-06, "epoch": 0.074487895716946, "percentage": 1.06, "elapsed_time": "0:05:38", "remaining_time": "8:45:15"}
|
||||||
|
{"current_steps": 45, "total_steps": 3759, "loss": 0.8371, "lr": 4.680851063829788e-06, "epoch": 0.08379888268156424, "percentage": 1.2, "elapsed_time": "0:06:28", "remaining_time": "8:53:46"}
|
||||||
|
{"current_steps": 50, "total_steps": 3759, "loss": 0.8077, "lr": 5.212765957446809e-06, "epoch": 0.0931098696461825, "percentage": 1.33, "elapsed_time": "0:07:05", "remaining_time": "8:46:21"}
|
||||||
|
{"current_steps": 55, "total_steps": 3759, "loss": 0.8016, "lr": 5.744680851063831e-06, "epoch": 0.10242085661080075, "percentage": 1.46, "elapsed_time": "0:07:50", "remaining_time": "8:48:15"}
{"current_steps": 60, "total_steps": 3759, "loss": 0.7604, "lr": 6.276595744680851e-06, "epoch": 0.11173184357541899, "percentage": 1.6, "elapsed_time": "0:08:26", "remaining_time": "8:39:58"}
{"current_steps": 65, "total_steps": 3759, "loss": 0.7112, "lr": 6.808510638297873e-06, "epoch": 0.12104283054003724, "percentage": 1.73, "elapsed_time": "0:09:03", "remaining_time": "8:34:44"}
{"current_steps": 70, "total_steps": 3759, "loss": 0.7253, "lr": 7.340425531914894e-06, "epoch": 0.1303538175046555, "percentage": 1.86, "elapsed_time": "0:09:43", "remaining_time": "8:32:53"}
{"current_steps": 75, "total_steps": 3759, "loss": 0.7503, "lr": 7.872340425531916e-06, "epoch": 0.13966480446927373, "percentage": 2.0, "elapsed_time": "0:10:21", "remaining_time": "8:28:58"}
{"current_steps": 80, "total_steps": 3759, "loss": 0.7576, "lr": 8.404255319148937e-06, "epoch": 0.148975791433892, "percentage": 2.13, "elapsed_time": "0:11:07", "remaining_time": "8:31:20"}
{"current_steps": 85, "total_steps": 3759, "loss": 0.7299, "lr": 8.936170212765958e-06, "epoch": 0.15828677839851024, "percentage": 2.26, "elapsed_time": "0:11:48", "remaining_time": "8:30:13"}
{"current_steps": 90, "total_steps": 3759, "loss": 0.7243, "lr": 9.46808510638298e-06, "epoch": 0.16759776536312848, "percentage": 2.39, "elapsed_time": "0:12:20", "remaining_time": "8:23:26"}
{"current_steps": 95, "total_steps": 3759, "loss": 0.7048, "lr": 1e-05, "epoch": 0.17690875232774675, "percentage": 2.53, "elapsed_time": "0:12:58", "remaining_time": "8:20:07"}
{"current_steps": 100, "total_steps": 3759, "loss": 0.7074, "lr": 1.0531914893617022e-05, "epoch": 0.186219739292365, "percentage": 2.66, "elapsed_time": "0:13:37", "remaining_time": "8:18:31"}
{"current_steps": 105, "total_steps": 3759, "loss": 0.6751, "lr": 1.1063829787234044e-05, "epoch": 0.19553072625698323, "percentage": 2.79, "elapsed_time": "0:14:19", "remaining_time": "8:18:32"}
{"current_steps": 110, "total_steps": 3759, "loss": 0.6519, "lr": 1.1595744680851065e-05, "epoch": 0.2048417132216015, "percentage": 2.93, "elapsed_time": "0:15:07", "remaining_time": "8:21:47"}
{"current_steps": 115, "total_steps": 3759, "loss": 0.6588, "lr": 1.2127659574468087e-05, "epoch": 0.21415270018621974, "percentage": 3.06, "elapsed_time": "0:15:53", "remaining_time": "8:23:20"}
{"current_steps": 120, "total_steps": 3759, "loss": 0.6667, "lr": 1.2659574468085108e-05, "epoch": 0.22346368715083798, "percentage": 3.19, "elapsed_time": "0:16:26", "remaining_time": "8:18:24"}
{"current_steps": 125, "total_steps": 3759, "loss": 0.693, "lr": 1.3191489361702127e-05, "epoch": 0.23277467411545624, "percentage": 3.33, "elapsed_time": "0:17:06", "remaining_time": "8:17:15"}
{"current_steps": 130, "total_steps": 3759, "loss": 0.6676, "lr": 1.372340425531915e-05, "epoch": 0.24208566108007448, "percentage": 3.46, "elapsed_time": "0:17:52", "remaining_time": "8:18:45"}
{"current_steps": 135, "total_steps": 3759, "loss": 0.6199, "lr": 1.425531914893617e-05, "epoch": 0.25139664804469275, "percentage": 3.59, "elapsed_time": "0:18:32", "remaining_time": "8:17:37"}
{"current_steps": 140, "total_steps": 3759, "loss": 0.6262, "lr": 1.4787234042553193e-05, "epoch": 0.260707635009311, "percentage": 3.72, "elapsed_time": "0:19:06", "remaining_time": "8:13:52"}
{"current_steps": 145, "total_steps": 3759, "loss": 0.6294, "lr": 1.5319148936170214e-05, "epoch": 0.27001862197392923, "percentage": 3.86, "elapsed_time": "0:19:44", "remaining_time": "8:12:10"}
{"current_steps": 150, "total_steps": 3759, "loss": 0.6175, "lr": 1.5851063829787235e-05, "epoch": 0.27932960893854747, "percentage": 3.99, "elapsed_time": "0:20:30", "remaining_time": "8:13:27"}
{"current_steps": 155, "total_steps": 3759, "loss": 0.6431, "lr": 1.6382978723404255e-05, "epoch": 0.2886405959031657, "percentage": 4.12, "elapsed_time": "0:21:13", "remaining_time": "8:13:34"}
{"current_steps": 160, "total_steps": 3759, "loss": 0.5826, "lr": 1.6914893617021276e-05, "epoch": 0.297951582867784, "percentage": 4.26, "elapsed_time": "0:21:54", "remaining_time": "8:12:42"}
{"current_steps": 165, "total_steps": 3759, "loss": 0.5899, "lr": 1.74468085106383e-05, "epoch": 0.30726256983240224, "percentage": 4.39, "elapsed_time": "0:22:30", "remaining_time": "8:10:19"}
{"current_steps": 170, "total_steps": 3759, "loss": 0.6397, "lr": 1.797872340425532e-05, "epoch": 0.3165735567970205, "percentage": 4.52, "elapsed_time": "0:23:11", "remaining_time": "8:09:34"}
{"current_steps": 175, "total_steps": 3759, "loss": 0.6526, "lr": 1.8510638297872342e-05, "epoch": 0.3258845437616387, "percentage": 4.66, "elapsed_time": "0:23:50", "remaining_time": "8:08:16"}
{"current_steps": 180, "total_steps": 3759, "loss": 0.618, "lr": 1.9042553191489363e-05, "epoch": 0.33519553072625696, "percentage": 4.79, "elapsed_time": "0:24:30", "remaining_time": "8:07:23"}
{"current_steps": 185, "total_steps": 3759, "loss": 0.6048, "lr": 1.9574468085106384e-05, "epoch": 0.34450651769087526, "percentage": 4.92, "elapsed_time": "0:25:06", "remaining_time": "8:05:01"}
{"current_steps": 190, "total_steps": 3759, "loss": 0.6212, "lr": 2.0106382978723404e-05, "epoch": 0.3538175046554935, "percentage": 5.05, "elapsed_time": "0:25:48", "remaining_time": "8:04:46"}
{"current_steps": 195, "total_steps": 3759, "loss": 0.6082, "lr": 2.063829787234043e-05, "epoch": 0.36312849162011174, "percentage": 5.19, "elapsed_time": "0:26:31", "remaining_time": "8:04:55"}
{"current_steps": 200, "total_steps": 3759, "loss": 0.5872, "lr": 2.117021276595745e-05, "epoch": 0.37243947858473, "percentage": 5.32, "elapsed_time": "0:27:13", "remaining_time": "8:04:21"}
{"current_steps": 205, "total_steps": 3759, "loss": 0.627, "lr": 2.1702127659574467e-05, "epoch": 0.3817504655493482, "percentage": 5.45, "elapsed_time": "0:28:04", "remaining_time": "8:06:38"}
{"current_steps": 210, "total_steps": 3759, "loss": 0.6074, "lr": 2.223404255319149e-05, "epoch": 0.39106145251396646, "percentage": 5.59, "elapsed_time": "0:28:46", "remaining_time": "8:06:20"}
{"current_steps": 215, "total_steps": 3759, "loss": 0.594, "lr": 2.2765957446808512e-05, "epoch": 0.40037243947858475, "percentage": 5.72, "elapsed_time": "0:29:32", "remaining_time": "8:06:55"}
{"current_steps": 220, "total_steps": 3759, "loss": 0.6555, "lr": 2.3297872340425536e-05, "epoch": 0.409683426443203, "percentage": 5.85, "elapsed_time": "0:30:22", "remaining_time": "8:08:44"}
{"current_steps": 225, "total_steps": 3759, "loss": 0.6005, "lr": 2.3829787234042553e-05, "epoch": 0.41899441340782123, "percentage": 5.99, "elapsed_time": "0:31:09", "remaining_time": "8:09:29"}
{"current_steps": 230, "total_steps": 3759, "loss": 0.6157, "lr": 2.4361702127659578e-05, "epoch": 0.42830540037243947, "percentage": 6.12, "elapsed_time": "0:31:53", "remaining_time": "8:09:24"}
{"current_steps": 235, "total_steps": 3759, "loss": 0.5807, "lr": 2.48936170212766e-05, "epoch": 0.4376163873370577, "percentage": 6.25, "elapsed_time": "0:32:37", "remaining_time": "8:09:11"}
{"current_steps": 240, "total_steps": 3759, "loss": 0.5824, "lr": 2.5425531914893616e-05, "epoch": 0.44692737430167595, "percentage": 6.38, "elapsed_time": "0:33:18", "remaining_time": "8:08:16"}
{"current_steps": 245, "total_steps": 3759, "loss": 0.5676, "lr": 2.595744680851064e-05, "epoch": 0.45623836126629425, "percentage": 6.52, "elapsed_time": "0:33:55", "remaining_time": "8:06:41"}
{"current_steps": 250, "total_steps": 3759, "loss": 0.6113, "lr": 2.6489361702127664e-05, "epoch": 0.4655493482309125, "percentage": 6.65, "elapsed_time": "0:34:43", "remaining_time": "8:07:18"}
{"current_steps": 255, "total_steps": 3759, "loss": 0.5458, "lr": 2.702127659574468e-05, "epoch": 0.4748603351955307, "percentage": 6.78, "elapsed_time": "0:35:19", "remaining_time": "8:05:23"}
{"current_steps": 260, "total_steps": 3759, "loss": 0.591, "lr": 2.7553191489361702e-05, "epoch": 0.48417132216014896, "percentage": 6.92, "elapsed_time": "0:35:57", "remaining_time": "8:03:49"}
{"current_steps": 265, "total_steps": 3759, "loss": 0.5708, "lr": 2.8085106382978727e-05, "epoch": 0.4934823091247672, "percentage": 7.05, "elapsed_time": "0:36:39", "remaining_time": "8:03:14"}
{"current_steps": 270, "total_steps": 3759, "loss": 0.6334, "lr": 2.8617021276595747e-05, "epoch": 0.5027932960893855, "percentage": 7.18, "elapsed_time": "0:37:21", "remaining_time": "8:02:48"}
{"current_steps": 275, "total_steps": 3759, "loss": 0.6072, "lr": 2.9148936170212768e-05, "epoch": 0.5121042830540037, "percentage": 7.32, "elapsed_time": "0:38:08", "remaining_time": "8:03:15"}
{"current_steps": 280, "total_steps": 3759, "loss": 0.5288, "lr": 2.968085106382979e-05, "epoch": 0.521415270018622, "percentage": 7.45, "elapsed_time": "0:38:42", "remaining_time": "8:01:02"}
{"current_steps": 285, "total_steps": 3759, "loss": 0.5789, "lr": 3.0212765957446813e-05, "epoch": 0.5307262569832403, "percentage": 7.58, "elapsed_time": "0:39:22", "remaining_time": "7:59:54"}
{"current_steps": 290, "total_steps": 3759, "loss": 0.5872, "lr": 3.074468085106383e-05, "epoch": 0.5400372439478585, "percentage": 7.71, "elapsed_time": "0:40:05", "remaining_time": "7:59:33"}
{"current_steps": 295, "total_steps": 3759, "loss": 0.611, "lr": 3.127659574468085e-05, "epoch": 0.5493482309124768, "percentage": 7.85, "elapsed_time": "0:40:50", "remaining_time": "7:59:32"}
{"current_steps": 300, "total_steps": 3759, "loss": 0.58, "lr": 3.180851063829788e-05, "epoch": 0.5586592178770949, "percentage": 7.98, "elapsed_time": "0:41:34", "remaining_time": "7:59:23"}
{"current_steps": 305, "total_steps": 3759, "loss": 0.5632, "lr": 3.234042553191489e-05, "epoch": 0.5679702048417132, "percentage": 8.11, "elapsed_time": "0:42:17", "remaining_time": "7:58:52"}
{"current_steps": 310, "total_steps": 3759, "loss": 0.5557, "lr": 3.2872340425531914e-05, "epoch": 0.5772811918063314, "percentage": 8.25, "elapsed_time": "0:42:57", "remaining_time": "7:57:59"}
{"current_steps": 315, "total_steps": 3759, "loss": 0.5483, "lr": 3.340425531914894e-05, "epoch": 0.5865921787709497, "percentage": 8.38, "elapsed_time": "0:43:41", "remaining_time": "7:57:40"}
{"current_steps": 320, "total_steps": 3759, "loss": 0.5459, "lr": 3.393617021276596e-05, "epoch": 0.595903165735568, "percentage": 8.51, "elapsed_time": "0:44:21", "remaining_time": "7:56:44"}
{"current_steps": 325, "total_steps": 3759, "loss": 0.5841, "lr": 3.446808510638298e-05, "epoch": 0.6052141527001862, "percentage": 8.65, "elapsed_time": "0:45:03", "remaining_time": "7:56:07"}
{"current_steps": 330, "total_steps": 3759, "loss": 0.5441, "lr": 3.5000000000000004e-05, "epoch": 0.6145251396648045, "percentage": 8.78, "elapsed_time": "0:45:39", "remaining_time": "7:54:30"}
{"current_steps": 335, "total_steps": 3759, "loss": 0.5694, "lr": 3.5531914893617025e-05, "epoch": 0.6238361266294227, "percentage": 8.91, "elapsed_time": "0:46:11", "remaining_time": "7:52:09"}
{"current_steps": 340, "total_steps": 3759, "loss": 0.5563, "lr": 3.6063829787234045e-05, "epoch": 0.633147113594041, "percentage": 9.04, "elapsed_time": "0:46:50", "remaining_time": "7:50:58"}
{"current_steps": 345, "total_steps": 3759, "loss": 0.5381, "lr": 3.6595744680851066e-05, "epoch": 0.6424581005586593, "percentage": 9.18, "elapsed_time": "0:47:19", "remaining_time": "7:48:20"}
{"current_steps": 350, "total_steps": 3759, "loss": 0.5863, "lr": 3.712765957446809e-05, "epoch": 0.6517690875232774, "percentage": 9.31, "elapsed_time": "0:47:53", "remaining_time": "7:46:28"}
{"current_steps": 355, "total_steps": 3759, "loss": 0.5583, "lr": 3.7659574468085114e-05, "epoch": 0.6610800744878957, "percentage": 9.44, "elapsed_time": "0:48:34", "remaining_time": "7:45:42"}
{"current_steps": 360, "total_steps": 3759, "loss": 0.5656, "lr": 3.819148936170213e-05, "epoch": 0.6703910614525139, "percentage": 9.58, "elapsed_time": "0:49:17", "remaining_time": "7:45:28"}
{"current_steps": 365, "total_steps": 3759, "loss": 0.585, "lr": 3.872340425531915e-05, "epoch": 0.6797020484171322, "percentage": 9.71, "elapsed_time": "0:50:00", "remaining_time": "7:45:03"}
{"current_steps": 370, "total_steps": 3759, "loss": 0.5435, "lr": 3.925531914893618e-05, "epoch": 0.6890130353817505, "percentage": 9.84, "elapsed_time": "0:50:43", "remaining_time": "7:44:36"}
{"current_steps": 375, "total_steps": 3759, "loss": 0.5977, "lr": 3.978723404255319e-05, "epoch": 0.6983240223463687, "percentage": 9.98, "elapsed_time": "0:51:20", "remaining_time": "7:43:16"}
{"current_steps": 380, "total_steps": 3759, "loss": 0.5721, "lr": 3.999992238637315e-05, "epoch": 0.707635009310987, "percentage": 10.11, "elapsed_time": "0:51:59", "remaining_time": "7:42:18"}
{"current_steps": 385, "total_steps": 3759, "loss": 0.5348, "lr": 3.9999448083057144e-05, "epoch": 0.7169459962756052, "percentage": 10.24, "elapsed_time": "0:52:38", "remaining_time": "7:41:16"}
{"current_steps": 390, "total_steps": 3759, "loss": 0.5614, "lr": 3.999854260531999e-05, "epoch": 0.7262569832402235, "percentage": 10.38, "elapsed_time": "0:53:21", "remaining_time": "7:40:57"}
{"current_steps": 395, "total_steps": 3759, "loss": 0.5623, "lr": 3.9997205972683174e-05, "epoch": 0.7355679702048417, "percentage": 10.51, "elapsed_time": "0:53:59", "remaining_time": "7:39:52"}
{"current_steps": 400, "total_steps": 3759, "loss": 0.5659, "lr": 3.999543821396357e-05, "epoch": 0.74487895716946, "percentage": 10.64, "elapsed_time": "0:54:38", "remaining_time": "7:38:50"}
{"current_steps": 405, "total_steps": 3759, "loss": 0.5499, "lr": 3.999323936727285e-05, "epoch": 0.7541899441340782, "percentage": 10.77, "elapsed_time": "0:55:18", "remaining_time": "7:38:02"}
{"current_steps": 410, "total_steps": 3759, "loss": 0.5381, "lr": 3.999060948001665e-05, "epoch": 0.7635009310986964, "percentage": 10.91, "elapsed_time": "0:56:00", "remaining_time": "7:37:27"}
{"current_steps": 415, "total_steps": 3759, "loss": 0.4909, "lr": 3.998754860889353e-05, "epoch": 0.7728119180633147, "percentage": 11.04, "elapsed_time": "0:56:30", "remaining_time": "7:35:19"}
{"current_steps": 420, "total_steps": 3759, "loss": 0.5538, "lr": 3.998405681989378e-05, "epoch": 0.7821229050279329, "percentage": 11.17, "elapsed_time": "0:57:11", "remaining_time": "7:34:43"}
{"current_steps": 425, "total_steps": 3759, "loss": 0.5511, "lr": 3.998013418829799e-05, "epoch": 0.7914338919925512, "percentage": 11.31, "elapsed_time": "0:57:56", "remaining_time": "7:34:33"}
{"current_steps": 430, "total_steps": 3759, "loss": 0.5548, "lr": 3.997578079867542e-05, "epoch": 0.8007448789571695, "percentage": 11.44, "elapsed_time": "0:58:27", "remaining_time": "7:32:35"}
{"current_steps": 435, "total_steps": 3759, "loss": 0.588, "lr": 3.997099674488215e-05, "epoch": 0.8100558659217877, "percentage": 11.57, "elapsed_time": "0:59:14", "remaining_time": "7:32:40"}
{"current_steps": 440, "total_steps": 3759, "loss": 0.5565, "lr": 3.9965782130059124e-05, "epoch": 0.819366852886406, "percentage": 11.71, "elapsed_time": "0:59:57", "remaining_time": "7:32:13"}
{"current_steps": 445, "total_steps": 3759, "loss": 0.5665, "lr": 3.996013706662987e-05, "epoch": 0.8286778398510242, "percentage": 11.84, "elapsed_time": "1:00:40", "remaining_time": "7:31:52"}
{"current_steps": 450, "total_steps": 3759, "loss": 0.5254, "lr": 3.995406167629809e-05, "epoch": 0.8379888268156425, "percentage": 11.97, "elapsed_time": "1:01:13", "remaining_time": "7:30:12"}
{"current_steps": 455, "total_steps": 3759, "loss": 0.5436, "lr": 3.994755609004505e-05, "epoch": 0.8472998137802608, "percentage": 12.1, "elapsed_time": "1:01:48", "remaining_time": "7:28:49"}
{"current_steps": 460, "total_steps": 3759, "loss": 0.5714, "lr": 3.994062044812673e-05, "epoch": 0.8566108007448789, "percentage": 12.24, "elapsed_time": "1:02:34", "remaining_time": "7:28:47"}
{"current_steps": 465, "total_steps": 3759, "loss": 0.5372, "lr": 3.993325490007083e-05, "epoch": 0.8659217877094972, "percentage": 12.37, "elapsed_time": "1:03:18", "remaining_time": "7:28:28"}
{"current_steps": 470, "total_steps": 3759, "loss": 0.5827, "lr": 3.992545960467353e-05, "epoch": 0.8752327746741154, "percentage": 12.5, "elapsed_time": "1:04:04", "remaining_time": "7:28:20"}
{"current_steps": 475, "total_steps": 3759, "loss": 0.5474, "lr": 3.9917234729996065e-05, "epoch": 0.8845437616387337, "percentage": 12.64, "elapsed_time": "1:04:49", "remaining_time": "7:28:13"}
{"current_steps": 480, "total_steps": 3759, "loss": 0.5535, "lr": 3.990858045336111e-05, "epoch": 0.8938547486033519, "percentage": 12.77, "elapsed_time": "1:05:34", "remaining_time": "7:27:57"}
{"current_steps": 485, "total_steps": 3759, "loss": 0.5618, "lr": 3.989949696134894e-05, "epoch": 0.9031657355679702, "percentage": 12.9, "elapsed_time": "1:06:21", "remaining_time": "7:27:57"}
{"current_steps": 490, "total_steps": 3759, "loss": 0.5599, "lr": 3.988998444979345e-05, "epoch": 0.9124767225325885, "percentage": 13.04, "elapsed_time": "1:07:02", "remaining_time": "7:27:15"}
{"current_steps": 495, "total_steps": 3759, "loss": 0.5145, "lr": 3.988004312377786e-05, "epoch": 0.9217877094972067, "percentage": 13.17, "elapsed_time": "1:07:38", "remaining_time": "7:26:01"}
{"current_steps": 500, "total_steps": 3759, "loss": 0.54, "lr": 3.986967319763038e-05, "epoch": 0.931098696461825, "percentage": 13.3, "elapsed_time": "1:08:20", "remaining_time": "7:25:28"}
{"current_steps": 505, "total_steps": 3759, "loss": 0.5633, "lr": 3.9858874894919516e-05, "epoch": 0.9404096834264432, "percentage": 13.43, "elapsed_time": "1:08:59", "remaining_time": "7:24:33"}
{"current_steps": 510, "total_steps": 3759, "loss": 0.5637, "lr": 3.9847648448449274e-05, "epoch": 0.9497206703910615, "percentage": 13.57, "elapsed_time": "1:09:40", "remaining_time": "7:23:53"}
{"current_steps": 515, "total_steps": 3759, "loss": 0.4954, "lr": 3.983599410025418e-05, "epoch": 0.9590316573556797, "percentage": 13.7, "elapsed_time": "1:10:19", "remaining_time": "7:22:58"}
{"current_steps": 520, "total_steps": 3759, "loss": 0.5519, "lr": 3.9823912101594e-05, "epoch": 0.9683426443202979, "percentage": 13.83, "elapsed_time": "1:11:00", "remaining_time": "7:22:15"}
{"current_steps": 525, "total_steps": 3759, "loss": 0.5757, "lr": 3.981140271294837e-05, "epoch": 0.9776536312849162, "percentage": 13.97, "elapsed_time": "1:11:40", "remaining_time": "7:21:33"}
{"current_steps": 530, "total_steps": 3759, "loss": 0.5513, "lr": 3.979846620401115e-05, "epoch": 0.9869646182495344, "percentage": 14.1, "elapsed_time": "1:12:17", "remaining_time": "7:20:24"}
{"current_steps": 535, "total_steps": 3759, "loss": 0.4965, "lr": 3.9785102853684614e-05, "epoch": 0.9962756052141527, "percentage": 14.23, "elapsed_time": "1:12:54", "remaining_time": "7:19:19"}
{"current_steps": 540, "total_steps": 3759, "loss": 0.5319, "lr": 3.9771312950073464e-05, "epoch": 1.005586592178771, "percentage": 14.37, "elapsed_time": "1:13:37", "remaining_time": "7:18:52"}
{"current_steps": 545, "total_steps": 3759, "loss": 0.4817, "lr": 3.9757096790478585e-05, "epoch": 1.0148975791433892, "percentage": 14.5, "elapsed_time": "1:14:13", "remaining_time": "7:17:45"}
{"current_steps": 550, "total_steps": 3759, "loss": 0.4911, "lr": 3.974245468139066e-05, "epoch": 1.0242085661080074, "percentage": 14.63, "elapsed_time": "1:14:51", "remaining_time": "7:16:47"}
{"current_steps": 555, "total_steps": 3759, "loss": 0.5372, "lr": 3.972738693848354e-05, "epoch": 1.0335195530726258, "percentage": 14.76, "elapsed_time": "1:15:32", "remaining_time": "7:16:08"}
{"current_steps": 560, "total_steps": 3759, "loss": 0.5455, "lr": 3.971189388660747e-05, "epoch": 1.042830540037244, "percentage": 14.9, "elapsed_time": "1:16:12", "remaining_time": "7:15:19"}
{"current_steps": 565, "total_steps": 3759, "loss": 0.5441, "lr": 3.9695975859782025e-05, "epoch": 1.0521415270018621, "percentage": 15.03, "elapsed_time": "1:16:51", "remaining_time": "7:14:26"}
{"current_steps": 570, "total_steps": 3759, "loss": 0.5447, "lr": 3.9679633201188996e-05, "epoch": 1.0614525139664805, "percentage": 15.16, "elapsed_time": "1:17:36", "remaining_time": "7:14:14"}
{"current_steps": 575, "total_steps": 3759, "loss": 0.514, "lr": 3.966286626316491e-05, "epoch": 1.0707635009310987, "percentage": 15.3, "elapsed_time": "1:18:19", "remaining_time": "7:13:45"}
{"current_steps": 580, "total_steps": 3759, "loss": 0.5242, "lr": 3.96456754071935e-05, "epoch": 1.080074487895717, "percentage": 15.43, "elapsed_time": "1:18:58", "remaining_time": "7:12:53"}
{"current_steps": 585, "total_steps": 3759, "loss": 0.5655, "lr": 3.962806100389785e-05, "epoch": 1.089385474860335, "percentage": 15.56, "elapsed_time": "1:19:43", "remaining_time": "7:12:31"}
{"current_steps": 590, "total_steps": 3759, "loss": 0.5168, "lr": 3.961002343303245e-05, "epoch": 1.0986964618249535, "percentage": 15.7, "elapsed_time": "1:20:24", "remaining_time": "7:11:55"}
{"current_steps": 595, "total_steps": 3759, "loss": 0.5202, "lr": 3.9591563083474996e-05, "epoch": 1.1080074487895717, "percentage": 15.83, "elapsed_time": "1:21:07", "remaining_time": "7:11:21"}
{"current_steps": 600, "total_steps": 3759, "loss": 0.5142, "lr": 3.9572680353217984e-05, "epoch": 1.1173184357541899, "percentage": 15.96, "elapsed_time": "1:21:43", "remaining_time": "7:10:17"}
{"current_steps": 605, "total_steps": 3759, "loss": 0.5351, "lr": 3.9553375649360175e-05, "epoch": 1.1266294227188083, "percentage": 16.09, "elapsed_time": "1:22:22", "remaining_time": "7:09:24"}
{"current_steps": 610, "total_steps": 3759, "loss": 0.5238, "lr": 3.953364938809777e-05, "epoch": 1.1359404096834265, "percentage": 16.23, "elapsed_time": "1:22:51", "remaining_time": "7:07:43"}
{"current_steps": 615, "total_steps": 3759, "loss": 0.5183, "lr": 3.9513501994715476e-05, "epoch": 1.1452513966480447, "percentage": 16.36, "elapsed_time": "1:23:38", "remaining_time": "7:07:34"}
{"current_steps": 620, "total_steps": 3759, "loss": 0.486, "lr": 3.9492933903577316e-05, "epoch": 1.1545623836126628, "percentage": 16.49, "elapsed_time": "1:24:24", "remaining_time": "7:07:22"}
{"current_steps": 625, "total_steps": 3759, "loss": 0.5252, "lr": 3.947194555811725e-05, "epoch": 1.1638733705772812, "percentage": 16.63, "elapsed_time": "1:25:08", "remaining_time": "7:06:53"}
{"current_steps": 630, "total_steps": 3759, "loss": 0.4988, "lr": 3.9450537410829674e-05, "epoch": 1.1731843575418994, "percentage": 16.76, "elapsed_time": "1:25:48", "remaining_time": "7:06:11"}
{"current_steps": 635, "total_steps": 3759, "loss": 0.5237, "lr": 3.9428709923259586e-05, "epoch": 1.1824953445065176, "percentage": 16.89, "elapsed_time": "1:26:29", "remaining_time": "7:05:29"}
{"current_steps": 640, "total_steps": 3759, "loss": 0.5525, "lr": 3.940646356599269e-05, "epoch": 1.191806331471136, "percentage": 17.03, "elapsed_time": "1:27:16", "remaining_time": "7:05:18"}
{"current_steps": 645, "total_steps": 3759, "loss": 0.5097, "lr": 3.9383798818645236e-05, "epoch": 1.2011173184357542, "percentage": 17.16, "elapsed_time": "1:28:00", "remaining_time": "7:04:53"}
{"current_steps": 650, "total_steps": 3759, "loss": 0.521, "lr": 3.936071616985366e-05, "epoch": 1.2104283054003724, "percentage": 17.29, "elapsed_time": "1:28:40", "remaining_time": "7:04:10"}
{"current_steps": 655, "total_steps": 3759, "loss": 0.5503, "lr": 3.9337216117264106e-05, "epoch": 1.2197392923649906, "percentage": 17.42, "elapsed_time": "1:29:21", "remaining_time": "7:03:25"}
{"current_steps": 660, "total_steps": 3759, "loss": 0.519, "lr": 3.931329916752162e-05, "epoch": 1.229050279329609, "percentage": 17.56, "elapsed_time": "1:30:01", "remaining_time": "7:02:40"}
{"current_steps": 665, "total_steps": 3759, "loss": 0.5713, "lr": 3.928896583625927e-05, "epoch": 1.2383612662942272, "percentage": 17.69, "elapsed_time": "1:30:44", "remaining_time": "7:02:10"}
{"current_steps": 670, "total_steps": 3759, "loss": 0.5404, "lr": 3.926421664808706e-05, "epoch": 1.2476722532588453, "percentage": 17.82, "elapsed_time": "1:31:25", "remaining_time": "7:01:32"}
{"current_steps": 675, "total_steps": 3759, "loss": 0.4888, "lr": 3.9239052136580536e-05, "epoch": 1.2569832402234637, "percentage": 17.96, "elapsed_time": "1:32:03", "remaining_time": "7:00:38"}
{"current_steps": 680, "total_steps": 3759, "loss": 0.5207, "lr": 3.921347284426935e-05, "epoch": 1.266294227188082, "percentage": 18.09, "elapsed_time": "1:32:40", "remaining_time": "6:59:39"}
{"current_steps": 685, "total_steps": 3759, "loss": 0.542, "lr": 3.918747932262558e-05, "epoch": 1.2756052141527001, "percentage": 18.22, "elapsed_time": "1:33:23", "remaining_time": "6:59:04"}
{"current_steps": 690, "total_steps": 3759, "loss": 0.4924, "lr": 3.916107213205174e-05, "epoch": 1.2849162011173183, "percentage": 18.36, "elapsed_time": "1:34:01", "remaining_time": "6:58:13"}
{"current_steps": 695, "total_steps": 3759, "loss": 0.5318, "lr": 3.9134251841868806e-05, "epoch": 1.2942271880819367, "percentage": 18.49, "elapsed_time": "1:34:43", "remaining_time": "6:57:34"}
{"current_steps": 700, "total_steps": 3759, "loss": 0.4928, "lr": 3.91070190303039e-05, "epoch": 1.303538175046555, "percentage": 18.62, "elapsed_time": "1:35:16", "remaining_time": "6:56:20"}
{"current_steps": 705, "total_steps": 3759, "loss": 0.5562, "lr": 3.9079374284477804e-05, "epoch": 1.3128491620111733, "percentage": 18.75, "elapsed_time": "1:35:57", "remaining_time": "6:55:41"}
{"current_steps": 710, "total_steps": 3759, "loss": 0.481, "lr": 3.905131820039232e-05, "epoch": 1.3221601489757915, "percentage": 18.89, "elapsed_time": "1:36:34", "remaining_time": "6:54:42"}
{"current_steps": 715, "total_steps": 3759, "loss": 0.4993, "lr": 3.902285138291745e-05, "epoch": 1.3314711359404097, "percentage": 19.02, "elapsed_time": "1:37:15", "remaining_time": "6:54:04"}
{"current_steps": 720, "total_steps": 3759, "loss": 0.5598, "lr": 3.899397444577829e-05, "epoch": 1.3407821229050279, "percentage": 19.15, "elapsed_time": "1:37:55", "remaining_time": "6:53:17"}
{"current_steps": 725, "total_steps": 3759, "loss": 0.533, "lr": 3.8964688011541864e-05, "epoch": 1.3500931098696463, "percentage": 19.29, "elapsed_time": "1:38:40", "remaining_time": "6:52:57"}
{"current_steps": 730, "total_steps": 3759, "loss": 0.5203, "lr": 3.893499271160366e-05, "epoch": 1.3594040968342644, "percentage": 19.42, "elapsed_time": "1:39:17", "remaining_time": "6:51:59"}
{"current_steps": 735, "total_steps": 3759, "loss": 0.5667, "lr": 3.890488918617403e-05, "epoch": 1.3687150837988826, "percentage": 19.55, "elapsed_time": "1:39:54", "remaining_time": "6:51:03"}
{"current_steps": 740, "total_steps": 3759, "loss": 0.5591, "lr": 3.887437808426439e-05, "epoch": 1.378026070763501, "percentage": 19.69, "elapsed_time": "1:40:37", "remaining_time": "6:50:31"}
{"current_steps": 745, "total_steps": 3759, "loss": 0.5226, "lr": 3.884346006367324e-05, "epoch": 1.3873370577281192, "percentage": 19.82, "elapsed_time": "1:41:19", "remaining_time": "6:49:56"}
{"current_steps": 750, "total_steps": 3759, "loss": 0.516, "lr": 3.8812135790971946e-05, "epoch": 1.3966480446927374, "percentage": 19.95, "elapsed_time": "1:41:57", "remaining_time": "6:49:02"}
{"current_steps": 755, "total_steps": 3759, "loss": 0.5275, "lr": 3.87804059414904e-05, "epoch": 1.4059590316573556, "percentage": 20.09, "elapsed_time": "1:42:45", "remaining_time": "6:48:49"}
{"current_steps": 760, "total_steps": 3759, "loss": 0.5281, "lr": 3.8748271199302465e-05, "epoch": 1.415270018621974, "percentage": 20.22, "elapsed_time": "1:43:24", "remaining_time": "6:48:04"}
{"current_steps": 765, "total_steps": 3759, "loss": 0.5169, "lr": 3.871573225721119e-05, "epoch": 1.4245810055865922, "percentage": 20.35, "elapsed_time": "1:44:08", "remaining_time": "6:47:36"}
{"current_steps": 770, "total_steps": 3759, "loss": 0.5277, "lr": 3.868278981673391e-05, "epoch": 1.4338919925512104, "percentage": 20.48, "elapsed_time": "1:44:47", "remaining_time": "6:46:47"}
{"current_steps": 775, "total_steps": 3759, "loss": 0.5001, "lr": 3.864944458808712e-05, "epoch": 1.4432029795158288, "percentage": 20.62, "elapsed_time": "1:45:29", "remaining_time": "6:46:09"}
{"current_steps": 780, "total_steps": 3759, "loss": 0.524, "lr": 3.8615697290171135e-05, "epoch": 1.452513966480447, "percentage": 20.75, "elapsed_time": "1:46:11", "remaining_time": "6:45:32"}
{"current_steps": 785, "total_steps": 3759, "loss": 0.4999, "lr": 3.858154865055461e-05, "epoch": 1.4618249534450651, "percentage": 20.88, "elapsed_time": "1:46:47", "remaining_time": "6:44:34"}
{"current_steps": 790, "total_steps": 3759, "loss": 0.5271, "lr": 3.8546999405458876e-05, "epoch": 1.4711359404096833, "percentage": 21.02, "elapsed_time": "1:47:29", "remaining_time": "6:43:57"}
{"current_steps": 795, "total_steps": 3759, "loss": 0.5305, "lr": 3.8512050299742005e-05, "epoch": 1.4804469273743017, "percentage": 21.15, "elapsed_time": "1:48:16", "remaining_time": "6:43:39"}
{"current_steps": 800, "total_steps": 3759, "loss": 0.4865, "lr": 3.847670208688282e-05, "epoch": 1.48975791433892, "percentage": 21.28, "elapsed_time": "1:48:49", "remaining_time": "6:42:32"}
{"current_steps": 805, "total_steps": 3759, "loss": 0.4944, "lr": 3.844095552896461e-05, "epoch": 1.499068901303538, "percentage": 21.42, "elapsed_time": "1:49:23", "remaining_time": "6:41:23"}
{"current_steps": 810, "total_steps": 3759, "loss": 0.4986, "lr": 3.840481139665872e-05, "epoch": 1.5083798882681565, "percentage": 21.55, "elapsed_time": "1:50:10", "remaining_time": "6:41:05"}
{"current_steps": 815, "total_steps": 3759, "loss": 0.5275, "lr": 3.836827046920791e-05, "epoch": 1.5176908752327747, "percentage": 21.68, "elapsed_time": "1:50:50", "remaining_time": "6:40:22"}
{"current_steps": 820, "total_steps": 3759, "loss": 0.5508, "lr": 3.8331333534409594e-05, "epoch": 1.5270018621973929, "percentage": 21.81, "elapsed_time": "1:51:29", "remaining_time": "6:39:37"}
{"current_steps": 825, "total_steps": 3759, "loss": 0.4913, "lr": 3.8294001388598807e-05, "epoch": 1.536312849162011, "percentage": 21.95, "elapsed_time": "1:52:08", "remaining_time": "6:38:50"}
{"current_steps": 830, "total_steps": 3759, "loss": 0.5056, "lr": 3.825627483663109e-05, "epoch": 1.5456238361266295, "percentage": 22.08, "elapsed_time": "1:52:51", "remaining_time": "6:38:15"}
{"current_steps": 835, "total_steps": 3759, "loss": 0.518, "lr": 3.82181546918651e-05, "epoch": 1.5549348230912476, "percentage": 22.21, "elapsed_time": "1:53:27", "remaining_time": "6:37:19"}
{"current_steps": 840, "total_steps": 3759, "loss": 0.4808, "lr": 3.8179641776145067e-05, "epoch": 1.564245810055866, "percentage": 22.35, "elapsed_time": "1:54:01", "remaining_time": "6:36:13"}
{"current_steps": 845, "total_steps": 3759, "loss": 0.534, "lr": 3.814073691978313e-05, "epoch": 1.5735567970204842, "percentage": 22.48, "elapsed_time": "1:54:48", "remaining_time": "6:35:56"}
{"current_steps": 850, "total_steps": 3759, "loss": 0.5452, "lr": 3.810144096154137e-05, "epoch": 1.5828677839851024, "percentage": 22.61, "elapsed_time": "1:55:36", "remaining_time": "6:35:40"}
{"current_steps": 855, "total_steps": 3759, "loss": 0.5484, "lr": 3.8061754748613766e-05, "epoch": 1.5921787709497206, "percentage": 22.75, "elapsed_time": "1:56:21", "remaining_time": "6:35:12"}
{"current_steps": 860, "total_steps": 3759, "loss": 0.4993, "lr": 3.802167913660794e-05, "epoch": 1.6014897579143388, "percentage": 22.88, "elapsed_time": "1:57:03", "remaining_time": "6:34:36"}
{"current_steps": 865, "total_steps": 3759, "loss": 0.5195, "lr": 3.7981214989526664e-05, "epoch": 1.6108007448789572, "percentage": 23.01, "elapsed_time": "1:57:37", "remaining_time": "6:33:32"}
{"current_steps": 870, "total_steps": 3759, "loss": 0.5521, "lr": 3.7940363179749274e-05, "epoch": 1.6201117318435754, "percentage": 23.14, "elapsed_time": "1:58:22", "remaining_time": "6:33:05"}
{"current_steps": 875, "total_steps": 3759, "loss": 0.528, "lr": 3.7899124588012844e-05, "epoch": 1.6294227188081938, "percentage": 23.28, "elapsed_time": "1:59:06", "remaining_time": "6:32:33"}
{"current_steps": 880, "total_steps": 3759, "loss": 0.5077, "lr": 3.785750010339321e-05, "epoch": 1.638733705772812, "percentage": 23.41, "elapsed_time": "1:59:48", "remaining_time": "6:31:57"}
{"current_steps": 885, "total_steps": 3759, "loss": 0.4703, "lr": 3.7815490623285776e-05, "epoch": 1.6480446927374302, "percentage": 23.54, "elapsed_time": "2:00:34", "remaining_time": "6:31:32"}
{"current_steps": 890, "total_steps": 3759, "loss": 0.531, "lr": 3.7773097053386214e-05, "epoch": 1.6573556797020483, "percentage": 23.68, "elapsed_time": "2:01:21", "remaining_time": "6:31:11"}
{"current_steps": 895, "total_steps": 3759, "loss": 0.5352, "lr": 3.7730320307670896e-05, "epoch": 1.6666666666666665, "percentage": 23.81, "elapsed_time": "2:01:57", "remaining_time": "6:30:15"}
{"current_steps": 900, "total_steps": 3759, "loss": 0.5203, "lr": 3.7687161308377185e-05, "epoch": 1.675977653631285, "percentage": 23.94, "elapsed_time": "2:02:35", "remaining_time": "6:29:25"}
{"current_steps": 905, "total_steps": 3759, "loss": 0.5255, "lr": 3.7643620985983594e-05, "epoch": 1.6852886405959033, "percentage": 24.08, "elapsed_time": "2:03:17", "remaining_time": "6:28:47"}
{"current_steps": 910, "total_steps": 3759, "loss": 0.52, "lr": 3.759970027918969e-05, "epoch": 1.6945996275605215, "percentage": 24.21, "elapsed_time": "2:03:54", "remaining_time": "6:27:57"}
{"current_steps": 915, "total_steps": 3759, "loss": 0.5322, "lr": 3.7555400134895875e-05, "epoch": 1.7039106145251397, "percentage": 24.34, "elapsed_time": "2:04:37", "remaining_time": "6:27:22"}
{"current_steps": 920, "total_steps": 3759, "loss": 0.5002, "lr": 3.751072150818296e-05, "epoch": 1.7132216014897579, "percentage": 24.47, "elapsed_time": "2:05:21", "remaining_time": "6:26:49"}
{"current_steps": 925, "total_steps": 3759, "loss": 0.5335, "lr": 3.7465665362291564e-05, "epoch": 1.722532588454376, "percentage": 24.61, "elapsed_time": "2:06:01", "remaining_time": "6:26:05"}
{"current_steps": 930, "total_steps": 3759, "loss": 0.5255, "lr": 3.742023266860139e-05, "epoch": 1.7318435754189943, "percentage": 24.74, "elapsed_time": "2:06:46", "remaining_time": "6:25:37"}
{"current_steps": 935, "total_steps": 3759, "loss": 0.4669, "lr": 3.737442440661023e-05, "epoch": 1.7411545623836127, "percentage": 24.87, "elapsed_time": "2:07:21", "remaining_time": "6:24:40"}
{"current_steps": 940, "total_steps": 3759, "loss": 0.5038, "lr": 3.7328241563912874e-05, "epoch": 1.750465549348231, "percentage": 25.01, "elapsed_time": "2:08:05", "remaining_time": "6:24:07"}
{"current_steps": 945, "total_steps": 3759, "loss": 0.5277, "lr": 3.728168513617984e-05, "epoch": 1.7597765363128492, "percentage": 25.14, "elapsed_time": "2:08:48", "remaining_time": "6:23:35"}
{"current_steps": 950, "total_steps": 3759, "loss": 0.5447, "lr": 3.723475612713585e-05, "epoch": 1.7690875232774674, "percentage": 25.27, "elapsed_time": "2:09:32", "remaining_time": "6:23:00"}
{"current_steps": 955, "total_steps": 3759, "loss": 0.5024, "lr": 3.7187455548538245e-05, "epoch": 1.7783985102420856, "percentage": 25.41, "elapsed_time": "2:10:14", "remaining_time": "6:22:24"}
{"current_steps": 960, "total_steps": 3759, "loss": 0.5321, "lr": 3.7139784420155145e-05, "epoch": 1.7877094972067038, "percentage": 25.54, "elapsed_time": "2:10:55", "remaining_time": "6:21:43"}
{"current_steps": 965, "total_steps": 3759, "loss": 0.5063, "lr": 3.709174376974347e-05, "epoch": 1.7970204841713222, "percentage": 25.67, "elapsed_time": "2:11:34", "remaining_time": "6:20:55"}
{"current_steps": 970, "total_steps": 3759, "loss": 0.4722, "lr": 3.7043334633026785e-05, "epoch": 1.8063314711359404, "percentage": 25.8, "elapsed_time": "2:12:00", "remaining_time": "6:19:34"}
{"current_steps": 975, "total_steps": 3759, "loss": 0.4956, "lr": 3.699455805367296e-05, "epoch": 1.8156424581005588, "percentage": 25.94, "elapsed_time": "2:12:41", "remaining_time": "6:18:54"}
{"current_steps": 980, "total_steps": 3759, "loss": 0.5106, "lr": 3.69454150832717e-05, "epoch": 1.824953445065177, "percentage": 26.07, "elapsed_time": "2:13:21", "remaining_time": "6:18:10"}
{"current_steps": 985, "total_steps": 3759, "loss": 0.5233, "lr": 3.689590678131181e-05, "epoch": 1.8342644320297952, "percentage": 26.2, "elapsed_time": "2:14:04", "remaining_time": "6:17:35"}
{"current_steps": 990, "total_steps": 3759, "loss": 0.491, "lr": 3.684603421515844e-05, "epoch": 1.8435754189944134, "percentage": 26.34, "elapsed_time": "2:14:35", "remaining_time": "6:16:26"}
{"current_steps": 995, "total_steps": 3759, "loss": 0.5426, "lr": 3.6795798460029996e-05, "epoch": 1.8528864059590315, "percentage": 26.47, "elapsed_time": "2:15:12", "remaining_time": "6:15:34"}
{"current_steps": 1000, "total_steps": 3759, "loss": 0.5644, "lr": 3.6745200598974987e-05, "epoch": 1.86219739292365, "percentage": 26.6, "elapsed_time": "2:15:45", "remaining_time": "6:14:33"}
{"current_steps": 1005, "total_steps": 3759, "loss": 0.4416, "lr": 3.66942417228487e-05, "epoch": 1.8715083798882681, "percentage": 26.74, "elapsed_time": "2:16:15", "remaining_time": "6:13:22"}
{"current_steps": 1010, "total_steps": 3759, "loss": 0.5295, "lr": 3.664292293028964e-05, "epoch": 1.8808193668528865, "percentage": 26.87, "elapsed_time": "2:16:50", "remaining_time": "6:12:27"}
{"current_steps": 1015, "total_steps": 3759, "loss": 0.5123, "lr": 3.659124532769587e-05, "epoch": 1.8901303538175047, "percentage": 27.0, "elapsed_time": "2:17:32", "remaining_time": "6:11:50"}
{"current_steps": 1020, "total_steps": 3759, "loss": 0.528, "lr": 3.6539210029201174e-05, "epoch": 1.899441340782123, "percentage": 27.13, "elapsed_time": "2:18:10", "remaining_time": "6:11:03"}
{"current_steps": 1025, "total_steps": 3759, "loss": 0.5157, "lr": 3.6486818156650985e-05, "epoch": 1.908752327746741, "percentage": 27.27, "elapsed_time": "2:18:51", "remaining_time": "6:10:21"}
{"current_steps": 1030, "total_steps": 3759, "loss": 0.4551, "lr": 3.643407083957823e-05, "epoch": 1.9180633147113593, "percentage": 27.4, "elapsed_time": "2:19:21", "remaining_time": "6:09:13"}
{"current_steps": 1035, "total_steps": 3759, "loss": 0.5057, "lr": 3.6380969215178994e-05, "epoch": 1.9273743016759777, "percentage": 27.53, "elapsed_time": "2:19:55", "remaining_time": "6:08:16"}
{"current_steps": 1040, "total_steps": 3759, "loss": 0.4826, "lr": 3.6327514428287985e-05, "epoch": 1.9366852886405959, "percentage": 27.67, "elapsed_time": "2:20:38", "remaining_time": "6:07:41"}
{"current_steps": 1045, "total_steps": 3759, "loss": 0.4985, "lr": 3.627370763135384e-05, "epoch": 1.9459962756052143, "percentage": 27.8, "elapsed_time": "2:21:22", "remaining_time": "6:07:09"}
{"current_steps": 1050, "total_steps": 3759, "loss": 0.505, "lr": 3.6219549984414293e-05, "epoch": 1.9553072625698324, "percentage": 27.93, "elapsed_time": "2:22:03", "remaining_time": "6:06:29"}
{"current_steps": 1055, "total_steps": 3759, "loss": 0.5158, "lr": 3.616504265507116e-05, "epoch": 1.9646182495344506, "percentage": 28.07, "elapsed_time": "2:22:38", "remaining_time": "6:05:35"}
{"current_steps": 1060, "total_steps": 3759, "loss": 0.5118, "lr": 3.61101868184652e-05, "epoch": 1.9739292364990688, "percentage": 28.2, "elapsed_time": "2:23:22", "remaining_time": "6:05:04"}
{"current_steps": 1065, "total_steps": 3759, "loss": 0.4935, "lr": 3.605498365725073e-05, "epoch": 1.983240223463687, "percentage": 28.33, "elapsed_time": "2:24:06", "remaining_time": "6:04:32"}
{"current_steps": 1070, "total_steps": 3759, "loss": 0.4908, "lr": 3.599943436157012e-05, "epoch": 1.9925512104283054, "percentage": 28.47, "elapsed_time": "2:24:46", "remaining_time": "6:03:48"}
{"current_steps": 1075, "total_steps": 3759, "loss": 0.5052, "lr": 3.594354012902821e-05, "epoch": 2.001862197392924, "percentage": 28.6, "elapsed_time": "2:25:33", "remaining_time": "6:03:26"}
{"current_steps": 1080, "total_steps": 3759, "loss": 0.5312, "lr": 3.588730216466642e-05, "epoch": 2.011173184357542, "percentage": 28.73, "elapsed_time": "2:26:05", "remaining_time": "6:02:24"}
{"current_steps": 1085, "total_steps": 3759, "loss": 0.4923, "lr": 3.58307216809368e-05, "epoch": 2.02048417132216, "percentage": 28.86, "elapsed_time": "2:26:47", "remaining_time": "6:01:47"}
{"current_steps": 1090, "total_steps": 3759, "loss": 0.5007, "lr": 3.5773799897675865e-05, "epoch": 2.0297951582867784, "percentage": 29.0, "elapsed_time": "2:27:28", "remaining_time": "6:01:06"}
{"current_steps": 1095, "total_steps": 3759, "loss": 0.4998, "lr": 3.5716538042078333e-05, "epoch": 2.0391061452513966, "percentage": 29.13, "elapsed_time": "2:28:03", "remaining_time": "6:00:11"}
{"current_steps": 1100, "total_steps": 3759, "loss": 0.5368, "lr": 3.565893734867065e-05, "epoch": 2.0484171322160147, "percentage": 29.26, "elapsed_time": "2:28:46", "remaining_time": "5:59:37"}
{"current_steps": 1105, "total_steps": 3759, "loss": 0.4793, "lr": 3.560099905928437e-05, "epoch": 2.0577281191806334, "percentage": 29.4, "elapsed_time": "2:29:33", "remaining_time": "5:59:12"}
{"current_steps": 1110, "total_steps": 3759, "loss": 0.4627, "lr": 3.554272442302936e-05, "epoch": 2.0670391061452515, "percentage": 29.53, "elapsed_time": "2:30:13", "remaining_time": "5:58:30"}
{"current_steps": 1115, "total_steps": 3759, "loss": 0.4981, "lr": 3.548411469626694e-05, "epoch": 2.0763500931098697, "percentage": 29.66, "elapsed_time": "2:30:58", "remaining_time": "5:58:01"}
{"current_steps": 1120, "total_steps": 3759, "loss": 0.4456, "lr": 3.5425171142582725e-05, "epoch": 2.085661080074488, "percentage": 29.8, "elapsed_time": "2:31:34", "remaining_time": "5:57:09"}
{"current_steps": 1125, "total_steps": 3759, "loss": 0.5077, "lr": 3.536589503275941e-05, "epoch": 2.094972067039106, "percentage": 29.93, "elapsed_time": "2:32:20", "remaining_time": "5:56:41"}
{"current_steps": 1130, "total_steps": 3759, "loss": 0.4871, "lr": 3.53062876447494e-05, "epoch": 2.1042830540037243, "percentage": 30.06, "elapsed_time": "2:32:55", "remaining_time": "5:55:46"}
{"current_steps": 1135, "total_steps": 3759, "loss": 0.515, "lr": 3.5246350263647175e-05, "epoch": 2.1135940409683425, "percentage": 30.19, "elapsed_time": "2:33:39", "remaining_time": "5:55:13"}
{"current_steps": 1140, "total_steps": 3759, "loss": 0.4862, "lr": 3.518608418166169e-05, "epoch": 2.122905027932961, "percentage": 30.33, "elapsed_time": "2:34:14", "remaining_time": "5:54:22"}
{"current_steps": 1145, "total_steps": 3759, "loss": 0.5148, "lr": 3.512549069808846e-05, "epoch": 2.1322160148975793, "percentage": 30.46, "elapsed_time": "2:34:58", "remaining_time": "5:53:48"}
{"current_steps": 1150, "total_steps": 3759, "loss": 0.4794, "lr": 3.5064571119281535e-05, "epoch": 2.1415270018621975, "percentage": 30.59, "elapsed_time": "2:35:43", "remaining_time": "5:53:17"}
{"current_steps": 1155, "total_steps": 3759, "loss": 0.4826, "lr": 3.500332675862537e-05, "epoch": 2.1508379888268156, "percentage": 30.73, "elapsed_time": "2:36:22", "remaining_time": "5:52:32"}
{"current_steps": 1160, "total_steps": 3759, "loss": 0.4772, "lr": 3.4941758936506484e-05, "epoch": 2.160148975791434, "percentage": 30.86, "elapsed_time": "2:37:01", "remaining_time": "5:51:48"}
{"current_steps": 1165, "total_steps": 3759, "loss": 0.481, "lr": 3.4879868980285017e-05, "epoch": 2.169459962756052, "percentage": 30.99, "elapsed_time": "2:37:46", "remaining_time": "5:51:18"}
{"current_steps": 1170, "total_steps": 3759, "loss": 0.5064, "lr": 3.481765822426609e-05, "epoch": 2.17877094972067, "percentage": 31.13, "elapsed_time": "2:38:23", "remaining_time": "5:50:28"}
{"current_steps": 1175, "total_steps": 3759, "loss": 0.4863, "lr": 3.4755128009671055e-05, "epoch": 2.188081936685289, "percentage": 31.26, "elapsed_time": "2:38:55", "remaining_time": "5:49:29"}
{"current_steps": 1180, "total_steps": 3759, "loss": 0.5043, "lr": 3.4692279684608554e-05, "epoch": 2.197392923649907, "percentage": 31.39, "elapsed_time": "2:39:35", "remaining_time": "5:48:48"}
{"current_steps": 1185, "total_steps": 3759, "loss": 0.4912, "lr": 3.46291146040455e-05, "epoch": 2.206703910614525, "percentage": 31.52, "elapsed_time": "2:40:13", "remaining_time": "5:48:01"}
{"current_steps": 1190, "total_steps": 3759, "loss": 0.4625, "lr": 3.456563412977783e-05, "epoch": 2.2160148975791434, "percentage": 31.66, "elapsed_time": "2:40:55", "remaining_time": "5:47:24"}
{"current_steps": 1195, "total_steps": 3759, "loss": 0.4865, "lr": 3.4501839630401136e-05, "epoch": 2.2253258845437616, "percentage": 31.79, "elapsed_time": "2:41:40", "remaining_time": "5:46:52"}
{"current_steps": 1200, "total_steps": 3759, "loss": 0.4876, "lr": 3.443773248128119e-05, "epoch": 2.2346368715083798, "percentage": 31.92, "elapsed_time": "2:42:20", "remaining_time": "5:46:10"}
{"current_steps": 1205, "total_steps": 3759, "loss": 0.4958, "lr": 3.437331406452429e-05, "epoch": 2.243947858472998, "percentage": 32.06, "elapsed_time": "2:42:53", "remaining_time": "5:45:15"}
{"current_steps": 1210, "total_steps": 3759, "loss": 0.5328, "lr": 3.4308585768947424e-05, "epoch": 2.2532588454376166, "percentage": 32.19, "elapsed_time": "2:43:41", "remaining_time": "5:44:50"}
{"current_steps": 1215, "total_steps": 3759, "loss": 0.4657, "lr": 3.424354899004839e-05, "epoch": 2.2625698324022347, "percentage": 32.32, "elapsed_time": "2:44:22", "remaining_time": "5:44:09"}
{"current_steps": 1220, "total_steps": 3759, "loss": 0.4668, "lr": 3.417820512997564e-05, "epoch": 2.271880819366853, "percentage": 32.46, "elapsed_time": "2:45:02", "remaining_time": "5:43:28"}
{"current_steps": 1225, "total_steps": 3759, "loss": 0.4829, "lr": 3.411255559749811e-05, "epoch": 2.281191806331471, "percentage": 32.59, "elapsed_time": "2:45:39", "remaining_time": "5:42:40"}
{"current_steps": 1230, "total_steps": 3759, "loss": 0.5087, "lr": 3.404660180797481e-05, "epoch": 2.2905027932960893, "percentage": 32.72, "elapsed_time": "2:46:19", "remaining_time": "5:41:59"}
{"current_steps": 1235, "total_steps": 3759, "loss": 0.4552, "lr": 3.3980345183324344e-05, "epoch": 2.2998137802607075, "percentage": 32.85, "elapsed_time": "2:47:06", "remaining_time": "5:41:31"}
{"current_steps": 1240, "total_steps": 3759, "loss": 0.4925, "lr": 3.391378715199419e-05, "epoch": 2.3091247672253257, "percentage": 32.99, "elapsed_time": "2:47:47", "remaining_time": "5:40:50"}
{"current_steps": 1245, "total_steps": 3759, "loss": 0.4802, "lr": 3.384692914893002e-05, "epoch": 2.3184357541899443, "percentage": 33.12, "elapsed_time": "2:48:29", "remaining_time": "5:40:13"}
{"current_steps": 1250, "total_steps": 3759, "loss": 0.4934, "lr": 3.377977261554462e-05, "epoch": 2.3277467411545625, "percentage": 33.25, "elapsed_time": "2:49:09", "remaining_time": "5:39:31"}
{"current_steps": 1255, "total_steps": 3759, "loss": 0.512, "lr": 3.3712318999686914e-05, "epoch": 2.3370577281191807, "percentage": 33.39, "elapsed_time": "2:49:38", "remaining_time": "5:38:28"}
{"current_steps": 1260, "total_steps": 3759, "loss": 0.4856, "lr": 3.3644569755610745e-05, "epoch": 2.346368715083799, "percentage": 33.52, "elapsed_time": "2:50:19", "remaining_time": "5:37:49"}
{"current_steps": 1265, "total_steps": 3759, "loss": 0.4495, "lr": 3.357652634394345e-05, "epoch": 2.355679702048417, "percentage": 33.65, "elapsed_time": "2:50:58", "remaining_time": "5:37:04"}
{"current_steps": 1270, "total_steps": 3759, "loss": 0.4782, "lr": 3.350819023165446e-05, "epoch": 2.364990689013035, "percentage": 33.79, "elapsed_time": "2:51:41", "remaining_time": "5:36:29"}
{"current_steps": 1275, "total_steps": 3759, "loss": 0.477, "lr": 3.343956289202361e-05, "epoch": 2.3743016759776534, "percentage": 33.92, "elapsed_time": "2:52:20", "remaining_time": "5:35:46"}
{"current_steps": 1280, "total_steps": 3759, "loss": 0.5172, "lr": 3.33706458046094e-05, "epoch": 2.383612662942272, "percentage": 34.05, "elapsed_time": "2:53:03", "remaining_time": "5:35:10"}
{"current_steps": 1285, "total_steps": 3759, "loss": 0.5001, "lr": 3.33014404552171e-05, "epoch": 2.39292364990689, "percentage": 34.18, "elapsed_time": "2:53:43", "remaining_time": "5:34:28"}
{"current_steps": 1290, "total_steps": 3759, "loss": 0.4881, "lr": 3.323194833586669e-05, "epoch": 2.4022346368715084, "percentage": 34.32, "elapsed_time": "2:54:28", "remaining_time": "5:33:56"}
{"current_steps": 1295, "total_steps": 3759, "loss": 0.5176, "lr": 3.316217094476076e-05, "epoch": 2.4115456238361266, "percentage": 34.45, "elapsed_time": "2:55:11", "remaining_time": "5:33:20"}
{"current_steps": 1300, "total_steps": 3759, "loss": 0.4838, "lr": 3.3092109786252105e-05, "epoch": 2.4208566108007448, "percentage": 34.58, "elapsed_time": "2:55:52", "remaining_time": "5:32:39"}
{"current_steps": 1305, "total_steps": 3759, "loss": 0.4685, "lr": 3.3021766370811403e-05, "epoch": 2.430167597765363, "percentage": 34.72, "elapsed_time": "2:56:33", "remaining_time": "5:32:01"}
{"current_steps": 1310, "total_steps": 3759, "loss": 0.4647, "lr": 3.2951142214994565e-05, "epoch": 2.439478584729981, "percentage": 34.85, "elapsed_time": "2:57:12", "remaining_time": "5:31:16"}
{"current_steps": 1315, "total_steps": 3759, "loss": 0.4522, "lr": 3.2880238841410086e-05, "epoch": 2.4487895716945998, "percentage": 34.98, "elapsed_time": "2:57:49", "remaining_time": "5:30:29"}
{"current_steps": 1320, "total_steps": 3759, "loss": 0.4704, "lr": 3.280905777868621e-05, "epoch": 2.458100558659218, "percentage": 35.12, "elapsed_time": "2:58:33", "remaining_time": "5:29:55"}
{"current_steps": 1325, "total_steps": 3759, "loss": 0.4769, "lr": 3.273760056143795e-05, "epoch": 2.467411545623836, "percentage": 35.25, "elapsed_time": "2:59:07", "remaining_time": "5:29:02"}
{"current_steps": 1330, "total_steps": 3759, "loss": 0.4892, "lr": 3.266586873023404e-05, "epoch": 2.4767225325884543, "percentage": 35.38, "elapsed_time": "2:59:47", "remaining_time": "5:28:21"}
{"current_steps": 1335, "total_steps": 3759, "loss": 0.5005, "lr": 3.259386383156369e-05, "epoch": 2.4860335195530725, "percentage": 35.51, "elapsed_time": "3:00:22", "remaining_time": "5:27:31"}
{"current_steps": 1340, "total_steps": 3759, "loss": 0.4988, "lr": 3.252158741780328e-05, "epoch": 2.4953445065176907, "percentage": 35.65, "elapsed_time": "3:01:04", "remaining_time": "5:26:52"}
{"current_steps": 1345, "total_steps": 3759, "loss": 0.4923, "lr": 3.244904104718284e-05, "epoch": 2.504655493482309, "percentage": 35.78, "elapsed_time": "3:01:40", "remaining_time": "5:26:03"}
{"current_steps": 1350, "total_steps": 3759, "loss": 0.4832, "lr": 3.23762262837525e-05, "epoch": 2.5139664804469275, "percentage": 35.91, "elapsed_time": "3:02:20", "remaining_time": "5:25:23"}
{"current_steps": 1355, "total_steps": 3759, "loss": 0.5029, "lr": 3.230314469734877e-05, "epoch": 2.5232774674115457, "percentage": 36.05, "elapsed_time": "3:03:01", "remaining_time": "5:24:43"}
{"current_steps": 1360, "total_steps": 3759, "loss": 0.5192, "lr": 3.222979786356064e-05, "epoch": 2.532588454376164, "percentage": 36.18, "elapsed_time": "3:03:43", "remaining_time": "5:24:05"}
{"current_steps": 1365, "total_steps": 3759, "loss": 0.5034, "lr": 3.21561873636957e-05, "epoch": 2.541899441340782, "percentage": 36.31, "elapsed_time": "3:04:30", "remaining_time": "5:23:36"}
{"current_steps": 1370, "total_steps": 3759, "loss": 0.4608, "lr": 3.208231478474596e-05, "epoch": 2.5512104283054002, "percentage": 36.45, "elapsed_time": "3:05:03", "remaining_time": "5:22:41"}
{"current_steps": 1375, "total_steps": 3759, "loss": 0.5044, "lr": 3.200818171935371e-05, "epoch": 2.560521415270019, "percentage": 36.58, "elapsed_time": "3:05:50", "remaining_time": "5:22:12"}
{"current_steps": 1380, "total_steps": 3759, "loss": 0.4873, "lr": 3.193378976577712e-05, "epoch": 2.5698324022346366, "percentage": 36.71, "elapsed_time": "3:06:28", "remaining_time": "5:21:28"}
{"current_steps": 1385, "total_steps": 3759, "loss": 0.4706, "lr": 3.185914052785584e-05, "epoch": 2.5791433891992552, "percentage": 36.84, "elapsed_time": "3:07:14", "remaining_time": "5:20:57"}
{"current_steps": 1390, "total_steps": 3759, "loss": 0.4735, "lr": 3.178423561497636e-05, "epoch": 2.5884543761638734, "percentage": 36.98, "elapsed_time": "3:07:52", "remaining_time": "5:20:12"}
{"current_steps": 1395, "total_steps": 3759, "loss": 0.5062, "lr": 3.170907664203739e-05, "epoch": 2.5977653631284916, "percentage": 37.11, "elapsed_time": "3:08:34", "remaining_time": "5:19:33"}
{"current_steps": 1400, "total_steps": 3759, "loss": 0.463, "lr": 3.1633665229414956e-05, "epoch": 2.60707635009311, "percentage": 37.24, "elapsed_time": "3:09:14", "remaining_time": "5:18:51"}
{"current_steps": 1405, "total_steps": 3759, "loss": 0.5016, "lr": 3.155800300292755e-05, "epoch": 2.616387337057728, "percentage": 37.38, "elapsed_time": "3:09:45", "remaining_time": "5:17:56"}
{"current_steps": 1410, "total_steps": 3759, "loss": 0.4868, "lr": 3.148209159380101e-05, "epoch": 2.6256983240223466, "percentage": 37.51, "elapsed_time": "3:10:23", "remaining_time": "5:17:11"}
{"current_steps": 1415, "total_steps": 3759, "loss": 0.5238, "lr": 3.1405932638633404e-05, "epoch": 2.635009310986965, "percentage": 37.64, "elapsed_time": "3:11:02", "remaining_time": "5:16:28"}
{"current_steps": 1420, "total_steps": 3759, "loss": 0.4947, "lr": 3.13295277793597e-05, "epoch": 2.644320297951583, "percentage": 37.78, "elapsed_time": "3:11:47", "remaining_time": "5:15:54"}
{"current_steps": 1425, "total_steps": 3759, "loss": 0.4817, "lr": 3.125287866321643e-05, "epoch": 2.653631284916201, "percentage": 37.91, "elapsed_time": "3:12:29", "remaining_time": "5:15:16"}
{"current_steps": 1430, "total_steps": 3759, "loss": 0.5125, "lr": 3.117598694270609e-05, "epoch": 2.6629422718808193, "percentage": 38.04, "elapsed_time": "3:13:01", "remaining_time": "5:14:22"}
{"current_steps": 1435, "total_steps": 3759, "loss": 0.4455, "lr": 3.1098854275561565e-05, "epoch": 2.6722532588454375, "percentage": 38.18, "elapsed_time": "3:13:34", "remaining_time": "5:13:29"}
{"current_steps": 1440, "total_steps": 3759, "loss": 0.5103, "lr": 3.102148232471043e-05, "epoch": 2.6815642458100557, "percentage": 38.31, "elapsed_time": "3:14:15", "remaining_time": "5:12:49"}
{"current_steps": 1445, "total_steps": 3759, "loss": 0.5068, "lr": 3.0943872758239016e-05, "epoch": 2.6908752327746743, "percentage": 38.44, "elapsed_time": "3:14:52", "remaining_time": "5:12:03"}
{"current_steps": 1450, "total_steps": 3759, "loss": 0.4788, "lr": 3.0866027249356484e-05, "epoch": 2.7001862197392925, "percentage": 38.57, "elapsed_time": "3:15:37", "remaining_time": "5:11:30"}
{"current_steps": 1455, "total_steps": 3759, "loss": 0.4362, "lr": 3.0787947476358765e-05, "epoch": 2.7094972067039107, "percentage": 38.71, "elapsed_time": "3:16:19", "remaining_time": "5:10:52"}
{"current_steps": 1460, "total_steps": 3759, "loss": 0.5015, "lr": 3.070963512259235e-05, "epoch": 2.718808193668529, "percentage": 38.84, "elapsed_time": "3:17:05", "remaining_time": "5:10:21"}
{"current_steps": 1465, "total_steps": 3759, "loss": 0.5231, "lr": 3.063109187641803e-05, "epoch": 2.728119180633147, "percentage": 38.97, "elapsed_time": "3:17:47", "remaining_time": "5:09:42"}
{"current_steps": 1470, "total_steps": 3759, "loss": 0.4649, "lr": 3.055231943117447e-05, "epoch": 2.7374301675977653, "percentage": 39.11, "elapsed_time": "3:18:28", "remaining_time": "5:09:03"}
{"current_steps": 1475, "total_steps": 3759, "loss": 0.5019, "lr": 3.0473319485141702e-05, "epoch": 2.7467411545623834, "percentage": 39.24, "elapsed_time": "3:19:07", "remaining_time": "5:08:21"}
{"current_steps": 1480, "total_steps": 3759, "loss": 0.4559, "lr": 3.039409374150454e-05, "epoch": 2.756052141527002, "percentage": 39.37, "elapsed_time": "3:19:48", "remaining_time": "5:07:40"}
{"current_steps": 1485, "total_steps": 3759, "loss": 0.4542, "lr": 3.0314643908315812e-05, "epoch": 2.7653631284916202, "percentage": 39.51, "elapsed_time": "3:20:28", "remaining_time": "5:06:59"}
{"current_steps": 1490, "total_steps": 3759, "loss": 0.4644, "lr": 3.0234971698459582e-05, "epoch": 2.7746741154562384, "percentage": 39.64, "elapsed_time": "3:21:09", "remaining_time": "5:06:20"}
{"current_steps": 1495, "total_steps": 3759, "loss": 0.5043, "lr": 3.015507882961421e-05, "epoch": 2.7839851024208566, "percentage": 39.77, "elapsed_time": "3:21:44", "remaining_time": "5:05:30"}
{"current_steps": 1500, "total_steps": 3759, "loss": 0.4666, "lr": 3.007496702421529e-05, "epoch": 2.793296089385475, "percentage": 39.9, "elapsed_time": "3:22:26", "remaining_time": "5:04:52"}
{"current_steps": 1505, "total_steps": 3759, "loss": 0.5005, "lr": 2.9994638009418552e-05, "epoch": 2.802607076350093, "percentage": 40.04, "elapsed_time": "3:23:21", "remaining_time": "5:04:34"}
{"current_steps": 1510, "total_steps": 3759, "loss": 0.4952, "lr": 2.9914093517062612e-05, "epoch": 2.811918063314711, "percentage": 40.17, "elapsed_time": "3:23:55", "remaining_time": "5:03:43"}
{"current_steps": 1515, "total_steps": 3759, "loss": 0.4448, "lr": 2.9833335283631642e-05, "epoch": 2.82122905027933, "percentage": 40.3, "elapsed_time": "3:24:31", "remaining_time": "5:02:56"}
{"current_steps": 1520, "total_steps": 3759, "loss": 0.4907, "lr": 2.9752365050217898e-05, "epoch": 2.830540037243948, "percentage": 40.44, "elapsed_time": "3:25:13", "remaining_time": "5:02:18"}
{"current_steps": 1525, "total_steps": 3759, "loss": 0.5087, "lr": 2.9671184562484244e-05, "epoch": 2.839851024208566, "percentage": 40.57, "elapsed_time": "3:25:58", "remaining_time": "5:01:43"}
{"current_steps": 1530, "total_steps": 3759, "loss": 0.4443, "lr": 2.958979557062646e-05, "epoch": 2.8491620111731844, "percentage": 40.7, "elapsed_time": "3:26:35", "remaining_time": "5:00:58"}
{"current_steps": 1535, "total_steps": 3759, "loss": 0.4743, "lr": 2.9508199829335543e-05, "epoch": 2.8584729981378025, "percentage": 40.84, "elapsed_time": "3:27:04", "remaining_time": "5:00:01"}
{"current_steps": 1540, "total_steps": 3759, "loss": 0.4955, "lr": 2.942639909775987e-05, "epoch": 2.8677839851024207, "percentage": 40.97, "elapsed_time": "3:27:42", "remaining_time": "4:59:17"}
{"current_steps": 1545, "total_steps": 3759, "loss": 0.4703, "lr": 2.934439513946726e-05, "epoch": 2.877094972067039, "percentage": 41.1, "elapsed_time": "3:28:28", "remaining_time": "4:58:45"}
{"current_steps": 1550, "total_steps": 3759, "loss": 0.476, "lr": 2.9262189722406956e-05, "epoch": 2.8864059590316575, "percentage": 41.23, "elapsed_time": "3:29:11", "remaining_time": "4:58:08"}
{"current_steps": 1555, "total_steps": 3759, "loss": 0.4598, "lr": 2.917978461887154e-05, "epoch": 2.8957169459962757, "percentage": 41.37, "elapsed_time": "3:29:46", "remaining_time": "4:57:19"}
{"current_steps": 1560, "total_steps": 3759, "loss": 0.4478, "lr": 2.909718160545865e-05, "epoch": 2.905027932960894, "percentage": 41.5, "elapsed_time": "3:30:30", "remaining_time": "4:56:43"}
{"current_steps": 1565, "total_steps": 3759, "loss": 0.5108, "lr": 2.9014382463032782e-05, "epoch": 2.914338919925512, "percentage": 41.63, "elapsed_time": "3:31:12", "remaining_time": "4:56:05"}
{"current_steps": 1570, "total_steps": 3759, "loss": 0.4568, "lr": 2.89313889766868e-05, "epoch": 2.9236499068901303, "percentage": 41.77, "elapsed_time": "3:31:47", "remaining_time": "4:55:18"}
{"current_steps": 1575, "total_steps": 3759, "loss": 0.4832, "lr": 2.8848202935703505e-05, "epoch": 2.9329608938547485, "percentage": 41.9, "elapsed_time": "3:32:26", "remaining_time": "4:54:34"}
{"current_steps": 1580, "total_steps": 3759, "loss": 0.5364, "lr": 2.8764826133517045e-05, "epoch": 2.9422718808193666, "percentage": 42.03, "elapsed_time": "3:33:10", "remaining_time": "4:53:59"}
{"current_steps": 1585, "total_steps": 3759, "loss": 0.4403, "lr": 2.8681260367674237e-05, "epoch": 2.9515828677839853, "percentage": 42.17, "elapsed_time": "3:33:50", "remaining_time": "4:53:18"}
{"current_steps": 1590, "total_steps": 3759, "loss": 0.5168, "lr": 2.8597507439795845e-05, "epoch": 2.9608938547486034, "percentage": 42.3, "elapsed_time": "3:34:29", "remaining_time": "4:52:36"}
{"current_steps": 1595, "total_steps": 3759, "loss": 0.4985, "lr": 2.8513569155537698e-05, "epoch": 2.9702048417132216, "percentage": 42.43, "elapsed_time": "3:35:04", "remaining_time": "4:51:47"}
{"current_steps": 1600, "total_steps": 3759, "loss": 0.4757, "lr": 2.8429447324551805e-05, "epoch": 2.97951582867784, "percentage": 42.56, "elapsed_time": "3:35:41", "remaining_time": "4:51:03"}
{"current_steps": 1605, "total_steps": 3759, "loss": 0.5058, "lr": 2.834514376044728e-05, "epoch": 2.988826815642458, "percentage": 42.7, "elapsed_time": "3:36:26", "remaining_time": "4:50:28"}
{"current_steps": 1610, "total_steps": 3759, "loss": 0.4803, "lr": 2.8260660280751326e-05, "epoch": 2.998137802607076, "percentage": 42.83, "elapsed_time": "3:37:00", "remaining_time": "4:49:39"}
{"current_steps": 1615, "total_steps": 3759, "loss": 0.4622, "lr": 2.8175998706869964e-05, "epoch": 3.007448789571695, "percentage": 42.96, "elapsed_time": "3:37:31", "remaining_time": "4:48:46"}
{"current_steps": 1620, "total_steps": 3759, "loss": 0.4434, "lr": 2.809116086404882e-05, "epoch": 3.016759776536313, "percentage": 43.1, "elapsed_time": "3:38:10", "remaining_time": "4:48:04"}
{"current_steps": 1625, "total_steps": 3759, "loss": 0.4474, "lr": 2.8006148581333766e-05, "epoch": 3.026070763500931, "percentage": 43.23, "elapsed_time": "3:38:48", "remaining_time": "4:47:20"}
{"current_steps": 1630, "total_steps": 3759, "loss": 0.4594, "lr": 2.792096369153146e-05, "epoch": 3.0353817504655494, "percentage": 43.36, "elapsed_time": "3:39:29", "remaining_time": "4:46:41"}
{"current_steps": 1635, "total_steps": 3759, "loss": 0.4851, "lr": 2.7835608031169864e-05, "epoch": 3.0446927374301676, "percentage": 43.5, "elapsed_time": "3:40:09", "remaining_time": "4:46:00"}
{"current_steps": 1640, "total_steps": 3759, "loss": 0.4615, "lr": 2.7750083440458634e-05, "epoch": 3.0540037243947857, "percentage": 43.63, "elapsed_time": "3:40:50", "remaining_time": "4:45:20"}
{"current_steps": 1645, "total_steps": 3759, "loss": 0.5329, "lr": 2.7664391763249443e-05, "epoch": 3.063314711359404, "percentage": 43.76, "elapsed_time": "3:41:33", "remaining_time": "4:44:43"}
{"current_steps": 1650, "total_steps": 3759, "loss": 0.4805, "lr": 2.757853484699624e-05, "epoch": 3.0726256983240225, "percentage": 43.89, "elapsed_time": "3:42:19", "remaining_time": "4:44:09"}
{"current_steps": 1655, "total_steps": 3759, "loss": 0.494, "lr": 2.7492514542715422e-05, "epoch": 3.0819366852886407, "percentage": 44.03, "elapsed_time": "3:43:01", "remaining_time": "4:43:31"}
{"current_steps": 1660, "total_steps": 3759, "loss": 0.4592, "lr": 2.7406332704945894e-05, "epoch": 3.091247672253259, "percentage": 44.16, "elapsed_time": "3:43:46", "remaining_time": "4:42:56"}
{"current_steps": 1665, "total_steps": 3759, "loss": 0.4686, "lr": 2.7319991191709148e-05, "epoch": 3.100558659217877, "percentage": 44.29, "elapsed_time": "3:44:27", "remaining_time": "4:42:17"}
{"current_steps": 1670, "total_steps": 3759, "loss": 0.4544, "lr": 2.7233491864469142e-05, "epoch": 3.1098696461824953, "percentage": 44.43, "elapsed_time": "3:45:01", "remaining_time": "4:41:28"}
{"current_steps": 1675, "total_steps": 3759, "loss": 0.4998, "lr": 2.7146836588092215e-05, "epoch": 3.1191806331471135, "percentage": 44.56, "elapsed_time": "3:45:46", "remaining_time": "4:40:54"}
{"current_steps": 1680, "total_steps": 3759, "loss": 0.4701, "lr": 2.706002723080684e-05, "epoch": 3.1284916201117317, "percentage": 44.69, "elapsed_time": "3:46:28", "remaining_time": "4:40:16"}
{"current_steps": 1685, "total_steps": 3759, "loss": 0.4981, "lr": 2.6973065664163405e-05, "epoch": 3.1378026070763503, "percentage": 44.83, "elapsed_time": "3:47:11", "remaining_time": "4:39:38"}
{"current_steps": 1690, "total_steps": 3759, "loss": 0.4779, "lr": 2.688595376299378e-05, "epoch": 3.1471135940409685, "percentage": 44.96, "elapsed_time": "3:47:58", "remaining_time": "4:39:05"}
{"current_steps": 1695, "total_steps": 3759, "loss": 0.4625, "lr": 2.6798693405370987e-05, "epoch": 3.1564245810055866, "percentage": 45.09, "elapsed_time": "3:48:41", "remaining_time": "4:38:29"}
{"current_steps": 1700, "total_steps": 3759, "loss": 0.4529, "lr": 2.6711286472568644e-05, "epoch": 3.165735567970205, "percentage": 45.22, "elapsed_time": "3:49:26", "remaining_time": "4:37:53"}
{"current_steps": 1705, "total_steps": 3759, "loss": 0.4528, "lr": 2.6623734849020436e-05, "epoch": 3.175046554934823, "percentage": 45.36, "elapsed_time": "3:50:05", "remaining_time": "4:37:11"}
{"current_steps": 1710, "total_steps": 3759, "loss": 0.4653, "lr": 2.653604042227949e-05, "epoch": 3.184357541899441, "percentage": 45.49, "elapsed_time": "3:50:43", "remaining_time": "4:36:27"}
{"current_steps": 1715, "total_steps": 3759, "loss": 0.4839, "lr": 2.644820508297765e-05, "epoch": 3.1936685288640594, "percentage": 45.62, "elapsed_time": "3:51:18", "remaining_time": "4:35:40"}
{"current_steps": 1720, "total_steps": 3759, "loss": 0.4183, "lr": 2.6360230724784766e-05, "epoch": 3.202979515828678, "percentage": 45.76, "elapsed_time": "3:51:51", "remaining_time": "4:34:51"}
{"current_steps": 1725, "total_steps": 3759, "loss": 0.4508, "lr": 2.627211924436782e-05, "epoch": 3.212290502793296, "percentage": 45.89, "elapsed_time": "3:52:29", "remaining_time": "4:34:08"}
{"current_steps": 1730, "total_steps": 3759, "loss": 0.456, "lr": 2.6183872541350068e-05, "epoch": 3.2216014897579144, "percentage": 46.02, "elapsed_time": "3:53:15", "remaining_time": "4:33:34"}
{"current_steps": 1735, "total_steps": 3759, "loss": 0.5061, "lr": 2.609549251827005e-05, "epoch": 3.2309124767225326, "percentage": 46.16, "elapsed_time": "3:53:57", "remaining_time": "4:32:56"}
{"current_steps": 1740, "total_steps": 3759, "loss": 0.446, "lr": 2.6006981080540638e-05, "epoch": 3.2402234636871508, "percentage": 46.29, "elapsed_time": "3:54:33", "remaining_time": "4:32:09"}
{"current_steps": 1745, "total_steps": 3759, "loss": 0.4624, "lr": 2.5918340136407865e-05, "epoch": 3.249534450651769, "percentage": 46.42, "elapsed_time": "3:55:06", "remaining_time": "4:31:20"}
{"current_steps": 1750, "total_steps": 3759, "loss": 0.4711, "lr": 2.5829571596909862e-05, "epoch": 3.2588454376163876, "percentage": 46.55, "elapsed_time": "3:55:38", "remaining_time": "4:30:31"}
{"current_steps": 1755, "total_steps": 3759, "loss": 0.4871, "lr": 2.5740677375835637e-05, "epoch": 3.2681564245810057, "percentage": 46.69, "elapsed_time": "3:56:16", "remaining_time": "4:29:47"}
{"current_steps": 1760, "total_steps": 3759, "loss": 0.4665, "lr": 2.5651659389683777e-05, "epoch": 3.277467411545624, "percentage": 46.82, "elapsed_time": "3:56:58", "remaining_time": "4:29:09"}
{"current_steps": 1765, "total_steps": 3759, "loss": 0.4876, "lr": 2.5562519557621183e-05, "epoch": 3.286778398510242, "percentage": 46.95, "elapsed_time": "3:57:43", "remaining_time": "4:28:34"}
{"current_steps": 1770, "total_steps": 3759, "loss": 0.4765, "lr": 2.5473259801441663e-05, "epoch": 3.2960893854748603, "percentage": 47.09, "elapsed_time": "3:58:29", "remaining_time": "4:28:00"}
{"current_steps": 1775, "total_steps": 3759, "loss": 0.4684, "lr": 2.53838820455245e-05, "epoch": 3.3054003724394785, "percentage": 47.22, "elapsed_time": "3:59:04", "remaining_time": "4:27:13"}
{"current_steps": 1780, "total_steps": 3759, "loss": 0.4745, "lr": 2.5294388216792987e-05, "epoch": 3.3147113594040967, "percentage": 47.35, "elapsed_time": "3:59:42", "remaining_time": "4:26:30"}
{"current_steps": 1785, "total_steps": 3759, "loss": 0.4895, "lr": 2.5204780244672858e-05, "epoch": 3.3240223463687153, "percentage": 47.49, "elapsed_time": "4:00:25", "remaining_time": "4:25:53"}
{"current_steps": 1790, "total_steps": 3759, "loss": 0.452, "lr": 2.5115060061050693e-05, "epoch": 3.3333333333333335, "percentage": 47.62, "elapsed_time": "4:01:11", "remaining_time": "4:25:18"}
{"current_steps": 1795, "total_steps": 3759, "loss": 0.4861, "lr": 2.5025229600232285e-05, "epoch": 3.3426443202979517, "percentage": 47.75, "elapsed_time": "4:01:57", "remaining_time": "4:24:44"}
{"current_steps": 1800, "total_steps": 3759, "loss": 0.4775, "lr": 2.493529079890093e-05, "epoch": 3.35195530726257, "percentage": 47.89, "elapsed_time": "4:02:36", "remaining_time": "4:24:02"}
{"current_steps": 1805, "total_steps": 3759, "loss": 0.4673, "lr": 2.4845245596075666e-05, "epoch": 3.361266294227188, "percentage": 48.02, "elapsed_time": "4:03:18", "remaining_time": "4:23:23"}
{"current_steps": 1810, "total_steps": 3759, "loss": 0.47, "lr": 2.475509593306947e-05, "epoch": 3.370577281191806, "percentage": 48.15, "elapsed_time": "4:03:57", "remaining_time": "4:22:41"}
{"current_steps": 1815, "total_steps": 3759, "loss": 0.4616, "lr": 2.4664843753447423e-05, "epoch": 3.3798882681564244, "percentage": 48.28, "elapsed_time": "4:04:38", "remaining_time": "4:22:01"}
{"current_steps": 1820, "total_steps": 3759, "loss": 0.4921, "lr": 2.4574491002984777e-05, "epoch": 3.389199255121043, "percentage": 48.42, "elapsed_time": "4:05:15", "remaining_time": "4:21:17"}
{"current_steps": 1825, "total_steps": 3759, "loss": 0.4575, "lr": 2.448403962962504e-05, "epoch": 3.398510242085661, "percentage": 48.55, "elapsed_time": "4:05:56", "remaining_time": "4:20:37"}
{"current_steps": 1830, "total_steps": 3759, "loss": 0.4696, "lr": 2.4393491583437946e-05, "epoch": 3.4078212290502794, "percentage": 48.68, "elapsed_time": "4:06:38", "remaining_time": "4:19:58"}
{"current_steps": 1835, "total_steps": 3759, "loss": 0.4747, "lr": 2.430284881657744e-05, "epoch": 3.4171322160148976, "percentage": 48.82, "elapsed_time": "4:07:23", "remaining_time": "4:19:23"}
{"current_steps": 1840, "total_steps": 3759, "loss": 0.4547, "lr": 2.421211328323957e-05, "epoch": 3.4264432029795158, "percentage": 48.95, "elapsed_time": "4:08:02", "remaining_time": "4:18:41"}
{"current_steps": 1845, "total_steps": 3759, "loss": 0.4675, "lr": 2.4121286939620385e-05, "epoch": 3.435754189944134, "percentage": 49.08, "elapsed_time": "4:08:49", "remaining_time": "4:18:07"}
{"current_steps": 1850, "total_steps": 3759, "loss": 0.479, "lr": 2.4030371743873713e-05, "epoch": 3.445065176908752, "percentage": 49.22, "elapsed_time": "4:09:30", "remaining_time": "4:17:28"}
{"current_steps": 1855, "total_steps": 3759, "loss": 0.4915, "lr": 2.3939369656069005e-05, "epoch": 3.4543761638733708, "percentage": 49.35, "elapsed_time": "4:10:11", "remaining_time": "4:16:47"}
{"current_steps": 1860, "total_steps": 3759, "loss": 0.4329, "lr": 2.3848282638149012e-05, "epoch": 3.463687150837989, "percentage": 49.48, "elapsed_time": "4:10:45", "remaining_time": "4:16:01"}
{"current_steps": 1865, "total_steps": 3759, "loss": 0.4659, "lr": 2.3757112653887554e-05, "epoch": 3.472998137802607, "percentage": 49.61, "elapsed_time": "4:11:23", "remaining_time": "4:15:18"}
{"current_steps": 1870, "total_steps": 3759, "loss": 0.4889, "lr": 2.366586166884712e-05, "epoch": 3.4823091247672253, "percentage": 49.75, "elapsed_time": "4:11:54", "remaining_time": "4:14:28"}
{"current_steps": 1875, "total_steps": 3759, "loss": 0.4922, "lr": 2.3574531650336536e-05, "epoch": 3.4916201117318435, "percentage": 49.88, "elapsed_time": "4:12:37", "remaining_time": "4:13:49"}
{"current_steps": 1880, "total_steps": 3759, "loss": 0.4553, "lr": 2.3483124567368534e-05, "epoch": 3.5009310986964617, "percentage": 50.01, "elapsed_time": "4:13:17", "remaining_time": "4:13:09"}
{"current_steps": 1885, "total_steps": 3759, "loss": 0.4315, "lr": 2.33916423906173e-05, "epoch": 3.51024208566108, "percentage": 50.15, "elapsed_time": "4:13:39", "remaining_time": "4:12:10"}
{"current_steps": 1890, "total_steps": 3759, "loss": 0.443, "lr": 2.330008709237599e-05, "epoch": 3.5195530726256985, "percentage": 50.28, "elapsed_time": "4:14:19", "remaining_time": "4:11:29"}
{"current_steps": 1895, "total_steps": 3759, "loss": 0.4731, "lr": 2.3208460646514208e-05, "epoch": 3.5288640595903167, "percentage": 50.41, "elapsed_time": "4:15:03", "remaining_time": "4:10:53"}
{"current_steps": 1900, "total_steps": 3759, "loss": 0.4395, "lr": 2.3116765028435458e-05, "epoch": 3.538175046554935, "percentage": 50.55, "elapsed_time": "4:15:41", "remaining_time": "4:10:10"}
{"current_steps": 1905, "total_steps": 3759, "loss": 0.4633, "lr": 2.302500221503455e-05, "epoch": 3.547486033519553, "percentage": 50.68, "elapsed_time": "4:16:25", "remaining_time": "4:09:33"}
{"current_steps": 1910, "total_steps": 3759, "loss": 0.4584, "lr": 2.2933174184654973e-05, "epoch": 3.5567970204841712, "percentage": 50.81, "elapsed_time": "4:17:02", "remaining_time": "4:08:49"}
{"current_steps": 1915, "total_steps": 3759, "loss": 0.4977, "lr": 2.2841282917046246e-05, "epoch": 3.5661080074487894, "percentage": 50.94, "elapsed_time": "4:17:49", "remaining_time": "4:08:16"}
{"current_steps": 1920, "total_steps": 3759, "loss": 0.4548, "lr": 2.274933039332125e-05, "epoch": 3.5754189944134076, "percentage": 51.08, "elapsed_time": "4:18:36", "remaining_time": "4:07:41"}
{"current_steps": 1925, "total_steps": 3759, "loss": 0.4723, "lr": 2.2657318595913506e-05, "epoch": 3.5847299813780262, "percentage": 51.21, "elapsed_time": "4:19:23", "remaining_time": "4:07:07"}
{"current_steps": 1930, "total_steps": 3759, "loss": 0.4749, "lr": 2.256524950853443e-05, "epoch": 3.5940409683426444, "percentage": 51.34, "elapsed_time": "4:20:04", "remaining_time": "4:06:27"}
{"current_steps": 1935, "total_steps": 3759, "loss": 0.4358, "lr": 2.2473125116130557e-05, "epoch": 3.6033519553072626, "percentage": 51.48, "elapsed_time": "4:20:41", "remaining_time": "4:05:44"}
{"current_steps": 1940, "total_steps": 3759, "loss": 0.4749, "lr": 2.2380947404840793e-05, "epoch": 3.612662942271881, "percentage": 51.61, "elapsed_time": "4:21:22", "remaining_time": "4:05:04"}
{"current_steps": 1945, "total_steps": 3759, "loss": 0.4931, "lr": 2.2288718361953534e-05, "epoch": 3.621973929236499, "percentage": 51.74, "elapsed_time": "4:22:06", "remaining_time": "4:04:27"}
{"current_steps": 1950, "total_steps": 3759, "loss": 0.4716, "lr": 2.2196439975863867e-05, "epoch": 3.631284916201117, "percentage": 51.88, "elapsed_time": "4:22:47", "remaining_time": "4:03:47"}
{"current_steps": 1955, "total_steps": 3759, "loss": 0.4824, "lr": 2.2104114236030666e-05, "epoch": 3.6405959031657353, "percentage": 52.01, "elapsed_time": "4:23:29", "remaining_time": "4:03:08"}
{"current_steps": 1960, "total_steps": 3759, "loss": 0.454, "lr": 2.2011743132933746e-05, "epoch": 3.649906890130354, "percentage": 52.14, "elapsed_time": "4:24:14", "remaining_time": "4:02:31"}
{"current_steps": 1965, "total_steps": 3759, "loss": 0.4726, "lr": 2.19193286580309e-05, "epoch": 3.659217877094972, "percentage": 52.27, "elapsed_time": "4:24:55", "remaining_time": "4:01:52"}
{"current_steps": 1970, "total_steps": 3759, "loss": 0.4421, "lr": 2.1826872803714997e-05, "epoch": 3.6685288640595903, "percentage": 52.41, "elapsed_time": "4:25:41", "remaining_time": "4:01:16"}
{"current_steps": 1975, "total_steps": 3759, "loss": 0.4418, "lr": 2.173437756327102e-05, "epoch": 3.6778398510242085, "percentage": 52.54, "elapsed_time": "4:26:20", "remaining_time": "4:00:34"}
{"current_steps": 1980, "total_steps": 3759, "loss": 0.4688, "lr": 2.1641844930833078e-05, "epoch": 3.6871508379888267, "percentage": 52.67, "elapsed_time": "4:26:57", "remaining_time": "3:59:51"}
{"current_steps": 1985, "total_steps": 3759, "loss": 0.4394, "lr": 2.154927690134145e-05, "epoch": 3.6964618249534453, "percentage": 52.81, "elapsed_time": "4:27:28", "remaining_time": "3:59:02"}
{"current_steps": 1990, "total_steps": 3759, "loss": 0.4779, "lr": 2.1456675470499523e-05, "epoch": 3.705772811918063, "percentage": 52.94, "elapsed_time": "4:28:12", "remaining_time": "3:58:25"}
{"current_steps": 1995, "total_steps": 3759, "loss": 0.4698, "lr": 2.1364042634730806e-05, "epoch": 3.7150837988826817, "percentage": 53.07, "elapsed_time": "4:28:47", "remaining_time": "3:57:40"}
{"current_steps": 2000, "total_steps": 3759, "loss": 0.4755, "lr": 2.127138039113589e-05, "epoch": 3.7243947858473, "percentage": 53.21, "elapsed_time": "4:29:27", "remaining_time": "3:56:59"}
{"current_steps": 2005, "total_steps": 3759, "loss": 0.4465, "lr": 2.1178690737449354e-05, "epoch": 3.733705772811918, "percentage": 53.34, "elapsed_time": "4:30:09", "remaining_time": "3:56:20"}
{"current_steps": 2010, "total_steps": 3759, "loss": 0.4529, "lr": 2.1085975671996736e-05, "epoch": 3.7430167597765363, "percentage": 53.47, "elapsed_time": "4:30:40", "remaining_time": "3:55:31"}
{"current_steps": 2015, "total_steps": 3759, "loss": 0.4785, "lr": 2.0993237193651436e-05, "epoch": 3.7523277467411544, "percentage": 53.6, "elapsed_time": "4:31:22", "remaining_time": "3:54:52"}
{"current_steps": 2020, "total_steps": 3759, "loss": 0.4662, "lr": 2.0900477301791606e-05, "epoch": 3.761638733705773, "percentage": 53.74, "elapsed_time": "4:32:01", "remaining_time": "3:54:10"}
{"current_steps": 2025, "total_steps": 3759, "loss": 0.4879, "lr": 2.0807697996257072e-05, "epoch": 3.770949720670391, "percentage": 53.87, "elapsed_time": "4:32:36", "remaining_time": "3:53:26"}
{"current_steps": 2030, "total_steps": 3759, "loss": 0.434, "lr": 2.0714901277306203e-05, "epoch": 3.7802607076350094, "percentage": 54.0, "elapsed_time": "4:33:14", "remaining_time": "3:52:43"}
{"current_steps": 2035, "total_steps": 3759, "loss": 0.4445, "lr": 2.062208914557278e-05, "epoch": 3.7895716945996276, "percentage": 54.14, "elapsed_time": "4:33:51", "remaining_time": "3:52:00"}
{"current_steps": 2040, "total_steps": 3759, "loss": 0.4657, "lr": 2.052926360202289e-05, "epoch": 3.798882681564246, "percentage": 54.27, "elapsed_time": "4:34:32", "remaining_time": "3:51:20"}
{"current_steps": 2045, "total_steps": 3759, "loss": 0.4328, "lr": 2.0436426647911748e-05, "epoch": 3.808193668528864, "percentage": 54.4, "elapsed_time": "4:35:10", "remaining_time": "3:50:38"}
{"current_steps": 2050, "total_steps": 3759, "loss": 0.4728, "lr": 2.034358028474059e-05, "epoch": 3.817504655493482, "percentage": 54.54, "elapsed_time": "4:35:42", "remaining_time": "3:49:50"}
{"current_steps": 2055, "total_steps": 3759, "loss": 0.4735, "lr": 2.0250726514213506e-05, "epoch": 3.826815642458101, "percentage": 54.67, "elapsed_time": "4:36:25", "remaining_time": "3:49:13"}
{"current_steps": 2060, "total_steps": 3759, "loss": 0.451, "lr": 2.015786733819427e-05, "epoch": 3.8361266294227185, "percentage": 54.8, "elapsed_time": "4:37:02", "remaining_time": "3:48:29"}
{"current_steps": 2065, "total_steps": 3759, "loss": 0.5136, "lr": 2.0065004758663202e-05, "epoch": 3.845437616387337, "percentage": 54.93, "elapsed_time": "4:37:49", "remaining_time": "3:47:54"}
{"current_steps": 2070, "total_steps": 3759, "loss": 0.4534, "lr": 1.9972140777673997e-05, "epoch": 3.8547486033519553, "percentage": 55.07, "elapsed_time": "4:38:29", "remaining_time": "3:47:14"}
{"current_steps": 2075, "total_steps": 3759, "loss": 0.4694, "lr": 1.9879277397310574e-05, "epoch": 3.8640595903165735, "percentage": 55.2, "elapsed_time": "4:39:05", "remaining_time": "3:46:29"}
{"current_steps": 2080, "total_steps": 3759, "loss": 0.4558, "lr": 1.9786416619643888e-05, "epoch": 3.8733705772811917, "percentage": 55.33, "elapsed_time": "4:39:49", "remaining_time": "3:45:52"}
{"current_steps": 2085, "total_steps": 3759, "loss": 0.4724, "lr": 1.9693560446688796e-05, "epoch": 3.88268156424581, "percentage": 55.47, "elapsed_time": "4:40:19", "remaining_time": "3:45:04"}
{"current_steps": 2090, "total_steps": 3759, "loss": 0.4369, "lr": 1.960071088036087e-05, "epoch": 3.8919925512104285, "percentage": 55.6, "elapsed_time": "4:40:56", "remaining_time": "3:44:21"}
{"current_steps": 2095, "total_steps": 3759, "loss": 0.4754, "lr": 1.9507869922433244e-05, "epoch": 3.9013035381750467, "percentage": 55.73, "elapsed_time": "4:41:41", "remaining_time": "3:43:44"}
{"current_steps": 2100, "total_steps": 3759, "loss": 0.4462, "lr": 1.9415039574493482e-05, "epoch": 3.910614525139665, "percentage": 55.87, "elapsed_time": "4:42:17", "remaining_time": "3:43:00"}
{"current_steps": 2105, "total_steps": 3759, "loss": 0.4824, "lr": 1.9322221837900387e-05, "epoch": 3.919925512104283, "percentage": 56.0, "elapsed_time": "4:43:03", "remaining_time": "3:42:24"}
{"current_steps": 2110, "total_steps": 3759, "loss": 0.4449, "lr": 1.922941871374087e-05, "epoch": 3.9292364990689013, "percentage": 56.13, "elapsed_time": "4:43:37", "remaining_time": "3:41:39"}
{"current_steps": 2115, "total_steps": 3759, "loss": 0.4682, "lr": 1.9136632202786806e-05, "epoch": 3.9385474860335195, "percentage": 56.26, "elapsed_time": "4:44:19", "remaining_time": "3:41:00"}
{"current_steps": 2120, "total_steps": 3759, "loss": 0.4541, "lr": 1.9043864305451913e-05, "epoch": 3.9478584729981376, "percentage": 56.4, "elapsed_time": "4:44:51", "remaining_time": "3:40:13"}
{"current_steps": 2125, "total_steps": 3759, "loss": 0.4747, "lr": 1.8951117021748596e-05, "epoch": 3.9571694599627563, "percentage": 56.53, "elapsed_time": "4:45:30", "remaining_time": "3:39:32"}
{"current_steps": 2130, "total_steps": 3759, "loss": 0.468, "lr": 1.8858392351244875e-05, "epoch": 3.9664804469273744, "percentage": 56.66, "elapsed_time": "4:46:12", "remaining_time": "3:38:52"}
{"current_steps": 2135, "total_steps": 3759, "loss": 0.4719, "lr": 1.8765692293021196e-05, "epoch": 3.9757914338919926, "percentage": 56.8, "elapsed_time": "4:46:56", "remaining_time": "3:38:15"}
{"current_steps": 2140, "total_steps": 3759, "loss": 0.4516, "lr": 1.8673018845627416e-05, "epoch": 3.985102420856611, "percentage": 56.93, "elapsed_time": "4:47:40", "remaining_time": "3:37:38"}
{"current_steps": 2145, "total_steps": 3759, "loss": 0.4577, "lr": 1.8580374007039682e-05, "epoch": 3.994413407821229, "percentage": 57.06, "elapsed_time": "4:48:15", "remaining_time": "3:36:54"}
{"current_steps": 2150, "total_steps": 3759, "loss": 0.4594, "lr": 1.8487759774617343e-05, "epoch": 4.003724394785848, "percentage": 57.2, "elapsed_time": "4:48:57", "remaining_time": "3:36:14"}
{"current_steps": 2155, "total_steps": 3759, "loss": 0.4287, "lr": 1.8395178145059894e-05, "epoch": 4.013035381750465, "percentage": 57.33, "elapsed_time": "4:49:36", "remaining_time": "3:35:33"}
{"current_steps": 2160, "total_steps": 3759, "loss": 0.4596, "lr": 1.8302631114363947e-05, "epoch": 4.022346368715084, "percentage": 57.46, "elapsed_time": "4:50:19", "remaining_time": "3:34:55"}
{"current_steps": 2165, "total_steps": 3759, "loss": 0.4469, "lr": 1.8210120677780184e-05, "epoch": 4.031657355679702, "percentage": 57.6, "elapsed_time": "4:50:58", "remaining_time": "3:34:13"}
{"current_steps": 2170, "total_steps": 3759, "loss": 0.4901, "lr": 1.8117648829770343e-05, "epoch": 4.04096834264432, "percentage": 57.73, "elapsed_time": "4:51:35", "remaining_time": "3:33:31"}
{"current_steps": 2175, "total_steps": 3759, "loss": 0.4478, "lr": 1.8025217563964207e-05, "epoch": 4.050279329608939, "percentage": 57.86, "elapsed_time": "4:52:15", "remaining_time": "3:32:50"}
{"current_steps": 2180, "total_steps": 3759, "loss": 0.4608, "lr": 1.793282887311665e-05, "epoch": 4.059590316573557, "percentage": 57.99, "elapsed_time": "4:52:54", "remaining_time": "3:32:09"}
{"current_steps": 2185, "total_steps": 3759, "loss": 0.463, "lr": 1.784048474906466e-05, "epoch": 4.068901303538175, "percentage": 58.13, "elapsed_time": "4:53:35", "remaining_time": "3:31:29"}
{"current_steps": 2190, "total_steps": 3759, "loss": 0.4293, "lr": 1.7748187182684375e-05, "epoch": 4.078212290502793, "percentage": 58.26, "elapsed_time": "4:54:08", "remaining_time": "3:30:43"}
{"current_steps": 2195, "total_steps": 3759, "loss": 0.4371, "lr": 1.7655938163848216e-05, "epoch": 4.087523277467412, "percentage": 58.39, "elapsed_time": "4:54:51", "remaining_time": "3:30:05"}
{"current_steps": 2200, "total_steps": 3759, "loss": 0.4254, "lr": 1.7563739681381908e-05, "epoch": 4.0968342644320295, "percentage": 58.53, "elapsed_time": "4:55:33", "remaining_time": "3:29:26"}
{"current_steps": 2205, "total_steps": 3759, "loss": 0.4357, "lr": 1.7471593723021684e-05, "epoch": 4.106145251396648, "percentage": 58.66, "elapsed_time": "4:56:12", "remaining_time": "3:28:45"}
{"current_steps": 2210, "total_steps": 3759, "loss": 0.4643, "lr": 1.737950227537137e-05, "epoch": 4.115456238361267, "percentage": 58.79, "elapsed_time": "4:56:50", "remaining_time": "3:28:03"}
{"current_steps": 2215, "total_steps": 3759, "loss": 0.4512, "lr": 1.7287467323859598e-05, "epoch": 4.1247672253258845, "percentage": 58.93, "elapsed_time": "4:57:34", "remaining_time": "3:27:25"}
{"current_steps": 2220, "total_steps": 3759, "loss": 0.4659, "lr": 1.7195490852696963e-05, "epoch": 4.134078212290503, "percentage": 59.06, "elapsed_time": "4:58:15", "remaining_time": "3:26:45"}
{"current_steps": 2225, "total_steps": 3759, "loss": 0.4198, "lr": 1.7103574844833266e-05, "epoch": 4.143389199255121, "percentage": 59.19, "elapsed_time": "4:58:51", "remaining_time": "3:26:02"}
{"current_steps": 2230, "total_steps": 3759, "loss": 0.4417, "lr": 1.701172128191478e-05, "epoch": 4.1527001862197395, "percentage": 59.32, "elapsed_time": "4:59:30", "remaining_time": "3:25:21"}
{"current_steps": 2235, "total_steps": 3759, "loss": 0.4471, "lr": 1.6919932144241493e-05, "epoch": 4.162011173184357, "percentage": 59.46, "elapsed_time": "5:00:07", "remaining_time": "3:24:38"}
{"current_steps": 2240, "total_steps": 3759, "loss": 0.4167, "lr": 1.6828209410724423e-05, "epoch": 4.171322160148976, "percentage": 59.59, "elapsed_time": "5:00:40", "remaining_time": "3:23:53"}
{"current_steps": 2245, "total_steps": 3759, "loss": 0.4863, "lr": 1.6736555058842967e-05, "epoch": 4.1806331471135945, "percentage": 59.72, "elapsed_time": "5:01:24", "remaining_time": "3:23:15"}
{"current_steps": 2250, "total_steps": 3759, "loss": 0.4473, "lr": 1.6644971064602266e-05, "epoch": 4.189944134078212, "percentage": 59.86, "elapsed_time": "5:02:02", "remaining_time": "3:22:34"}
{"current_steps": 2255, "total_steps": 3759, "loss": 0.4312, "lr": 1.655345940249059e-05, "epoch": 4.199255121042831, "percentage": 59.99, "elapsed_time": "5:02:43", "remaining_time": "3:21:54"}
{"current_steps": 2260, "total_steps": 3759, "loss": 0.4376, "lr": 1.6462022045436778e-05, "epoch": 4.208566108007449, "percentage": 60.12, "elapsed_time": "5:03:22", "remaining_time": "3:21:12"}
{"current_steps": 2265, "total_steps": 3759, "loss": 0.4799, "lr": 1.6370660964767704e-05, "epoch": 4.217877094972067, "percentage": 60.26, "elapsed_time": "5:04:04", "remaining_time": "3:20:34"}
{"current_steps": 2270, "total_steps": 3759, "loss": 0.4722, "lr": 1.627937813016578e-05, "epoch": 4.227188081936685, "percentage": 60.39, "elapsed_time": "5:04:51", "remaining_time": "3:19:58"}
{"current_steps": 2275, "total_steps": 3759, "loss": 0.471, "lr": 1.6188175509626484e-05, "epoch": 4.236499068901304, "percentage": 60.52, "elapsed_time": "5:05:33", "remaining_time": "3:19:19"}
{"current_steps": 2280, "total_steps": 3759, "loss": 0.4333, "lr": 1.6097055069415938e-05, "epoch": 4.245810055865922, "percentage": 60.65, "elapsed_time": "5:06:11", "remaining_time": "3:18:37"}
{"current_steps": 2285, "total_steps": 3759, "loss": 0.4569, "lr": 1.6006018774028497e-05, "epoch": 4.25512104283054, "percentage": 60.79, "elapsed_time": "5:06:48", "remaining_time": "3:17:54"}
{"current_steps": 2290, "total_steps": 3759, "loss": 0.447, "lr": 1.5915068586144426e-05, "epoch": 4.264432029795159, "percentage": 60.92, "elapsed_time": "5:07:35", "remaining_time": "3:17:18"}
{"current_steps": 2295, "total_steps": 3759, "loss": 0.4821, "lr": 1.5824206466587563e-05, "epoch": 4.273743016759776, "percentage": 61.05, "elapsed_time": "5:08:17", "remaining_time": "3:16:40"}
{"current_steps": 2300, "total_steps": 3759, "loss": 0.4399, "lr": 1.5733434374283067e-05, "epoch": 4.283054003724395, "percentage": 61.19, "elapsed_time": "5:08:58", "remaining_time": "3:15:59"}
{"current_steps": 2305, "total_steps": 3759, "loss": 0.4579, "lr": 1.5642754266215147e-05, "epoch": 4.292364990689013, "percentage": 61.32, "elapsed_time": "5:09:45", "remaining_time": "3:15:23"}
{"current_steps": 2310, "total_steps": 3759, "loss": 0.4413, "lr": 1.555216809738491e-05, "epoch": 4.301675977653631, "percentage": 61.45, "elapsed_time": "5:10:27", "remaining_time": "3:14:44"}
{"current_steps": 2315, "total_steps": 3759, "loss": 0.4646, "lr": 1.5461677820768195e-05, "epoch": 4.31098696461825, "percentage": 61.59, "elapsed_time": "5:11:15", "remaining_time": "3:14:08"}
{"current_steps": 2320, "total_steps": 3759, "loss": 0.4791, "lr": 1.537128538727349e-05, "epoch": 4.320297951582868, "percentage": 61.72, "elapsed_time": "5:11:53", "remaining_time": "3:13:27"}
{"current_steps": 2325, "total_steps": 3759, "loss": 0.4547, "lr": 1.5280992745699807e-05, "epoch": 4.329608938547486, "percentage": 61.85, "elapsed_time": "5:12:38", "remaining_time": "3:12:49"}
{"current_steps": 2330, "total_steps": 3759, "loss": 0.4478, "lr": 1.519080184269475e-05, "epoch": 4.338919925512104, "percentage": 61.98, "elapsed_time": "5:13:15", "remaining_time": "3:12:07"}
{"current_steps": 2335, "total_steps": 3759, "loss": 0.442, "lr": 1.5100714622712503e-05, "epoch": 4.348230912476723, "percentage": 62.12, "elapsed_time": "5:13:53", "remaining_time": "3:11:25"}
{"current_steps": 2340, "total_steps": 3759, "loss": 0.4664, "lr": 1.5010733027971905e-05, "epoch": 4.35754189944134, "percentage": 62.25, "elapsed_time": "5:14:27", "remaining_time": "3:10:41"}
{"current_steps": 2345, "total_steps": 3759, "loss": 0.4564, "lr": 1.4920858998414584e-05, "epoch": 4.366852886405959, "percentage": 62.38, "elapsed_time": "5:15:04", "remaining_time": "3:09:59"}
{"current_steps": 2350, "total_steps": 3759, "loss": 0.437, "lr": 1.4831094471663154e-05, "epoch": 4.376163873370578, "percentage": 62.52, "elapsed_time": "5:15:44", "remaining_time": "3:09:18"}
{"current_steps": 2355, "total_steps": 3759, "loss": 0.4804, "lr": 1.4741441382979406e-05, "epoch": 4.385474860335195, "percentage": 62.65, "elapsed_time": "5:16:29", "remaining_time": "3:08:41"}
{"current_steps": 2360, "total_steps": 3759, "loss": 0.4456, "lr": 1.4651901665222616e-05, "epoch": 4.394785847299814, "percentage": 62.78, "elapsed_time": "5:17:05", "remaining_time": "3:07:58"}
{"current_steps": 2365, "total_steps": 3759, "loss": 0.4545, "lr": 1.456247724880786e-05, "epoch": 4.404096834264432, "percentage": 62.92, "elapsed_time": "5:17:53", "remaining_time": "3:07:22"}
{"current_steps": 2370, "total_steps": 3759, "loss": 0.4446, "lr": 1.4473170061664371e-05, "epoch": 4.41340782122905, "percentage": 63.05, "elapsed_time": "5:18:30", "remaining_time": "3:06:40"}
{"current_steps": 2375, "total_steps": 3759, "loss": 0.4382, "lr": 1.4383982029194035e-05, "epoch": 4.422718808193668, "percentage": 63.18, "elapsed_time": "5:19:15", "remaining_time": "3:06:02"}
{"current_steps": 2380, "total_steps": 3759, "loss": 0.4175, "lr": 1.4294915074229817e-05, "epoch": 4.432029795158287, "percentage": 63.31, "elapsed_time": "5:19:55", "remaining_time": "3:05:22"}
{"current_steps": 2385, "total_steps": 3759, "loss": 0.4339, "lr": 1.4205971116994357e-05, "epoch": 4.441340782122905, "percentage": 63.45, "elapsed_time": "5:20:40", "remaining_time": "3:04:44"}
{"current_steps": 2390, "total_steps": 3759, "loss": 0.4473, "lr": 1.4117152075058522e-05, "epoch": 4.450651769087523, "percentage": 63.58, "elapsed_time": "5:21:26", "remaining_time": "3:04:07"}
{"current_steps": 2395, "total_steps": 3759, "loss": 0.4688, "lr": 1.4028459863300112e-05, "epoch": 4.459962756052142, "percentage": 63.71, "elapsed_time": "5:22:12", "remaining_time": "3:03:30"}
{"current_steps": 2400, "total_steps": 3759, "loss": 0.4329, "lr": 1.3939896393862544e-05, "epoch": 4.4692737430167595, "percentage": 63.85, "elapsed_time": "5:23:00", "remaining_time": "3:02:53"}
{"current_steps": 2405, "total_steps": 3759, "loss": 0.4577, "lr": 1.3851463576113655e-05, "epoch": 4.478584729981378, "percentage": 63.98, "elapsed_time": "5:23:37", "remaining_time": "3:02:11"}
{"current_steps": 2410, "total_steps": 3759, "loss": 0.4683, "lr": 1.3763163316604486e-05, "epoch": 4.487895716945996, "percentage": 64.11, "elapsed_time": "5:24:18", "remaining_time": "3:01:32"}
{"current_steps": 2415, "total_steps": 3759, "loss": 0.4332, "lr": 1.367499751902825e-05, "epoch": 4.4972067039106145, "percentage": 64.25, "elapsed_time": "5:24:59", "remaining_time": "3:00:51"}
{"current_steps": 2420, "total_steps": 3759, "loss": 0.4322, "lr": 1.3586968084179235e-05, "epoch": 4.506517690875233, "percentage": 64.38, "elapsed_time": "5:25:40", "remaining_time": "3:00:11"}
{"current_steps": 2425, "total_steps": 3759, "loss": 0.5108, "lr": 1.3499076909911842e-05, "epoch": 4.515828677839851, "percentage": 64.51, "elapsed_time": "5:26:19", "remaining_time": "2:59:30"}
{"current_steps": 2430, "total_steps": 3759, "loss": 0.4543, "lr": 1.3411325891099679e-05, "epoch": 4.5251396648044695, "percentage": 64.64, "elapsed_time": "5:27:00", "remaining_time": "2:58:50"}
{"current_steps": 2435, "total_steps": 3759, "loss": 0.4727, "lr": 1.3323716919594678e-05, "epoch": 4.534450651769087, "percentage": 64.78, "elapsed_time": "5:27:39", "remaining_time": "2:58:09"}
{"current_steps": 2440, "total_steps": 3759, "loss": 0.4497, "lr": 1.3236251884186348e-05, "epoch": 4.543761638733706, "percentage": 64.91, "elapsed_time": "5:28:18", "remaining_time": "2:57:28"}
{"current_steps": 2445, "total_steps": 3759, "loss": 0.4545, "lr": 1.3148932670561024e-05, "epoch": 4.553072625698324, "percentage": 65.04, "elapsed_time": "5:29:03", "remaining_time": "2:56:50"}
{"current_steps": 2450, "total_steps": 3759, "loss": 0.4659, "lr": 1.3061761161261237e-05, "epoch": 4.562383612662942, "percentage": 65.18, "elapsed_time": "5:29:44", "remaining_time": "2:56:10"}
{"current_steps": 2455, "total_steps": 3759, "loss": 0.4573, "lr": 1.2974739235645082e-05, "epoch": 4.571694599627561, "percentage": 65.31, "elapsed_time": "5:30:29", "remaining_time": "2:55:32"}
{"current_steps": 2460, "total_steps": 3759, "loss": 0.3961, "lr": 1.2887868769845769e-05, "epoch": 4.581005586592179, "percentage": 65.44, "elapsed_time": "5:31:11", "remaining_time": "2:54:53"}
{"current_steps": 2465, "total_steps": 3759, "loss": 0.4492, "lr": 1.2801151636731115e-05, "epoch": 4.590316573556797, "percentage": 65.58, "elapsed_time": "5:31:51", "remaining_time": "2:54:12"}
{"current_steps": 2470, "total_steps": 3759, "loss": 0.4065, "lr": 1.2714589705863202e-05, "epoch": 4.599627560521415, "percentage": 65.71, "elapsed_time": "5:32:15", "remaining_time": "2:53:23"}
{"current_steps": 2475, "total_steps": 3759, "loss": 0.4397, "lr": 1.2628184843458043e-05, "epoch": 4.608938547486034, "percentage": 65.84, "elapsed_time": "5:32:57", "remaining_time": "2:52:44"}
{"current_steps": 2480, "total_steps": 3759, "loss": 0.4533, "lr": 1.2541938912345378e-05, "epoch": 4.618249534450651, "percentage": 65.97, "elapsed_time": "5:33:45", "remaining_time": "2:52:07"}
{"current_steps": 2485, "total_steps": 3759, "loss": 0.4091, "lr": 1.2455853771928479e-05, "epoch": 4.62756052141527, "percentage": 66.11, "elapsed_time": "5:34:17", "remaining_time": "2:51:22"}
{"current_steps": 2490, "total_steps": 3759, "loss": 0.4329, "lr": 1.2369931278144112e-05, "epoch": 4.636871508379889, "percentage": 66.24, "elapsed_time": "5:34:48", "remaining_time": "2:50:37"}
{"current_steps": 2495, "total_steps": 3759, "loss": 0.4523, "lr": 1.2284173283422453e-05, "epoch": 4.646182495344506, "percentage": 66.37, "elapsed_time": "5:35:14", "remaining_time": "2:49:50"}
{"current_steps": 2500, "total_steps": 3759, "loss": 0.4708, "lr": 1.2198581636647214e-05, "epoch": 4.655493482309125, "percentage": 66.51, "elapsed_time": "5:35:50", "remaining_time": "2:49:08"}
{"current_steps": 2505, "total_steps": 3759, "loss": 0.4718, "lr": 1.2113158183115763e-05, "epoch": 4.664804469273743, "percentage": 66.64, "elapsed_time": "5:36:33", "remaining_time": "2:48:28"}
{"current_steps": 2510, "total_steps": 3759, "loss": 0.4283, "lr": 1.2027904764499324e-05, "epoch": 4.674115456238361, "percentage": 66.77, "elapsed_time": "5:37:08", "remaining_time": "2:47:46"}
{"current_steps": 2515, "total_steps": 3759, "loss": 0.4049, "lr": 1.1942823218803293e-05, "epoch": 4.683426443202979, "percentage": 66.91, "elapsed_time": "5:37:47", "remaining_time": "2:47:05"}
{"current_steps": 2520, "total_steps": 3759, "loss": 0.4717, "lr": 1.1857915380327593e-05, "epoch": 4.692737430167598, "percentage": 67.04, "elapsed_time": "5:38:26", "remaining_time": "2:46:23"}
{"current_steps": 2525, "total_steps": 3759, "loss": 0.4404, "lr": 1.1773183079627148e-05, "epoch": 4.702048417132216, "percentage": 67.17, "elapsed_time": "5:39:13", "remaining_time": "2:45:46"}
{"current_steps": 2530, "total_steps": 3759, "loss": 0.4475, "lr": 1.1688628143472402e-05, "epoch": 4.711359404096834, "percentage": 67.31, "elapsed_time": "5:39:48", "remaining_time": "2:45:04"}
{"current_steps": 2535, "total_steps": 3759, "loss": 0.4528, "lr": 1.1604252394809957e-05, "epoch": 4.720670391061453, "percentage": 67.44, "elapsed_time": "5:40:21", "remaining_time": "2:44:20"}
{"current_steps": 2540, "total_steps": 3759, "loss": 0.4501, "lr": 1.1520057652723224e-05, "epoch": 4.72998137802607, "percentage": 67.57, "elapsed_time": "5:41:06", "remaining_time": "2:43:42"}
{"current_steps": 2545, "total_steps": 3759, "loss": 0.4784, "lr": 1.1436045732393264e-05, "epoch": 4.739292364990689, "percentage": 67.7, "elapsed_time": "5:41:49", "remaining_time": "2:43:03"}
{"current_steps": 2550, "total_steps": 3759, "loss": 0.4664, "lr": 1.135221844505961e-05, "epoch": 4.748603351955307, "percentage": 67.84, "elapsed_time": "5:42:34", "remaining_time": "2:42:25"}
{"current_steps": 2555, "total_steps": 3759, "loss": 0.4535, "lr": 1.1268577597981252e-05, "epoch": 4.757914338919925, "percentage": 67.97, "elapsed_time": "5:43:14", "remaining_time": "2:41:44"}
{"current_steps": 2560, "total_steps": 3759, "loss": 0.4565, "lr": 1.1185124994397625e-05, "epoch": 4.767225325884544, "percentage": 68.1, "elapsed_time": "5:43:51", "remaining_time": "2:41:03"}
{"current_steps": 2565, "total_steps": 3759, "loss": 0.44, "lr": 1.110186243348978e-05, "epoch": 4.776536312849162, "percentage": 68.24, "elapsed_time": "5:44:23", "remaining_time": "2:40:18"}
{"current_steps": 2570, "total_steps": 3759, "loss": 0.4508, "lr": 1.1018791710341597e-05, "epoch": 4.78584729981378, "percentage": 68.37, "elapsed_time": "5:45:04", "remaining_time": "2:39:38"}
{"current_steps": 2575, "total_steps": 3759, "loss": 0.4623, "lr": 1.0935914615901044e-05, "epoch": 4.795158286778398, "percentage": 68.5, "elapsed_time": "5:45:40", "remaining_time": "2:38:56"}
{"current_steps": 2580, "total_steps": 3759, "loss": 0.4398, "lr": 1.0853232936941579e-05, "epoch": 4.804469273743017, "percentage": 68.64, "elapsed_time": "5:46:22", "remaining_time": "2:38:16"}
{"current_steps": 2585, "total_steps": 3759, "loss": 0.453, "lr": 1.0770748456023651e-05, "epoch": 4.8137802607076345, "percentage": 68.77, "elapsed_time": "5:47:08", "remaining_time": "2:37:39"}
{"current_steps": 2590, "total_steps": 3759, "loss": 0.4497, "lr": 1.0688462951456249e-05, "epoch": 4.823091247672253, "percentage": 68.9, "elapsed_time": "5:47:44", "remaining_time": "2:36:57"}
{"current_steps": 2595, "total_steps": 3759, "loss": 0.4684, "lr": 1.0606378197258563e-05, "epoch": 4.832402234636872, "percentage": 69.03, "elapsed_time": "5:48:28", "remaining_time": "2:36:18"}
{"current_steps": 2600, "total_steps": 3759, "loss": 0.4602, "lr": 1.0524495963121767e-05, "epoch": 4.8417132216014895, "percentage": 69.17, "elapsed_time": "5:49:04", "remaining_time": "2:35:36"}
{"current_steps": 2605, "total_steps": 3759, "loss": 0.4442, "lr": 1.0442818014370805e-05, "epoch": 4.851024208566108, "percentage": 69.3, "elapsed_time": "5:49:44", "remaining_time": "2:34:55"}
{"current_steps": 2610, "total_steps": 3759, "loss": 0.4395, "lr": 1.0361346111926391e-05, "epoch": 4.860335195530726, "percentage": 69.43, "elapsed_time": "5:50:21", "remaining_time": "2:34:14"}
{"current_steps": 2615, "total_steps": 3759, "loss": 0.4322, "lr": 1.0280082012267015e-05, "epoch": 4.8696461824953445, "percentage": 69.57, "elapsed_time": "5:51:00", "remaining_time": "2:33:33"}
{"current_steps": 2620, "total_steps": 3759, "loss": 0.4362, "lr": 1.0199027467391077e-05, "epoch": 4.878957169459962, "percentage": 69.7, "elapsed_time": "5:51:47", "remaining_time": "2:32:55"}
{"current_steps": 2625, "total_steps": 3759, "loss": 0.4335, "lr": 1.0118184224779126e-05, "epoch": 4.888268156424581, "percentage": 69.83, "elapsed_time": "5:52:28", "remaining_time": "2:32:16"}
{"current_steps": 2630, "total_steps": 3759, "loss": 0.4314, "lr": 1.0037554027356177e-05, "epoch": 4.8975791433891995, "percentage": 69.97, "elapsed_time": "5:53:09", "remaining_time": "2:31:36"}
{"current_steps": 2635, "total_steps": 3759, "loss": 0.4419, "lr": 9.957138613454138e-06, "epoch": 4.906890130353817, "percentage": 70.1, "elapsed_time": "5:53:42", "remaining_time": "2:30:52"}
{"current_steps": 2640, "total_steps": 3759, "loss": 0.4615, "lr": 9.876939716774332e-06, "epoch": 4.916201117318436, "percentage": 70.23, "elapsed_time": "5:54:26", "remaining_time": "2:30:14"}
{"current_steps": 2645, "total_steps": 3759, "loss": 0.4402, "lr": 9.796959066350104e-06, "epoch": 4.925512104283054, "percentage": 70.36, "elapsed_time": "5:55:08", "remaining_time": "2:29:34"}
{"current_steps": 2650, "total_steps": 3759, "loss": 0.4548, "lr": 9.717198386509572e-06, "epoch": 4.934823091247672, "percentage": 70.5, "elapsed_time": "5:55:49", "remaining_time": "2:28:54"}
{"current_steps": 2655, "total_steps": 3759, "loss": 0.4804, "lr": 9.63765939683845e-06, "epoch": 4.94413407821229, "percentage": 70.63, "elapsed_time": "5:56:28", "remaining_time": "2:28:13"}
{"current_steps": 2660, "total_steps": 3759, "loss": 0.4273, "lr": 9.558343812142942e-06, "epoch": 4.953445065176909, "percentage": 70.76, "elapsed_time": "5:57:09", "remaining_time": "2:27:33"}
{"current_steps": 2665, "total_steps": 3759, "loss": 0.4609, "lr": 9.479253342412815e-06, "epoch": 4.962756052141527, "percentage": 70.9, "elapsed_time": "5:57:54", "remaining_time": "2:26:55"}
{"current_steps": 2670, "total_steps": 3759, "loss": 0.4826, "lr": 9.400389692784479e-06, "epoch": 4.972067039106145, "percentage": 71.03, "elapsed_time": "5:58:41", "remaining_time": "2:26:17"}
{"current_steps": 2675, "total_steps": 3759, "loss": 0.4594, "lr": 9.321754563504288e-06, "epoch": 4.981378026070764, "percentage": 71.16, "elapsed_time": "5:59:19", "remaining_time": "2:25:36"}
{"current_steps": 2680, "total_steps": 3759, "loss": 0.4609, "lr": 9.243349649891842e-06, "epoch": 4.990689013035381, "percentage": 71.3, "elapsed_time": "6:00:02", "remaining_time": "2:24:57"}
{"current_steps": 2685, "total_steps": 3759, "loss": 0.4609, "lr": 9.16517664230345e-06, "epoch": 5.0, "percentage": 71.43, "elapsed_time": "6:00:33", "remaining_time": "2:24:13"}
{"current_steps": 2690, "total_steps": 3759, "loss": 0.4234, "lr": 9.087237226095687e-06, "epoch": 5.009310986964619, "percentage": 71.56, "elapsed_time": "6:01:03", "remaining_time": "2:23:28"}
{"current_steps": 2695, "total_steps": 3759, "loss": 0.3896, "lr": 9.009533081589055e-06, "epoch": 5.018621973929236, "percentage": 71.69, "elapsed_time": "6:01:36", "remaining_time": "2:22:45"}
{"current_steps": 2700, "total_steps": 3759, "loss": 0.4558, "lr": 8.93206588403176e-06, "epoch": 5.027932960893855, "percentage": 71.83, "elapsed_time": "6:02:23", "remaining_time": "2:22:08"}
{"current_steps": 2705, "total_steps": 3759, "loss": 0.4241, "lr": 8.854837303563594e-06, "epoch": 5.037243947858473, "percentage": 71.96, "elapsed_time": "6:03:07", "remaining_time": "2:21:29"}
{"current_steps": 2710, "total_steps": 3759, "loss": 0.4556, "lr": 8.77784900517993e-06, "epoch": 5.046554934823091, "percentage": 72.09, "elapsed_time": "6:03:51", "remaining_time": "2:20:50"}
{"current_steps": 2715, "total_steps": 3759, "loss": 0.4369, "lr": 8.701102648695821e-06, "epoch": 5.055865921787709, "percentage": 72.23, "elapsed_time": "6:04:34", "remaining_time": "2:20:11"}
{"current_steps": 2720, "total_steps": 3759, "loss": 0.4192, "lr": 8.624599888710217e-06, "epoch": 5.065176908752328, "percentage": 72.36, "elapsed_time": "6:05:11", "remaining_time": "2:19:29"}
{"current_steps": 2725, "total_steps": 3759, "loss": 0.4262, "lr": 8.548342374570304e-06, "epoch": 5.074487895716946, "percentage": 72.49, "elapsed_time": "6:05:57", "remaining_time": "2:18:51"}
{"current_steps": 2730, "total_steps": 3759, "loss": 0.4093, "lr": 8.472331750335913e-06, "epoch": 5.083798882681564, "percentage": 72.63, "elapsed_time": "6:06:39", "remaining_time": "2:18:12"}
{"current_steps": 2735, "total_steps": 3759, "loss": 0.4434, "lr": 8.396569654744117e-06, "epoch": 5.093109869646183, "percentage": 72.76, "elapsed_time": "6:07:17", "remaining_time": "2:17:31"}
{"current_steps": 2740, "total_steps": 3759, "loss": 0.4044, "lr": 8.321057721173873e-06, "epoch": 5.1024208566108005, "percentage": 72.89, "elapsed_time": "6:07:53", "remaining_time": "2:16:48"}
{"current_steps": 2745, "total_steps": 3759, "loss": 0.4225, "lr": 8.24579757761083e-06, "epoch": 5.111731843575419, "percentage": 73.02, "elapsed_time": "6:08:26", "remaining_time": "2:16:06"}
{"current_steps": 2750, "total_steps": 3759, "loss": 0.4545, "lr": 8.170790846612205e-06, "epoch": 5.121042830540037, "percentage": 73.16, "elapsed_time": "6:09:13", "remaining_time": "2:15:28"}
{"current_steps": 2755, "total_steps": 3759, "loss": 0.4161, "lr": 8.096039145271802e-06, "epoch": 5.1303538175046555, "percentage": 73.29, "elapsed_time": "6:09:47", "remaining_time": "2:14:45"}
{"current_steps": 2760, "total_steps": 3759, "loss": 0.4643, "lr": 8.021544085185178e-06, "epoch": 5.139664804469274, "percentage": 73.42, "elapsed_time": "6:10:34", "remaining_time": "2:14:07"}
{"current_steps": 2765, "total_steps": 3759, "loss": 0.4447, "lr": 7.947307272414874e-06, "epoch": 5.148975791433892, "percentage": 73.56, "elapsed_time": "6:11:18", "remaining_time": "2:13:28"}
{"current_steps": 2770, "total_steps": 3759, "loss": 0.4364, "lr": 7.873330307455797e-06, "epoch": 5.1582867783985105, "percentage": 73.69, "elapsed_time": "6:11:52", "remaining_time": "2:12:46"}
{"current_steps": 2775, "total_steps": 3759, "loss": 0.4538, "lr": 7.799614785200713e-06, "epoch": 5.167597765363128, "percentage": 73.82, "elapsed_time": "6:12:34", "remaining_time": "2:12:06"}
{"current_steps": 2780, "total_steps": 3759, "loss": 0.4354, "lr": 7.726162294905857e-06, "epoch": 5.176908752327747, "percentage": 73.96, "elapsed_time": "6:13:12", "remaining_time": "2:11:25"}
{"current_steps": 2785, "total_steps": 3759, "loss": 0.4657, "lr": 7.65297442015668e-06, "epoch": 5.186219739292365, "percentage": 74.09, "elapsed_time": "6:13:48", "remaining_time": "2:10:43"}
{"current_steps": 2790, "total_steps": 3759, "loss": 0.431, "lr": 7.58005273883371e-06, "epoch": 5.195530726256983, "percentage": 74.22, "elapsed_time": "6:14:29", "remaining_time": "2:10:03"}
{"current_steps": 2795, "total_steps": 3759, "loss": 0.471, "lr": 7.507398823078495e-06, "epoch": 5.204841713221602, "percentage": 74.35, "elapsed_time": "6:15:05", "remaining_time": "2:09:22"}
{"current_steps": 2800, "total_steps": 3759, "loss": 0.4286, "lr": 7.4350142392597855e-06, "epoch": 5.21415270018622, "percentage": 74.49, "elapsed_time": "6:15:38", "remaining_time": "2:08:39"}
{"current_steps": 2805, "total_steps": 3759, "loss": 0.4001, "lr": 7.362900547939693e-06, "epoch": 5.223463687150838, "percentage": 74.62, "elapsed_time": "6:16:11", "remaining_time": "2:07:56"}
{"current_steps": 2810, "total_steps": 3759, "loss": 0.415, "lr": 7.291059303840082e-06, "epoch": 5.232774674115456, "percentage": 74.75, "elapsed_time": "6:16:52", "remaining_time": "2:07:16"}
{"current_steps": 2815, "total_steps": 3759, "loss": 0.4682, "lr": 7.219492055809023e-06, "epoch": 5.242085661080075, "percentage": 74.89, "elapsed_time": "6:17:32", "remaining_time": "2:06:36"}
{"current_steps": 2820, "total_steps": 3759, "loss": 0.4077, "lr": 7.148200346787437e-06, "epoch": 5.251396648044693, "percentage": 75.02, "elapsed_time": "6:18:11", "remaining_time": "2:05:55"}
{"current_steps": 2825, "total_steps": 3759, "loss": 0.4622, "lr": 7.0771857137758025e-06, "epoch": 5.260707635009311, "percentage": 75.15, "elapsed_time": "6:18:56", "remaining_time": "2:05:17"}
{"current_steps": 2830, "total_steps": 3759, "loss": 0.4259, "lr": 7.0064496878010466e-06, "epoch": 5.27001862197393, "percentage": 75.29, "elapsed_time": "6:19:40", "remaining_time": "2:04:38"}
{"current_steps": 2835, "total_steps": 3759, "loss": 0.4514, "lr": 6.935993793883509e-06, "epoch": 5.279329608938547, "percentage": 75.42, "elapsed_time": "6:20:24", "remaining_time": "2:03:59"}
{"current_steps": 2840, "total_steps": 3759, "loss": 0.4331, "lr": 6.865819551004051e-06, "epoch": 5.288640595903166, "percentage": 75.55, "elapsed_time": "6:21:05", "remaining_time": "2:03:18"}
{"current_steps": 2845, "total_steps": 3759, "loss": 0.445, "lr": 6.795928472071363e-06, "epoch": 5.297951582867784, "percentage": 75.69, "elapsed_time": "6:21:42", "remaining_time": "2:02:37"}
{"current_steps": 2850, "total_steps": 3759, "loss": 0.4315, "lr": 6.726322063889299e-06, "epoch": 5.307262569832402, "percentage": 75.82, "elapsed_time": "6:22:20", "remaining_time": "2:01:56"}
{"current_steps": 2855, "total_steps": 3759, "loss": 0.4362, "lr": 6.6570018271244096e-06, "epoch": 5.316573556797021, "percentage": 75.95, "elapsed_time": "6:23:06", "remaining_time": "2:01:18"}
{"current_steps": 2860, "total_steps": 3759, "loss": 0.4276, "lr": 6.5879692562735854e-06, "epoch": 5.325884543761639, "percentage": 76.08, "elapsed_time": "6:23:43", "remaining_time": "2:00:37"}
{"current_steps": 2865, "total_steps": 3759, "loss": 0.4118, "lr": 6.519225839631833e-06, "epoch": 5.335195530726257, "percentage": 76.22, "elapsed_time": "6:24:21", "remaining_time": "1:59:56"}
{"current_steps": 2870, "total_steps": 3759, "loss": 0.4313, "lr": 6.4507730592602e-06, "epoch": 5.344506517690875, "percentage": 76.35, "elapsed_time": "6:25:02", "remaining_time": "1:59:16"}
{"current_steps": 2875, "total_steps": 3759, "loss": 0.4682, "lr": 6.382612390953813e-06, "epoch": 5.353817504655494, "percentage": 76.48, "elapsed_time": "6:25:37", "remaining_time": "1:58:34"}
{"current_steps": 2880, "total_steps": 3759, "loss": 0.4717, "lr": 6.3147453042100395e-06, "epoch": 5.363128491620111, "percentage": 76.62, "elapsed_time": "6:26:23", "remaining_time": "1:57:55"}
{"current_steps": 2885, "total_steps": 3759, "loss": 0.4276, "lr": 6.247173262196871e-06, "epoch": 5.37243947858473, "percentage": 76.75, "elapsed_time": "6:27:04", "remaining_time": "1:57:15"}
{"current_steps": 2890, "total_steps": 3759, "loss": 0.4227, "lr": 6.179897721721304e-06, "epoch": 5.381750465549349, "percentage": 76.88, "elapsed_time": "6:27:45", "remaining_time": "1:56:35"}
{"current_steps": 2895, "total_steps": 3759, "loss": 0.4594, "lr": 6.112920133197979e-06, "epoch": 5.391061452513966, "percentage": 77.02, "elapsed_time": "6:28:26", "remaining_time": "1:55:55"}
{"current_steps": 2900, "total_steps": 3759, "loss": 0.4368, "lr": 6.046241940617896e-06, "epoch": 5.400372439478585, "percentage": 77.15, "elapsed_time": "6:29:07", "remaining_time": "1:55:15"}
{"current_steps": 2905, "total_steps": 3759, "loss": 0.4468, "lr": 5.979864581517267e-06, "epoch": 5.409683426443203, "percentage": 77.28, "elapsed_time": "6:29:48", "remaining_time": "1:54:35"}
{"current_steps": 2910, "total_steps": 3759, "loss": 0.4615, "lr": 5.913789486946555e-06, "epoch": 5.418994413407821, "percentage": 77.41, "elapsed_time": "6:30:23", "remaining_time": "1:53:53"}
{"current_steps": 2915, "total_steps": 3759, "loss": 0.4304, "lr": 5.848018081439612e-06, "epoch": 5.428305400372439, "percentage": 77.55, "elapsed_time": "6:31:01", "remaining_time": "1:53:13"}
{"current_steps": 2920, "total_steps": 3759, "loss": 0.473, "lr": 5.782551782982959e-06, "epoch": 5.437616387337058, "percentage": 77.68, "elapsed_time": "6:31:46", "remaining_time": "1:52:34"}
{"current_steps": 2925, "total_steps": 3759, "loss": 0.4381, "lr": 5.7173920029851934e-06, "epoch": 5.446927374301676, "percentage": 77.81, "elapsed_time": "6:32:23", "remaining_time": "1:51:52"}
{"current_steps": 2930, "total_steps": 3759, "loss": 0.4318, "lr": 5.652540146246615e-06, "epoch": 5.456238361266294, "percentage": 77.95, "elapsed_time": "6:33:00", "remaining_time": "1:51:11"}
{"current_steps": 2935, "total_steps": 3759, "loss": 0.4446, "lr": 5.587997610928893e-06, "epoch": 5.465549348230913, "percentage": 78.08, "elapsed_time": "6:33:39", "remaining_time": "1:50:31"}
{"current_steps": 2940, "total_steps": 3759, "loss": 0.4591, "lr": 5.523765788524941e-06, "epoch": 5.4748603351955305, "percentage": 78.21, "elapsed_time": "6:34:19", "remaining_time": "1:49:50"}
{"current_steps": 2945, "total_steps": 3759, "loss": 0.4253, "lr": 5.459846063828922e-06, "epoch": 5.484171322160149, "percentage": 78.35, "elapsed_time": "6:35:00", "remaining_time": "1:49:10"}
{"current_steps": 2950, "total_steps": 3759, "loss": 0.407, "lr": 5.396239814906374e-06, "epoch": 5.493482309124767, "percentage": 78.48, "elapsed_time": "6:35:38", "remaining_time": "1:48:29"}
{"current_steps": 2955, "total_steps": 3759, "loss": 0.4299, "lr": 5.332948413064521e-06, "epoch": 5.5027932960893855, "percentage": 78.61, "elapsed_time": "6:36:13", "remaining_time": "1:47:48"}
{"current_steps": 2960, "total_steps": 3759, "loss": 0.4498, "lr": 5.269973222822698e-06, "epoch": 5.512104283054004, "percentage": 78.74, "elapsed_time": "6:36:51", "remaining_time": "1:47:07"}
{"current_steps": 2965, "total_steps": 3759, "loss": 0.4606, "lr": 5.207315601882914e-06, "epoch": 5.521415270018622, "percentage": 78.88, "elapsed_time": "6:37:29", "remaining_time": "1:46:26"}
{"current_steps": 2970, "total_steps": 3759, "loss": 0.4533, "lr": 5.14497690110064e-06, "epoch": 5.5307262569832405, "percentage": 79.01, "elapsed_time": "6:38:12", "remaining_time": "1:45:47"}
{"current_steps": 2975, "total_steps": 3759, "loss": 0.468, "lr": 5.0829584644556186e-06, "epoch": 5.540037243947858, "percentage": 79.14, "elapsed_time": "6:38:47", "remaining_time": "1:45:05"}
{"current_steps": 2980, "total_steps": 3759, "loss": 0.4419, "lr": 5.021261629022924e-06, "epoch": 5.549348230912477, "percentage": 79.28, "elapsed_time": "6:39:22", "remaining_time": "1:44:24"}
{"current_steps": 2985, "total_steps": 3759, "loss": 0.46, "lr": 4.959887724944132e-06, "epoch": 5.558659217877095, "percentage": 79.41, "elapsed_time": "6:39:59", "remaining_time": "1:43:42"}
{"current_steps": 2990, "total_steps": 3759, "loss": 0.4418, "lr": 4.8988380753986245e-06, "epoch": 5.567970204841713, "percentage": 79.54, "elapsed_time": "6:40:33", "remaining_time": "1:43:01"}
{"current_steps": 2995, "total_steps": 3759, "loss": 0.4088, "lr": 4.838113996575094e-06, "epoch": 5.577281191806332, "percentage": 79.68, "elapsed_time": "6:41:20", "remaining_time": "1:42:22"}
{"current_steps": 3000, "total_steps": 3759, "loss": 0.4425, "lr": 4.777716797643137e-06, "epoch": 5.58659217877095, "percentage": 79.81, "elapsed_time": "6:41:58", "remaining_time": "1:41:41"}
{"current_steps": 3005, "total_steps": 3759, "loss": 0.4547, "lr": 4.717647780725072e-06, "epoch": 5.595903165735568, "percentage": 79.94, "elapsed_time": "6:42:42", "remaining_time": "1:41:02"}
{"current_steps": 3010, "total_steps": 3759, "loss": 0.4328, "lr": 4.657908240867799e-06, "epoch": 5.605214152700186, "percentage": 80.07, "elapsed_time": "6:43:23", "remaining_time": "1:40:22"}
{"current_steps": 3015, "total_steps": 3759, "loss": 0.427, "lr": 4.598499466014939e-06, "epoch": 5.614525139664805, "percentage": 80.21, "elapsed_time": "6:44:09", "remaining_time": "1:39:44"}
{"current_steps": 3020, "total_steps": 3759, "loss": 0.4132, "lr": 4.539422736979043e-06, "epoch": 5.623836126629422, "percentage": 80.34, "elapsed_time": "6:44:46", "remaining_time": "1:39:02"}
{"current_steps": 3025, "total_steps": 3759, "loss": 0.4307, "lr": 4.480679327413982e-06, "epoch": 5.633147113594041, "percentage": 80.47, "elapsed_time": "6:45:23", "remaining_time": "1:38:21"}
{"current_steps": 3030, "total_steps": 3759, "loss": 0.4467, "lr": 4.422270503787487e-06, "epoch": 5.64245810055866, "percentage": 80.61, "elapsed_time": "6:46:05", "remaining_time": "1:37:42"}
{"current_steps": 3035, "total_steps": 3759, "loss": 0.4319, "lr": 4.364197525353842e-06, "epoch": 5.651769087523277, "percentage": 80.74, "elapsed_time": "6:46:41", "remaining_time": "1:37:00"}
{"current_steps": 3040, "total_steps": 3759, "loss": 0.4351, "lr": 4.30646164412674e-06, "epoch": 5.661080074487896, "percentage": 80.87, "elapsed_time": "6:47:21", "remaining_time": "1:36:20"}
{"current_steps": 3045, "total_steps": 3759, "loss": 0.4466, "lr": 4.249064104852299e-06, "epoch": 5.670391061452514, "percentage": 81.01, "elapsed_time": "6:48:01", "remaining_time": "1:35:40"}
{"current_steps": 3050, "total_steps": 3759, "loss": 0.436, "lr": 4.19200614498219e-06, "epoch": 5.679702048417132, "percentage": 81.14, "elapsed_time": "6:48:45", "remaining_time": "1:35:01"}
{"current_steps": 3055, "total_steps": 3759, "loss": 0.4069, "lr": 4.1352889946470024e-06, "epoch": 5.68901303538175, "percentage": 81.27, "elapsed_time": "6:49:17", "remaining_time": "1:34:19"}
{"current_steps": 3060, "total_steps": 3759, "loss": 0.4256, "lr": 4.078913876629716e-06, "epoch": 5.698324022346369, "percentage": 81.4, "elapsed_time": "6:49:59", "remaining_time": "1:33:39"}
{"current_steps": 3065, "total_steps": 3759, "loss": 0.4499, "lr": 4.0228820063393055e-06, "epoch": 5.707635009310987, "percentage": 81.54, "elapsed_time": "6:50:38", "remaining_time": "1:32:58"}
{"current_steps": 3070, "total_steps": 3759, "loss": 0.4312, "lr": 3.967194591784578e-06, "epoch": 5.716945996275605, "percentage": 81.67, "elapsed_time": "6:51:17", "remaining_time": "1:32:18"}
{"current_steps": 3075, "total_steps": 3759, "loss": 0.4372, "lr": 3.911852833548086e-06, "epoch": 5.726256983240224, "percentage": 81.8, "elapsed_time": "6:51:57", "remaining_time": "1:31:38"}
{"current_steps": 3080, "total_steps": 3759, "loss": 0.4475, "lr": 3.856857924760296e-06, "epoch": 5.735567970204841, "percentage": 81.94, "elapsed_time": "6:52:31", "remaining_time": "1:30:56"}
{"current_steps": 3085, "total_steps": 3759, "loss": 0.4553, "lr": 3.802211051073814e-06, "epoch": 5.74487895716946, "percentage": 82.07, "elapsed_time": "6:53:17", "remaining_time": "1:30:17"}
{"current_steps": 3090, "total_steps": 3759, "loss": 0.4615, "lr": 3.7479133906378783e-06, "epoch": 5.754189944134078, "percentage": 82.2, "elapsed_time": "6:54:03", "remaining_time": "1:29:38"}
{"current_steps": 3095, "total_steps": 3759, "loss": 0.4383, "lr": 3.6939661140728887e-06, "epoch": 5.763500931098696, "percentage": 82.34, "elapsed_time": "6:54:42", "remaining_time": "1:28:58"}
{"current_steps": 3100, "total_steps": 3759, "loss": 0.4356, "lr": 3.6403703844452374e-06, "epoch": 5.772811918063315, "percentage": 82.47, "elapsed_time": "6:55:24", "remaining_time": "1:28:18"}
{"current_steps": 3105, "total_steps": 3759, "loss": 0.4229, "lr": 3.587127357242193e-06, "epoch": 5.782122905027933, "percentage": 82.6, "elapsed_time": "6:55:59", "remaining_time": "1:27:37"}
{"current_steps": 3110, "total_steps": 3759, "loss": 0.4287, "lr": 3.534238180347005e-06, "epoch": 5.791433891992551, "percentage": 82.73, "elapsed_time": "6:56:37", "remaining_time": "1:26:56"}
{"current_steps": 3115, "total_steps": 3759, "loss": 0.4235, "lr": 3.4817039940141517e-06, "epoch": 5.800744878957169, "percentage": 82.87, "elapsed_time": "6:57:24", "remaining_time": "1:26:17"}
{"current_steps": 3120, "total_steps": 3759, "loss": 0.4484, "lr": 3.4295259308447593e-06, "epoch": 5.810055865921788, "percentage": 83.0, "elapsed_time": "6:58:09", "remaining_time": "1:25:38"}
{"current_steps": 3125, "total_steps": 3759, "loss": 0.4718, "lr": 3.3777051157621754e-06, "epoch": 5.8193668528864055, "percentage": 83.13, "elapsed_time": "6:58:47", "remaining_time": "1:24:57"}
{"current_steps": 3130, "total_steps": 3759, "loss": 0.4694, "lr": 3.326242665987738e-06, "epoch": 5.828677839851024, "percentage": 83.27, "elapsed_time": "6:59:29", "remaining_time": "1:24:17"}
{"current_steps": 3135, "total_steps": 3759, "loss": 0.4228, "lr": 3.2751396910166513e-06, "epoch": 5.837988826815643, "percentage": 83.4, "elapsed_time": "7:00:14", "remaining_time": "1:23:38"}
{"current_steps": 3140, "total_steps": 3759, "loss": 0.4168, "lr": 3.224397292594101e-06, "epoch": 5.8472998137802605, "percentage": 83.53, "elapsed_time": "7:00:55", "remaining_time": "1:22:58"}
{"current_steps": 3145, "total_steps": 3759, "loss": 0.4644, "lr": 3.1740165646915024e-06, "epoch": 5.856610800744879, "percentage": 83.67, "elapsed_time": "7:01:28", "remaining_time": "1:22:16"}
{"current_steps": 3150, "total_steps": 3759, "loss": 0.44, "lr": 3.12399859348288e-06, "epoch": 5.865921787709497, "percentage": 83.8, "elapsed_time": "7:02:09", "remaining_time": "1:21:36"}
{"current_steps": 3155, "total_steps": 3759, "loss": 0.4435, "lr": 3.07434445732149e-06, "epoch": 5.8752327746741155, "percentage": 83.93, "elapsed_time": "7:02:45", "remaining_time": "1:20:56"}
{"current_steps": 3160, "total_steps": 3759, "loss": 0.4315, "lr": 3.025055226716531e-06, "epoch": 5.884543761638733, "percentage": 84.06, "elapsed_time": "7:03:32", "remaining_time": "1:20:17"}
{"current_steps": 3165, "total_steps": 3759, "loss": 0.4331, "lr": 2.9761319643101094e-06, "epoch": 5.893854748603352, "percentage": 84.2, "elapsed_time": "7:04:16", "remaining_time": "1:19:37"}
{"current_steps": 3170, "total_steps": 3759, "loss": 0.4683, "lr": 2.927575724854299e-06, "epoch": 5.9031657355679705, "percentage": 84.33, "elapsed_time": "7:04:58", "remaining_time": "1:18:57"}
{"current_steps": 3175, "total_steps": 3759, "loss": 0.4238, "lr": 2.879387555188411e-06, "epoch": 5.912476722532588, "percentage": 84.46, "elapsed_time": "7:05:40", "remaining_time": "1:18:17"}
{"current_steps": 3180, "total_steps": 3759, "loss": 0.4632, "lr": 2.831568494216419e-06, "epoch": 5.921787709497207, "percentage": 84.6, "elapsed_time": "7:06:23", "remaining_time": "1:17:38"}
{"current_steps": 3185, "total_steps": 3759, "loss": 0.4331, "lr": 2.784119572884574e-06, "epoch": 5.931098696461825, "percentage": 84.73, "elapsed_time": "7:06:57", "remaining_time": "1:16:56"}
{"current_steps": 3190, "total_steps": 3759, "loss": 0.4178, "lr": 2.7370418141591605e-06, "epoch": 5.940409683426443, "percentage": 84.86, "elapsed_time": "7:07:28", "remaining_time": "1:16:14"}
{"current_steps": 3195, "total_steps": 3759, "loss": 0.4148, "lr": 2.6903362330044604e-06, "epoch": 5.949720670391061, "percentage": 85.0, "elapsed_time": "7:08:09", "remaining_time": "1:15:34"}
{"current_steps": 3200, "total_steps": 3759, "loss": 0.437, "lr": 2.6440038363608555e-06, "epoch": 5.95903165735568, "percentage": 85.13, "elapsed_time": "7:08:53", "remaining_time": "1:14:55"}
{"current_steps": 3205, "total_steps": 3759, "loss": 0.4582, "lr": 2.5980456231231242e-06, "epoch": 5.968342644320298, "percentage": 85.26, "elapsed_time": "7:09:34", "remaining_time": "1:14:15"}
{"current_steps": 3210, "total_steps": 3759, "loss": 0.4355, "lr": 2.552462584118911e-06, "epoch": 5.977653631284916, "percentage": 85.4, "elapsed_time": "7:10:11", "remaining_time": "1:13:34"}
{"current_steps": 3215, "total_steps": 3759, "loss": 0.4743, "lr": 2.5072557020873568e-06, "epoch": 5.986964618249535, "percentage": 85.53, "elapsed_time": "7:10:46", "remaining_time": "1:12:53"}
{"current_steps": 3220, "total_steps": 3759, "loss": 0.459, "lr": 2.462425951657923e-06, "epoch": 5.996275605214152, "percentage": 85.66, "elapsed_time": "7:11:29", "remaining_time": "1:12:13"}
{"current_steps": 3225, "total_steps": 3759, "loss": 0.4494, "lr": 2.4179742993293552e-06, "epoch": 6.005586592178771, "percentage": 85.79, "elapsed_time": "7:12:15", "remaining_time": "1:11:34"}
{"current_steps": 3230, "total_steps": 3759, "loss": 0.4572, "lr": 2.3739017034488756e-06, "epoch": 6.01489757914339, "percentage": 85.93, "elapsed_time": "7:12:57", "remaining_time": "1:10:54"}
{"current_steps": 3235, "total_steps": 3759, "loss": 0.449, "lr": 2.3302091141915107e-06, "epoch": 6.024208566108007, "percentage": 86.06, "elapsed_time": "7:13:41", "remaining_time": "1:10:14"}
{"current_steps": 3240, "total_steps": 3759, "loss": 0.4281, "lr": 2.286897473539602e-06, "epoch": 6.033519553072626, "percentage": 86.19, "elapsed_time": "7:14:28", "remaining_time": "1:09:35"}
{"current_steps": 3245, "total_steps": 3759, "loss": 0.4233, "lr": 2.2439677152624874e-06, "epoch": 6.042830540037244, "percentage": 86.33, "elapsed_time": "7:15:09", "remaining_time": "1:08:55"}
{"current_steps": 3250, "total_steps": 3759, "loss": 0.4516, "lr": 2.201420764896396e-06, "epoch": 6.052141527001862, "percentage": 86.46, "elapsed_time": "7:15:49", "remaining_time": "1:08:15"}
{"current_steps": 3255, "total_steps": 3759, "loss": 0.4321, "lr": 2.1592575397244775e-06, "epoch": 6.06145251396648, "percentage": 86.59, "elapsed_time": "7:16:29", "remaining_time": "1:07:35"}
{"current_steps": 3260, "total_steps": 3759, "loss": 0.4386, "lr": 2.1174789487570233e-06, "epoch": 6.070763500931099, "percentage": 86.73, "elapsed_time": "7:16:58", "remaining_time": "1:06:53"}
{"current_steps": 3265, "total_steps": 3759, "loss": 0.4767, "lr": 2.0760858927118833e-06, "epoch": 6.080074487895717, "percentage": 86.86, "elapsed_time": "7:17:35", "remaining_time": "1:06:12"}
{"current_steps": 3270, "total_steps": 3759, "loss": 0.4772, "lr": 2.035079263995028e-06, "epoch": 6.089385474860335, "percentage": 86.99, "elapsed_time": "7:18:16", "remaining_time": "1:05:32"}
{"current_steps": 3275, "total_steps": 3759, "loss": 0.4708, "lr": 1.9944599466813286e-06, "epoch": 6.098696461824954, "percentage": 87.12, "elapsed_time": "7:18:47", "remaining_time": "1:04:50"}
{"current_steps": 3280, "total_steps": 3759, "loss": 0.4078, "lr": 1.954228816495485e-06, "epoch": 6.1080074487895715, "percentage": 87.26, "elapsed_time": "7:19:12", "remaining_time": "1:04:08"}
{"current_steps": 3285, "total_steps": 3759, "loss": 0.428, "lr": 1.914386740793133e-06, "epoch": 6.11731843575419, "percentage": 87.39, "elapsed_time": "7:19:55", "remaining_time": "1:03:28"}
{"current_steps": 3290, "total_steps": 3759, "loss": 0.4671, "lr": 1.874934578542187e-06, "epoch": 6.126629422718808, "percentage": 87.52, "elapsed_time": "7:20:38", "remaining_time": "1:02:48"}
{"current_steps": 3295, "total_steps": 3759, "loss": 0.4303, "lr": 1.8358731803042752e-06, "epoch": 6.1359404096834265, "percentage": 87.66, "elapsed_time": "7:21:24", "remaining_time": "1:02:09"}
{"current_steps": 3300, "total_steps": 3759, "loss": 0.4017, "lr": 1.7972033882164264e-06, "epoch": 6.145251396648045, "percentage": 87.79, "elapsed_time": "7:21:55", "remaining_time": "1:01:28"}
{"current_steps": 3305, "total_steps": 3759, "loss": 0.4047, "lr": 1.7589260359729121e-06, "epoch": 6.154562383612663, "percentage": 87.92, "elapsed_time": "7:22:33", "remaining_time": "1:00:47"}
{"current_steps": 3310, "total_steps": 3759, "loss": 0.4385, "lr": 1.721041948807256e-06, "epoch": 6.1638733705772815, "percentage": 88.06, "elapsed_time": "7:23:01", "remaining_time": "1:00:05"}
{"current_steps": 3315, "total_steps": 3759, "loss": 0.461, "lr": 1.6835519434744641e-06, "epoch": 6.173184357541899, "percentage": 88.19, "elapsed_time": "7:23:40", "remaining_time": "0:59:25"}
{"current_steps": 3320, "total_steps": 3759, "loss": 0.4087, "lr": 1.6464568282334204e-06, "epoch": 6.182495344506518, "percentage": 88.32, "elapsed_time": "7:24:26", "remaining_time": "0:58:46"}
{"current_steps": 3325, "total_steps": 3759, "loss": 0.421, "lr": 1.6097574028294327e-06, "epoch": 6.191806331471136, "percentage": 88.45, "elapsed_time": "7:25:04", "remaining_time": "0:58:05"}
{"current_steps": 3330, "total_steps": 3759, "loss": 0.3933, "lr": 1.5734544584770039e-06, "epoch": 6.201117318435754, "percentage": 88.59, "elapsed_time": "7:25:38", "remaining_time": "0:57:24"}
{"current_steps": 3335, "total_steps": 3759, "loss": 0.4236, "lr": 1.5375487778427877e-06, "epoch": 6.210428305400373, "percentage": 88.72, "elapsed_time": "7:26:10", "remaining_time": "0:56:43"}
{"current_steps": 3340, "total_steps": 3759, "loss": 0.3829, "lr": 1.502041135028698e-06, "epoch": 6.219739292364991, "percentage": 88.85, "elapsed_time": "7:26:51", "remaining_time": "0:56:03"}
{"current_steps": 3345, "total_steps": 3759, "loss": 0.4416, "lr": 1.4669322955552278e-06, "epoch": 6.229050279329609, "percentage": 88.99, "elapsed_time": "7:27:24", "remaining_time": "0:55:22"}
{"current_steps": 3350, "total_steps": 3759, "loss": 0.4335, "lr": 1.4322230163449403e-06, "epoch": 6.238361266294227, "percentage": 89.12, "elapsed_time": "7:27:54", "remaining_time": "0:54:41"}
{"current_steps": 3355, "total_steps": 3759, "loss": 0.434, "lr": 1.3979140457061568e-06, "epoch": 6.247672253258846, "percentage": 89.25, "elapsed_time": "7:28:28", "remaining_time": "0:54:00"}
{"current_steps": 3360, "total_steps": 3759, "loss": 0.4567, "lr": 1.3640061233168166e-06, "epoch": 6.256983240223463, "percentage": 89.39, "elapsed_time": "7:29:09", "remaining_time": "0:53:20"}
{"current_steps": 3365, "total_steps": 3759, "loss": 0.4181, "lr": 1.3304999802085371e-06, "epoch": 6.266294227188082, "percentage": 89.52, "elapsed_time": "7:29:49", "remaining_time": "0:52:40"}
{"current_steps": 3370, "total_steps": 3759, "loss": 0.439, "lr": 1.2973963387508336e-06, "epoch": 6.275605214152701, "percentage": 89.65, "elapsed_time": "7:30:29", "remaining_time": "0:52:00"}
{"current_steps": 3375, "total_steps": 3759, "loss": 0.4675, "lr": 1.264695912635583e-06, "epoch": 6.284916201117318, "percentage": 89.78, "elapsed_time": "7:31:12", "remaining_time": "0:51:20"}
{"current_steps": 3380, "total_steps": 3759, "loss": 0.4201, "lr": 1.2323994068616064e-06, "epoch": 6.294227188081937, "percentage": 89.92, "elapsed_time": "7:31:41", "remaining_time": "0:50:38"}
{"current_steps": 3385, "total_steps": 3759, "loss": 0.3939, "lr": 1.2005075177194736e-06, "epoch": 6.303538175046555, "percentage": 90.05, "elapsed_time": "7:32:18", "remaining_time": "0:49:58"}
{"current_steps": 3390, "total_steps": 3759, "loss": 0.4434, "lr": 1.1690209327765033e-06, "epoch": 6.312849162011173, "percentage": 90.18, "elapsed_time": "7:32:56", "remaining_time": "0:49:18"}
{"current_steps": 3395, "total_steps": 3759, "loss": 0.4254, "lr": 1.1379403308619265e-06, "epoch": 6.322160148975791, "percentage": 90.32, "elapsed_time": "7:33:41", "remaining_time": "0:48:38"}
{"current_steps": 3400, "total_steps": 3759, "loss": 0.4227, "lr": 1.1072663820522655e-06, "epoch": 6.33147113594041, "percentage": 90.45, "elapsed_time": "7:34:18", "remaining_time": "0:47:58"}
{"current_steps": 3405, "total_steps": 3759, "loss": 0.4271, "lr": 1.0769997476568684e-06, "epoch": 6.340782122905028, "percentage": 90.58, "elapsed_time": "7:34:58", "remaining_time": "0:47:18"}
{"current_steps": 3410, "total_steps": 3759, "loss": 0.444, "lr": 1.0471410802036842e-06, "epoch": 6.350093109869646, "percentage": 90.72, "elapsed_time": "7:35:32", "remaining_time": "0:46:37"}
{"current_steps": 3415, "total_steps": 3759, "loss": 0.4311, "lr": 1.017691023425147e-06, "epoch": 6.359404096834265, "percentage": 90.85, "elapsed_time": "7:36:15", "remaining_time": "0:45:57"}
{"current_steps": 3420, "total_steps": 3759, "loss": 0.4078, "lr": 9.886502122443442e-07, "epoch": 6.368715083798882, "percentage": 90.98, "elapsed_time": "7:36:46", "remaining_time": "0:45:16"}
{"current_steps": 3425, "total_steps": 3759, "loss": 0.447, "lr": 9.60019272761299e-07, "epoch": 6.378026070763501, "percentage": 91.11, "elapsed_time": "7:37:25", "remaining_time": "0:44:36"}
{"current_steps": 3430, "total_steps": 3759, "loss": 0.4492, "lr": 9.317988222394825e-07, "epoch": 6.387337057728119, "percentage": 91.25, "elapsed_time": "7:38:12", "remaining_time": "0:43:57"}
{"current_steps": 3435, "total_steps": 3759, "loss": 0.4541, "lr": 9.039894690925055e-07, "epoch": 6.396648044692737, "percentage": 91.38, "elapsed_time": "7:38:52", "remaining_time": "0:43:16"}
{"current_steps": 3440, "total_steps": 3759, "loss": 0.4621, "lr": 8.765918128710016e-07, "epoch": 6.405959031657356, "percentage": 91.51, "elapsed_time": "7:39:37", "remaining_time": "0:42:37"}
{"current_steps": 3445, "total_steps": 3759, "loss": 0.4565, "lr": 8.496064442496999e-07, "epoch": 6.415270018621974, "percentage": 91.65, "elapsed_time": "7:40:20", "remaining_time": "0:41:57"}
{"current_steps": 3450, "total_steps": 3759, "loss": 0.4267, "lr": 8.230339450146863e-07, "epoch": 6.424581005586592, "percentage": 91.78, "elapsed_time": "7:41:04", "remaining_time": "0:41:17"}
{"current_steps": 3455, "total_steps": 3759, "loss": 0.43, "lr": 7.968748880508759e-07, "epoch": 6.43389199255121, "percentage": 91.91, "elapsed_time": "7:41:45", "remaining_time": "0:40:37"}
{"current_steps": 3460, "total_steps": 3759, "loss": 0.4825, "lr": 7.711298373296316e-07, "epoch": 6.443202979515829, "percentage": 92.05, "elapsed_time": "7:42:31", "remaining_time": "0:39:58"}
{"current_steps": 3465, "total_steps": 3759, "loss": 0.3915, "lr": 7.457993478966519e-07, "epoch": 6.452513966480447, "percentage": 92.18, "elapsed_time": "7:43:08", "remaining_time": "0:39:17"}
{"current_steps": 3470, "total_steps": 3759, "loss": 0.4275, "lr": 7.208839658599531e-07, "epoch": 6.461824953445065, "percentage": 92.31, "elapsed_time": "7:43:45", "remaining_time": "0:38:37"}
{"current_steps": 3475, "total_steps": 3759, "loss": 0.4032, "lr": 6.963842283781375e-07, "epoch": 6.471135940409684, "percentage": 92.44, "elapsed_time": "7:44:22", "remaining_time": "0:37:57"}
{"current_steps": 3480, "total_steps": 3759, "loss": 0.4029, "lr": 6.723006636487795e-07, "epoch": 6.4804469273743015, "percentage": 92.58, "elapsed_time": "7:44:58", "remaining_time": "0:37:16"}
{"current_steps": 3485, "total_steps": 3759, "loss": 0.4649, "lr": 6.486337908970663e-07, "epoch": 6.48975791433892, "percentage": 92.71, "elapsed_time": "7:45:36", "remaining_time": "0:36:36"}
{"current_steps": 3490, "total_steps": 3759, "loss": 0.4445, "lr": 6.253841203645827e-07, "epoch": 6.499068901303538, "percentage": 92.84, "elapsed_time": "7:46:20", "remaining_time": "0:35:56"}
{"current_steps": 3495, "total_steps": 3759, "loss": 0.4306, "lr": 6.025521532983324e-07, "epoch": 6.5083798882681565, "percentage": 92.98, "elapsed_time": "7:47:02", "remaining_time": "0:35:16"}
{"current_steps": 3500, "total_steps": 3759, "loss": 0.4348, "lr": 5.801383819398965e-07, "epoch": 6.517690875232775, "percentage": 93.11, "elapsed_time": "7:47:49", "remaining_time": "0:34:37"}
{"current_steps": 3505, "total_steps": 3759, "loss": 0.4713, "lr": 5.581432895148608e-07, "epoch": 6.527001862197393, "percentage": 93.24, "elapsed_time": "7:48:28", "remaining_time": "0:33:56"}
{"current_steps": 3510, "total_steps": 3759, "loss": 0.4387, "lr": 5.365673502223723e-07, "epoch": 6.5363128491620115, "percentage": 93.38, "elapsed_time": "7:49:14", "remaining_time": "0:33:17"}
{"current_steps": 3515, "total_steps": 3759, "loss": 0.4393, "lr": 5.154110292249237e-07, "epoch": 6.545623836126629, "percentage": 93.51, "elapsed_time": "7:49:49", "remaining_time": "0:32:36"}
{"current_steps": 3520, "total_steps": 3759, "loss": 0.4323, "lr": 4.946747826383269e-07, "epoch": 6.554934823091248, "percentage": 93.64, "elapsed_time": "7:50:36", "remaining_time": "0:31:57"}
{"current_steps": 3525, "total_steps": 3759, "loss": 0.422, "lr": 4.743590575218737e-07, "epoch": 6.564245810055866, "percentage": 93.77, "elapsed_time": "7:51:18", "remaining_time": "0:31:17"}
{"current_steps": 3530, "total_steps": 3759, "loss": 0.4387, "lr": 4.544642918686992e-07, "epoch": 6.573556797020484, "percentage": 93.91, "elapsed_time": "7:51:58", "remaining_time": "0:30:37"}
{"current_steps": 3535, "total_steps": 3759, "loss": 0.4443, "lr": 4.3499091459634713e-07, "epoch": 6.582867783985103, "percentage": 94.04, "elapsed_time": "7:52:45", "remaining_time": "0:29:57"}
{"current_steps": 3540, "total_steps": 3759, "loss": 0.3844, "lr": 4.159393455375105e-07, "epoch": 6.592178770949721, "percentage": 94.17, "elapsed_time": "7:53:22", "remaining_time": "0:29:17"}
{"current_steps": 3545, "total_steps": 3759, "loss": 0.3993, "lr": 3.9730999543098337e-07, "epoch": 6.601489757914339, "percentage": 94.31, "elapsed_time": "7:54:00", "remaining_time": "0:28:36"}
{"current_steps": 3550, "total_steps": 3759, "loss": 0.4285, "lr": 3.791032659128169e-07, "epoch": 6.610800744878957, "percentage": 94.44, "elapsed_time": "7:54:39", "remaining_time": "0:27:56"}
{"current_steps": 3555, "total_steps": 3759, "loss": 0.4327, "lr": 3.613195495076438e-07, "epoch": 6.620111731843576, "percentage": 94.57, "elapsed_time": "7:55:22", "remaining_time": "0:27:16"}
{"current_steps": 3560, "total_steps": 3759, "loss": 0.4344, "lr": 3.439592296202299e-07, "epoch": 6.629422718808193, "percentage": 94.71, "elapsed_time": "7:56:06", "remaining_time": "0:26:36"}
{"current_steps": 3565, "total_steps": 3759, "loss": 0.431, "lr": 3.2702268052718924e-07, "epoch": 6.638733705772812, "percentage": 94.84, "elapsed_time": "7:56:48", "remaining_time": "0:25:56"}
{"current_steps": 3570, "total_steps": 3759, "loss": 0.4177, "lr": 3.1051026736893977e-07, "epoch": 6.648044692737431, "percentage": 94.97, "elapsed_time": "7:57:23", "remaining_time": "0:25:16"}
{"current_steps": 3575, "total_steps": 3759, "loss": 0.4207, "lr": 2.944223461418161e-07, "epoch": 6.657355679702048, "percentage": 95.11, "elapsed_time": "7:58:02", "remaining_time": "0:24:36"}
{"current_steps": 3580, "total_steps": 3759, "loss": 0.4165, "lr": 2.7875926369039577e-07, "epoch": 6.666666666666667, "percentage": 95.24, "elapsed_time": "7:58:42", "remaining_time": "0:23:56"}
{"current_steps": 3585, "total_steps": 3759, "loss": 0.4289, "lr": 2.6352135770002283e-07, "epoch": 6.675977653631285, "percentage": 95.37, "elapsed_time": "7:59:16", "remaining_time": "0:23:15"}
{"current_steps": 3590, "total_steps": 3759, "loss": 0.4395, "lr": 2.487089566895251e-07, "epoch": 6.685288640595903, "percentage": 95.5, "elapsed_time": "7:59:51", "remaining_time": "0:22:35"}
{"current_steps": 3595, "total_steps": 3759, "loss": 0.4008, "lr": 2.3432238000414165e-07, "epoch": 6.694599627560521, "percentage": 95.64, "elapsed_time": "8:00:27", "remaining_time": "0:21:55"}
{"current_steps": 3600, "total_steps": 3759, "loss": 0.4455, "lr": 2.203619378086197e-07, "epoch": 6.70391061452514, "percentage": 95.77, "elapsed_time": "8:01:15", "remaining_time": "0:21:15"}
{"current_steps": 3605, "total_steps": 3759, "loss": 0.4437, "lr": 2.0682793108054878e-07, "epoch": 6.713221601489758, "percentage": 95.9, "elapsed_time": "8:01:51", "remaining_time": "0:20:35"}
{"current_steps": 3610, "total_steps": 3759, "loss": 0.4502, "lr": 1.937206516038548e-07, "epoch": 6.722532588454376, "percentage": 96.04, "elapsed_time": "8:02:33", "remaining_time": "0:19:55"}
{"current_steps": 3615, "total_steps": 3759, "loss": 0.418, "lr": 1.8104038196251617e-07, "epoch": 6.731843575418995, "percentage": 96.17, "elapsed_time": "8:03:09", "remaining_time": "0:19:14"}
{"current_steps": 3620, "total_steps": 3759, "loss": 0.4171, "lr": 1.6878739553447765e-07, "epoch": 6.741154562383612, "percentage": 96.3, "elapsed_time": "8:03:48", "remaining_time": "0:18:34"}
{"current_steps": 3625, "total_steps": 3759, "loss": 0.4494, "lr": 1.5696195648574385e-07, "epoch": 6.750465549348231, "percentage": 96.44, "elapsed_time": "8:04:35", "remaining_time": "0:17:54"}
{"current_steps": 3630, "total_steps": 3759, "loss": 0.4267, "lr": 1.4556431976468833e-07, "epoch": 6.759776536312849, "percentage": 96.57, "elapsed_time": "8:05:14", "remaining_time": "0:17:14"}
{"current_steps": 3635, "total_steps": 3759, "loss": 0.4567, "lr": 1.3459473109656673e-07, "epoch": 6.769087523277467, "percentage": 96.7, "elapsed_time": "8:05:50", "remaining_time": "0:16:34"}
{"current_steps": 3640, "total_steps": 3759, "loss": 0.4132, "lr": 1.2405342697820787e-07, "epoch": 6.778398510242086, "percentage": 96.83, "elapsed_time": "8:06:22", "remaining_time": "0:15:54"}
{"current_steps": 3645, "total_steps": 3759, "loss": 0.4536, "lr": 1.1394063467291771e-07, "epoch": 6.787709497206704, "percentage": 96.97, "elapsed_time": "8:06:53", "remaining_time": "0:15:13"}
{"current_steps": 3650, "total_steps": 3759, "loss": 0.4633, "lr": 1.0425657220557883e-07, "epoch": 6.797020484171322, "percentage": 97.1, "elapsed_time": "8:07:30", "remaining_time": "0:14:33"}
{"current_steps": 3655, "total_steps": 3759, "loss": 0.4269, "lr": 9.500144835795865e-08, "epoch": 6.80633147113594, "percentage": 97.23, "elapsed_time": "8:08:09", "remaining_time": "0:13:53"}
{"current_steps": 3660, "total_steps": 3759, "loss": 0.4326, "lr": 8.617546266419307e-08, "epoch": 6.815642458100559, "percentage": 97.37, "elapsed_time": "8:08:49", "remaining_time": "0:13:13"}
{"current_steps": 3665, "total_steps": 3759, "loss": 0.463, "lr": 7.777880540649652e-08, "epoch": 6.8249534450651765, "percentage": 97.5, "elapsed_time": "8:09:29", "remaining_time": "0:12:33"}
{"current_steps": 3670, "total_steps": 3759, "loss": 0.4193, "lr": 6.981165761105857e-08, "epoch": 6.834264432029795, "percentage": 97.63, "elapsed_time": "8:10:07", "remaining_time": "0:11:53"}
{"current_steps": 3675, "total_steps": 3759, "loss": 0.4379, "lr": 6.227419104413601e-08, "epoch": 6.843575418994414, "percentage": 97.77, "elapsed_time": "8:10:48", "remaining_time": "0:11:13"}
{"current_steps": 3680, "total_steps": 3759, "loss": 0.4396, "lr": 5.516656820835131e-08, "epoch": 6.8528864059590315, "percentage": 97.9, "elapsed_time": "8:11:32", "remaining_time": "0:10:33"}
{"current_steps": 3685, "total_steps": 3759, "loss": 0.3827, "lr": 4.848894233919321e-08, "epoch": 6.86219739292365, "percentage": 98.03, "elapsed_time": "8:12:18", "remaining_time": "0:09:53"}
{"current_steps": 3690, "total_steps": 3759, "loss": 0.4748, "lr": 4.2241457401703824e-08, "epoch": 6.871508379888268, "percentage": 98.16, "elapsed_time": "8:13:03", "remaining_time": "0:09:13"}
{"current_steps": 3695, "total_steps": 3759, "loss": 0.4299, "lr": 3.642424808738998e-08, "epoch": 6.8808193668528865, "percentage": 98.3, "elapsed_time": "8:13:37", "remaining_time": "0:08:32"}
{"current_steps": 3700, "total_steps": 3759, "loss": 0.4121, "lr": 3.1037439811303363e-08, "epoch": 6.890130353817504, "percentage": 98.43, "elapsed_time": "8:14:21", "remaining_time": "0:07:52"}
{"current_steps": 3705, "total_steps": 3759, "loss": 0.4535, "lr": 2.608114870934486e-08, "epoch": 6.899441340782123, "percentage": 98.56, "elapsed_time": "8:15:02", "remaining_time": "0:07:12"}
{"current_steps": 3710, "total_steps": 3759, "loss": 0.4356, "lr": 2.1555481635762156e-08, "epoch": 6.9087523277467415, "percentage": 98.7, "elapsed_time": "8:15:48", "remaining_time": "0:06:32"}
{"current_steps": 3715, "total_steps": 3759, "loss": 0.4007, "lr": 1.7460536160840424e-08, "epoch": 6.918063314711359, "percentage": 98.83, "elapsed_time": "8:16:25", "remaining_time": "0:05:52"}
{"current_steps": 3720, "total_steps": 3759, "loss": 0.4516, "lr": 1.3796400568804046e-08, "epoch": 6.927374301675978, "percentage": 98.96, "elapsed_time": "8:17:09", "remaining_time": "0:05:12"}
{"current_steps": 3725, "total_steps": 3759, "loss": 0.4653, "lr": 1.0563153855911445e-08, "epoch": 6.936685288640596, "percentage": 99.1, "elapsed_time": "8:17:52", "remaining_time": "0:04:32"}
{"current_steps": 3730, "total_steps": 3759, "loss": 0.4307, "lr": 7.760865728747568e-09, "epoch": 6.945996275605214, "percentage": 99.23, "elapsed_time": "8:18:30", "remaining_time": "0:03:52"}
{"current_steps": 3735, "total_steps": 3759, "loss": 0.4424, "lr": 5.38959660272953e-09, "epoch": 6.955307262569832, "percentage": 99.36, "elapsed_time": "8:19:16", "remaining_time": "0:03:12"}
{"current_steps": 3740, "total_steps": 3759, "loss": 0.3942, "lr": 3.4493976007965445e-09, "epoch": 6.964618249534451, "percentage": 99.49, "elapsed_time": "8:19:59", "remaining_time": "0:02:32"}
{"current_steps": 3745, "total_steps": 3759, "loss": 0.4427, "lr": 1.9403105523130295e-09, "epoch": 6.973929236499069, "percentage": 99.63, "elapsed_time": "8:20:44", "remaining_time": "0:01:52"}
{"current_steps": 3750, "total_steps": 3759, "loss": 0.4515, "lr": 8.623679921626604e-10, "epoch": 6.983240223463687, "percentage": 99.76, "elapsed_time": "8:21:21", "remaining_time": "0:01:12"}
{"current_steps": 3755, "total_steps": 3759, "loss": 0.4062, "lr": 2.1559316005115294e-10, "epoch": 6.992551210428306, "percentage": 99.89, "elapsed_time": "8:21:59", "remaining_time": "0:00:32"}
{"current_steps": 3759, "total_steps": 3759, "epoch": 7.0, "percentage": 100.0, "elapsed_time": "8:22:39", "remaining_time": "0:00:00"}
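The `remaining_time` field in each log entry is consistent with a simple linear extrapolation from `elapsed_time` and step progress. A minimal sketch (helper names are illustrative, not from the training code) that reproduces the value logged at step 3120:

```python
import json
from datetime import timedelta

def parse_hms(s: str) -> int:
    # "6:58:09" -> 25089 seconds
    h, m, sec = (int(p) for p in s.split(":"))
    return h * 3600 + m * 60 + sec

def estimate_remaining(entry: dict) -> int:
    # Linear extrapolation: remaining ~ elapsed * (steps left / steps done)
    elapsed = parse_hms(entry["elapsed_time"])
    steps_left = entry["total_steps"] - entry["current_steps"]
    return elapsed * steps_left // entry["current_steps"]

line = '{"current_steps": 3120, "total_steps": 3759, "elapsed_time": "6:58:09"}'
entry = json.loads(line)
print(timedelta(seconds=estimate_remaining(entry)))  # 1:25:38, matching the logged remaining_time
```

The same formula reproduces the other entries to within a second, so the trainer's ETA appears to assume a constant per-step cost.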
9254
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:79b66fe69964a64ecc7ed682907c33c80d925de896f085c32669ebd3aa760cc2
size 8657
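The three lines above are a git-lfs pointer file: the actual `training_args.bin` blob is stored by LFS, and the repository tracks only this stub with the object's SHA-256 and byte size. A minimal sketch of reading such a pointer (the parser below is illustrative, not part of git-lfs itself):

```python
def parse_lfs_pointer(text: str) -> dict:
    # A git-lfs pointer is a "version" line followed by "key value" pairs
    # (see https://git-lfs.github.com/spec/v1), here "oid" and "size".
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:79b66fe69964a64ecc7ed682907c33c80d925de896f085c32669ebd3aa760cc2
size 8657"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 8657
```

The `oid` can be checked against `sha256sum` of the downloaded blob to verify the transfer.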
BIN
training_loss.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 48 KiB |
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long