Initialize project; model provided by the ModelHub XC community
Model: laion/Qwen3-8B_exp_tas_top_k_32_traces_save-strategy_steps Source: Original Platform

.gitattributes (Normal file, 56 lines)
@@ -0,0 +1,56 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text


*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text

*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

training_args.bin filter=lfs diff=lfs merge=lfs -text
model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
merges.txt filter=lfs diff=lfs merge=lfs -text
model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
vocab.json filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
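
Every pattern in this file routes matching files through Git LFS, which is why `merges.txt` and the four model shards appear in this commit as LFS objects rather than as plain text. For a rough check of which files in a checkout would be LFS-tracked, here is a minimal Python sketch. The helper name `is_lfs_tracked` is hypothetical, only a subset of the slash-free patterns is copied, and Git's matching is approximated with `fnmatch` on the file's basename; patterns containing `/` (such as `saved_model/**/*`) follow different rules and are deliberately left out.

```python
import posixpath
from fnmatch import fnmatchcase

# A subset of the slash-free patterns from the .gitattributes above.
# Git compares a pattern without "/" against the file's basename at any
# depth, which is the behaviour this sketch approximates.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.gz", "*.pt", "*.pth", "*.pkl", "*.npy",
    "*.tar", "*.tar.*", "*.zip", "*.gguf*", "*tfevents*", "*.ckpt",
    "training_args.bin", "merges.txt", "vocab.json", "tokenizer.json",
    "model-00001-of-00004.safetensors", "model-00002-of-00004.safetensors",
    "model-00003-of-00004.safetensors", "model-00004-of-00004.safetensors",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: True if the basename matches any slash-free pattern."""
    name = posixpath.basename(path)
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for f in ["config.json", "merges.txt", "logs/events.out.tfevents.1", "data/x.tar.gz"]:
    print(f, is_lfs_tracked(f))
```

Files that match no pattern, such as `config.json`, stay in regular Git history and are shown inline in this commit.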

README.md (Normal file, 60 lines)
@@ -0,0 +1,60 @@
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: Qwen3-8B_exp_tas_top_k_32_traces_save-strategy_steps
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Qwen3-8B_exp_tas_top_k_32_traces_save-strategy_steps

This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the DCAgent/exp_tas_top_k_32_traces dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 32
- total_eval_batch_size: 256
- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.87,0.99) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.005
- num_epochs: 8.0

### Training results



### Framework versions

- Transformers 4.55.0
- Pytorch 2.7.1+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1
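
The model card lists a peak learning rate of 0.0001, a cosine scheduler with warmup ratio 0.005, and a per-device batch of 1 on 32 devices. The sketch below shows how those numbers combine. It assumes the schedule has the shape of `transformers`' `get_cosine_schedule_with_warmup` (linear warmup, then a half-cosine decay to zero), that warmup lasts `ceil(warmup_ratio * total_steps)` steps, and that no gradient accumulation was used (the card lists none, and 1 × 32 already equals the reported total of 32). `total_steps` is an input because the card does not state it, and both helper names are hypothetical.

```python
import math


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Global batch = per-device batch x number of devices x accumulation steps."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-4,
              warmup_ratio: float = 0.005) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_batch_size(1, 32))  # 32, the total_train_batch_size
print(total_batch_size(8, 32))  # 256, the total_eval_batch_size
```

With a warmup ratio this small, the warmup covers only the first half percent of the run, so for most of training the learning rate follows the cosine branch.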

added_tokens.json (Normal file, 28 lines)
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
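
The 26 added tokens occupy one contiguous ID block, 151643 through 151668, which sits below the `vocab_size` of 151936 declared in `config.json`; the embedding rows above that block have no added token assigned here. The following sketch checks those properties and builds the reverse (ID to token) map, assuming the JSON has been loaded into a plain dict. The function name `check_added_tokens` is illustrative, not part of any library.

```python
import json

# Contents of added_tokens.json above.
ADDED_TOKENS = json.loads("""{
  "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
  "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
  "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
  "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
  "<|im_start|>": 151644, "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646, "<|quad_end|>": 151651, "<|quad_start|>": 151650,
  "<|repo_name|>": 151663, "<|video_pad|>": 151656, "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654, "<|vision_start|>": 151652
}""")


def check_added_tokens(tokens: dict, vocab_size: int) -> range:
    """Return the ID range of the added tokens, raising if the IDs contain
    duplicates or gaps, or fall outside the model's embedding table."""
    ids = sorted(tokens.values())
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate token ids")
    id_range = range(ids[0], ids[-1] + 1)
    if ids != list(id_range):
        raise ValueError("token ids are not contiguous")
    if ids[-1] >= vocab_size:
        raise ValueError("token id outside the embedding table")
    return id_range


id_range = check_added_tokens(ADDED_TOKENS, vocab_size=151936)
id_to_token = {i: t for t, i in ADDED_TOKENS.items()}
print(id_range, id_to_token[151645])  # range(151643, 151669) <|im_end|>
```

The reverse map is handy when reading raw token IDs, for example to confirm that the `eos_token_id` 151645 in `config.json` is `<|im_end|>`.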

all_results.json (Normal file, 16 lines)
@@ -0,0 +1,16 @@
{
    "achieved_tflops_per_gpu": 0.0009416172291339583,
    "achieved_tflops_per_gpu_theoretical": 186.33475095527425,
    "epoch": 8.0,
    "loss_nan_ranks": 0,
    "loss_rank_avg": 0.017221365123987198,
    "mfu_percent": 0.00030180039395319175,
    "mfu_percent_theoretical": 59.72267658822893,
    "total_flos": 1379676432629760.0,
    "train_loss": 0.20488507787429666,
    "train_runtime": 45788.1262,
    "train_samples_per_second": 1.769,
    "train_steps_per_second": 0.055,
    "valid_targets_mean": 3641.4,
    "valid_targets_min": 1073
}
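
Several of these metrics can be cross-checked against each other. The sketch below derives approximate totals from the reported rates. It assumes the rates are averages over the whole run (`train_runtime` is in seconds) and takes the global batch of 32 from the model card. Because the reported rates are rounded (0.055 steps/s has only two significant figures), the two step estimates disagree by about 0.5%. The ratio of the two `*_theoretical` fields also implies that the MFU calculation assumed a peak of about 312 TFLOPS per GPU, the dense BF16 figure NVIDIA publishes for the A100; the files do not say which GPU was actually used. `derived_stats` is a hypothetical helper.

```python
# Values copied from all_results.json above.
RESULTS = {
    "achieved_tflops_per_gpu_theoretical": 186.33475095527425,
    "epoch": 8.0,
    "mfu_percent_theoretical": 59.72267658822893,
    "train_runtime": 45788.1262,
    "train_samples_per_second": 1.769,
    "train_steps_per_second": 0.055,
}


def derived_stats(r: dict, global_batch: int = 32) -> dict:
    """Approximate run totals implied by the averaged rates."""
    samples = r["train_runtime"] * r["train_samples_per_second"]
    return {
        "runtime_hours": r["train_runtime"] / 3600,
        "approx_samples_seen": samples,
        "approx_samples_per_epoch": samples / r["epoch"],
        "approx_steps_from_rate": r["train_runtime"] * r["train_steps_per_second"],
        "approx_steps_from_batch": samples / global_batch,
        # MFU = achieved / peak, so peak = achieved / (MFU as a fraction).
        "implied_peak_tflops": r["achieved_tflops_per_gpu_theoretical"]
        / (r["mfu_percent_theoretical"] / 100),
    }


for name, value in derived_stats(RESULTS).items():
    print(f"{name:>26}: {value:,.1f}")
```

The run therefore lasted roughly 12.7 hours and processed on the order of 81,000 samples across the 8 epochs.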

chat_template.jinja (Normal file, 89 lines)
@@ -0,0 +1,89 @@
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
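
For plain-text conversations (no tools, no `reasoning_content` field) that end with an ordinary user turn, the template above reduces to simple concatenation: each turn is wrapped in `<|im_start|>{role}\n…<|im_end|>\n`, any `<think>…</think>` block is dropped from earlier assistant turns, and the generation prompt pre-fills an empty think block when `enable_thinking` is false. The Python sketch below is a hand-written re-implementation of just that subset, useful for eyeballing prompts without a Jinja engine. `render_chat` is a hypothetical name; for anything outside this subset (tool calls, tool responses, a trailing assistant turn) the template itself, for example via `tokenizer.apply_chat_template`, remains the reference.

```python
def render_chat(messages, add_generation_prompt=True, enable_thinking=True):
    """Mimic chat_template.jinja for text-only conversations that end with a
    plain user turn. Other cases are rejected rather than approximated."""
    last = messages[-1] if messages else None
    if last is None or last["role"] != "user" or (
        last["content"].startswith("<tool_response>")
        and last["content"].endswith("</tool_response>")
    ):
        raise ValueError("this sketch only handles conversations ending with a plain user turn")
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "assistant":
            # Earlier assistant turns keep only the text after </think>.
            if "</think>" in content:
                content = content.split("</think>")[-1].lstrip("\n")
        elif role not in ("system", "user"):
            raise ValueError(f"unsupported role in this sketch: {role}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        if enable_thinking is False:
            # The template's empty think block when thinking is disabled.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


print(render_chat([{"role": "user", "content": "Hi"}], enable_thinking=False))
```

Rejecting unsupported inputs, instead of guessing, keeps the sketch from silently diverging from the real template.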

config.json (Normal file, 68 lines)
@@ -0,0 +1,68 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.55.0",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}
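
The shapes in `config.json` determine the parameter count exactly. The sketch below recomputes it, assuming the standard Qwen3 decoder layout: bias-free q/k/v/o projections (`attention_bias` is false), a gated MLP with gate/up/down projections, RMSNorm weights on each layer's input and post-attention paths plus per-head `q_norm`/`k_norm` of size `head_dim`, a final norm, and an untied `lm_head` (`tie_word_embeddings` is false). At 2 bytes per `bfloat16` parameter the result reproduces the `total_size` of 16,381,470,720 bytes recorded in `model.safetensors.index.json`. The same arithmetic shows that the index's `total_parameters` value of 308,224 equals the count of RMSNorm weights alone; the source does not explain why only those were counted. The last helper gives the per-token KV-cache footprint under grouped-query attention with 8 KV heads. All function names are hypothetical.

```python
# Shape-related values copied from config.json above.
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 12288,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
    "tie_word_embeddings": False,
}


def norm_parameters(c: dict) -> int:
    """RMSNorm weights: input + post-attention and q_norm + k_norm per layer, plus the final norm."""
    per_layer = 2 * c["hidden_size"] + 2 * c["head_dim"]
    return c["num_hidden_layers"] * per_layer + c["hidden_size"]


def count_parameters(c: dict) -> int:
    h = c["hidden_size"]
    q_dim = c["num_attention_heads"] * c["head_dim"]
    kv_dim = c["num_key_value_heads"] * c["head_dim"]
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o (no biases)
    mlp = 3 * h * c["intermediate_size"]                # gate, up, down
    embeddings = c["vocab_size"] * h
    lm_head = 0 if c["tie_word_embeddings"] else embeddings
    matrices = c["num_hidden_layers"] * (attention + mlp) + embeddings + lm_head
    return matrices + norm_parameters(c)


def kv_cache_bytes_per_token(c: dict, bytes_per_value: int = 2) -> int:
    """Keys and values for every layer, stored for the KV heads only (GQA)."""
    return 2 * c["num_hidden_layers"] * c["num_key_value_heads"] * c["head_dim"] * bytes_per_value


params = count_parameters(CONFIG)
print(f"{params:,} parameters")          # 8,190,735,360 parameters
print(f"{params * 2:,} bytes in bf16")    # 16,381,470,720 bytes in bf16
print(norm_parameters(CONFIG))           # 308224
print(kv_cache_bytes_per_token(CONFIG))  # 147456
```

At the configured `max_position_embeddings` of 40960, that is 147,456 × 40,960 = 6,039,797,760 bytes (5.625 GiB) of bf16 KV cache for one full-length sequence.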

configuration.json (Normal file, 1 line)
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "others", "allow_remote": true}

generation_config.json (Normal file, 13 lines)
@@ -0,0 +1,13 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.55.0"
}
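
With `do_sample` enabled, these defaults apply temperature 0.6, then top-k 20, then nucleus (top-p 0.95) filtering before a token is drawn. The sketch below applies the three filters in that order to a plain list of logits, the same order in which `transformers` applies its temperature, top-k and top-p logits warpers. It is a simplified illustration rather than a drop-in replacement (for instance, ties at the k-th logit are broken arbitrarily, and the top-p cutoff may differ at the exact boundary), and the function names are hypothetical.

```python
import math
import random


def filtered_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring indices, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i, p in zip(ranked, probs):
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {i: p / norm for i, p in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filtered_distribution(logits, **kwargs)
    return rng.choices(list(dist), weights=list(dist.values()))[0]


print(filtered_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9))
print(sample([5.0, 1.0, 0.5], rng=random.Random(0)))
```

A temperature below 1 sharpens the distribution before truncation, so with these defaults a clearly dominant logit often leaves only one candidate after the top-p step.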

merges.txt (Normal file, binary, stored with Git LFS)
Binary file not shown.

model-00001-of-00004.safetensors (Normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85757a3a87286023a367f8ff0b8c9bb903ce9005ea2d0eeed97c71205b6f9993
size 4902257696

model-00002-of-00004.safetensors (Normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:875efe37c8a1dba2e1d263b926e3917d7f1da9d9a4c998e4504d0e3274e0b0c3
size 4915960368

model-00003-of-00004.safetensors (Normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1fc69e9f82cd23e03f21f0842e39bbf92fffb871a2aa763ae1f4c939bd3cd569
size 4983068496

model-00004-of-00004.safetensors (Normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:38b4aace9b133e5970603a69d09cf531fe7e6a4107911ca74cd5e83cbdd2980f
size 1580230264
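
Each of these three-line files is a Git LFS pointer: the spec version, the SHA-256 of the real content, and its size in bytes. After the actual shards have been downloaded, they can be checked against their pointers; the sketch below does that with only the standard library, and its helper names are hypothetical. The four sizes add up to 16,381,516,824 bytes, 46,104 more than the `total_size` in `model.safetensors.index.json`. That index figure counts tensor data only, so the difference is presumably the header that every safetensors file carries in front of its tensor data.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algorithm": algorithm,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's size and SHA-256 digest."""
    if pointer["algorithm"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algorithm']}")
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check before hashing several GB
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["digest"]


shard4 = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:38b4aace9b133e5970603a69d09cf531fe7e6a4107911ca74cd5e83cbdd2980f
size 1580230264""")
print(shard4["size"])  # 1580230264

SHARD_SIZES = [4902257696, 4915960368, 4983068496, 1580230264]
print(sum(SHARD_SIZES) - 16381470720)  # 46104
```

Once downloaded, a shard can be checked with `matches_pointer("model-00004-of-00004.safetensors", shard4)`; the size comparison runs first so that a truncated download is rejected without hashing.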

model.safetensors.index.json (Normal file, 407 lines)
@@ -0,0 +1,407 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_parameters": 308224,
|
||||
"total_size": 16381470720
|
||||
},
|
||||
"weight_map": {
|
||||
"lm_head.weight": "model-00004-of-00004.safetensors",
|
||||
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.norm.weight": "model-00004-of-00004.safetensors"
|
||||
}
|
||||
}
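The weight map assigns every tensor to one of four shard files, and a single decoder layer can straddle two shards (layer 9 is split between shards 1 and 2, and layer 35 between shards 3 and 4), so a loader has to resolve each tensor name individually rather than assume one layer per file. The following is a minimal standard-library Python sketch of that lookup. The inlined JSON is a hand-picked excerpt of the map above, and the helper names `shard_for` and `tensors_by_shard` are hypothetical, not part of any library.

```python
import json
from collections import defaultdict

# A small excerpt of the "weight_map" shown above; for the real checkpoint you
# would json.load() the full index file instead of inlining entries.
INDEX_JSON = """
{
  "weight_map": {
    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors"
  }
}
"""


def shard_for(weight_map: dict, tensor_name: str) -> str:
    """Return the shard file that stores `tensor_name` (KeyError if absent)."""
    return weight_map[tensor_name]


def tensors_by_shard(weight_map: dict) -> dict:
    """Invert the map: shard file -> sorted list of tensor names it holds."""
    groups = defaultdict(list)
    for name, shard in weight_map.items():
        groups[shard].append(name)
    # Sort shards and names so the output order is deterministic.
    return {shard: sorted(names) for shard, names in sorted(groups.items())}


weight_map = json.loads(INDEX_JSON)["weight_map"]
print(shard_for(weight_map, "model.norm.weight"))
for shard, names in tensors_by_shard(weight_map).items():
    print(shard, len(names))
```

Loading libraries such as `transformers` perform an equivalent resolution internally, so a sketch like this is mainly useful for inspecting how a checkpoint is laid out.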
12
run_summary.json
Normal file
@@ -0,0 +1,12 @@
{
"agent_name": null,
"training_start": null,
"training_end": null,
"created_by": "DCAgent",
"base_model_name": "Qwen/Qwen3-8B",
"dataset_name": "DCAgent/exp_tas_top_k_32_traces",
"training_type": "SFT",
"training_parameters": "https://huggingface.co/laion/Qwen3-8B_exp_tas_top_k_32_traces_save-strategy_steps/blob/main/config.json",
"wandb_link": "https://wandb.ai/dogml/dc-agent/runs/Qwen3-8B_exp_tas_top_k_32_traces_save-strategy_steps",
"traces_location_s3": null
}
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
BIN
tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
240
tokenizer_config.json
Normal file
@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 32768,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
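In `tokenizer_config.json`, `eos_token` and `pad_token` are given as strings. Their integer IDs come from `added_tokens_decoder`: `<|im_end|>` is 151645 and `<|endoftext|>` is 151643. The tool-call, FIM and `<think>`/`</think>` tokens (151657 to 151668) are registered with `"special": false`, unlike the chat and vision markers. The following minimal standard-library sketch resolves these mappings from an excerpt of the config. The helper names `token_to_id` and `split_by_special` are hypothetical.

```python
import json

# Excerpt of tokenizer_config.json above (only the fields used here).
CONFIG_JSON = """
{
  "added_tokens_decoder": {
    "151643": {"content": "<|endoftext|>", "special": true},
    "151644": {"content": "<|im_start|>", "special": true},
    "151645": {"content": "<|im_end|>", "special": true},
    "151667": {"content": "<think>", "special": false},
    "151668": {"content": "</think>", "special": false}
  },
  "eos_token": "<|im_end|>",
  "pad_token": "<|endoftext|>"
}
"""


def token_to_id(config: dict) -> dict:
    """Map each added token string to its integer id (JSON keys are strings)."""
    return {entry["content"]: int(tid) for tid, entry in config["added_tokens_decoder"].items()}


def split_by_special(config: dict):
    """Return (special, non_special) token strings in ascending id order."""
    special, plain = [], []
    for tid in sorted(config["added_tokens_decoder"], key=int):
        entry = config["added_tokens_decoder"][tid]
        (special if entry["special"] else plain).append(entry["content"])
    return special, plain


config = json.loads(CONFIG_JSON)
ids = token_to_id(config)
print("eos id:", ids[config["eos_token"]])  # 151645
print("pad id:", ids[config["pad_token"]])  # 151643
print(split_by_special(config))
```

With `transformers` installed, `AutoTokenizer.from_pretrained` on this repository reads the same file, and `tokenizer.eos_token_id` and `tokenizer.pad_token_id` should report the same IDs.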
16
train_results.json
Normal file
@@ -0,0 +1,16 @@
{
"achieved_tflops_per_gpu": 0.0009416172291339583,
"achieved_tflops_per_gpu_theoretical": 186.33475095527425,
"epoch": 8.0,
"loss_nan_ranks": 0,
"loss_rank_avg": 0.017221365123987198,
"mfu_percent": 0.00030180039395319175,
"mfu_percent_theoretical": 59.72267658822893,
"total_flos": 1379676432629760.0,
"train_loss": 0.20488507787429666,
"train_runtime": 45788.1262,
"train_samples_per_second": 1.769,
"train_steps_per_second": 0.055,
"valid_targets_mean": 3641.4,
"valid_targets_min": 1073
}
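The fields in `train_results.json` are mutually consistent, and a quick arithmetic cross-check confirms this. The following minimal sketch assumes that MFU is computed as achieved TFLOPS divided by a fixed per-GPU peak. The source states neither the peak nor the GPU model, but both MFU pairs imply a 312 TFLOPS peak, which matches NVIDIA's dense BF16 figure for the A100. The epoch logged at step 10 in `trainer_log.jsonl` below gives 317 training steps per epoch, and 317 × 8 epochs reproduces `total_steps` = 2536. The rounded `train_steps_per_second` times `train_runtime` lands within 1% of that total.

```python
# Values copied from train_results.json and trainer_log.jsonl.
train_results = {
    "achieved_tflops_per_gpu": 0.0009416172291339583,
    "achieved_tflops_per_gpu_theoretical": 186.33475095527425,
    "epoch": 8.0,
    "mfu_percent": 0.00030180039395319175,
    "mfu_percent_theoretical": 59.72267658822893,
    "train_runtime": 45788.1262,
    "train_steps_per_second": 0.055,
}
TOTAL_STEPS = 2536                        # "total_steps" in trainer_log.jsonl
EPOCH_AT_STEP_10 = 0.031545741324921134   # "epoch" logged at current_steps == 10


def implied_peak_tflops(achieved: float, mfu_percent: float) -> float:
    """Assuming MFU% = achieved / peak * 100, solve for the peak."""
    return achieved / mfu_percent * 100.0


peak_measured = implied_peak_tflops(
    train_results["achieved_tflops_per_gpu"], train_results["mfu_percent"])
peak_theoretical = implied_peak_tflops(
    train_results["achieved_tflops_per_gpu_theoretical"], train_results["mfu_percent_theoretical"])

# Fractional epoch per step -> whole steps per epoch.
steps_per_epoch = round(10 / EPOCH_AT_STEP_10)
# The logged rate is rounded to 3 decimals, so this is only approximate.
steps_from_rate = train_results["train_steps_per_second"] * train_results["train_runtime"]

print(f"implied peak: {peak_measured:.1f} / {peak_theoretical:.1f} TFLOPS")
print(f"steps per epoch: {steps_per_epoch}, x epochs = {steps_per_epoch * int(train_results['epoch'])}")
print(f"steps from rate x runtime: {steps_from_rate:.0f}")
```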
549
trainer_log.jsonl
Normal file
@@ -0,0 +1,549 @@
{"current_steps": 5, "total_steps": 2536, "loss": 0.8936, "lr": 3.0769230769230774e-05, "epoch": 0.01579778830963665, "percentage": 0.2, "elapsed_time": "0:02:46", "remaining_time": "23:27:32"}
|
||||
{"current_steps": 5, "total_steps": 2536, "loss": 0.8936, "lr": 3.0769230769230774e-05, "epoch": 0.015772870662460567, "percentage": 0.2, "elapsed_time": "1:16:50", "remaining_time": "27 days, 0:16:57"}
|
||||
{"current_steps": 10, "total_steps": 2536, "loss": 0.7352, "lr": 6.923076923076924e-05, "epoch": 0.031545741324921134, "percentage": 0.39, "elapsed_time": "1:18:11", "remaining_time": "13 days, 17:11:45"}
|
||||
{"current_steps": 15, "total_steps": 2536, "loss": 0.6431, "lr": 9.99999612380875e-05, "epoch": 0.0473186119873817, "percentage": 0.59, "elapsed_time": "1:19:40", "remaining_time": "9 days, 7:09:37"}
|
||||
{"current_steps": 20, "total_steps": 2536, "loss": 0.6226, "lr": 9.999860457746025e-05, "epoch": 0.06309148264984227, "percentage": 0.79, "elapsed_time": "1:20:58", "remaining_time": "7 days, 1:46:10"}
|
||||
{"current_steps": 25, "total_steps": 2536, "loss": 0.6059, "lr": 9.999530988130677e-05, "epoch": 0.07886435331230283, "percentage": 0.99, "elapsed_time": "1:22:20", "remaining_time": "5 days, 17:50:23"}
|
||||
{"current_steps": 30, "total_steps": 2536, "loss": 0.5852, "lr": 9.999007727733537e-05, "epoch": 0.0946372239747634, "percentage": 1.18, "elapsed_time": "1:23:39", "remaining_time": "4 days, 20:28:35"}
|
||||
{"current_steps": 35, "total_steps": 2536, "loss": 0.5905, "lr": 9.998290696837115e-05, "epoch": 0.11041009463722397, "percentage": 1.38, "elapsed_time": "1:24:53", "remaining_time": "4 days, 5:06:33"}
|
||||
{"current_steps": 40, "total_steps": 2536, "loss": 0.5804, "lr": 9.997379923234816e-05, "epoch": 0.12618296529968454, "percentage": 1.58, "elapsed_time": "1:26:04", "remaining_time": "3 days, 17:31:12"}
|
||||
{"current_steps": 45, "total_steps": 2536, "loss": 0.5623, "lr": 9.996275442229857e-05, "epoch": 0.14195583596214512, "percentage": 1.77, "elapsed_time": "1:27:13", "remaining_time": "3 days, 8:28:15"}
|
||||
{"current_steps": 50, "total_steps": 2536, "loss": 0.5505, "lr": 9.994977296633902e-05, "epoch": 0.15772870662460567, "percentage": 1.97, "elapsed_time": "1:28:30", "remaining_time": "3 days, 1:20:25"}
|
||||
{"current_steps": 55, "total_steps": 2536, "loss": 0.5522, "lr": 9.993485536765398e-05, "epoch": 0.17350157728706625, "percentage": 2.17, "elapsed_time": "1:29:40", "remaining_time": "2 days, 19:25:24"}
|
||||
{"current_steps": 60, "total_steps": 2536, "loss": 0.5903, "lr": 9.991800220447634e-05, "epoch": 0.1892744479495268, "percentage": 2.37, "elapsed_time": "1:30:51", "remaining_time": "2 days, 14:29:07"}
|
||||
{"current_steps": 65, "total_steps": 2536, "loss": 0.5609, "lr": 9.989921413006489e-05, "epoch": 0.20504731861198738, "percentage": 2.56, "elapsed_time": "1:32:06", "remaining_time": "2 days, 10:21:32"}
|
||||
{"current_steps": 70, "total_steps": 2536, "loss": 0.5704, "lr": 9.987849187267908e-05, "epoch": 0.22082018927444794, "percentage": 2.76, "elapsed_time": "1:33:17", "remaining_time": "2 days, 6:46:28"}
|
||||
{"current_steps": 75, "total_steps": 2536, "loss": 0.5421, "lr": 9.985583623555076e-05, "epoch": 0.23659305993690852, "percentage": 2.96, "elapsed_time": "1:34:23", "remaining_time": "2 days, 3:37:34"}
|
||||
{"current_steps": 80, "total_steps": 2536, "loss": 0.5597, "lr": 9.9831248096853e-05, "epoch": 0.25236593059936907, "percentage": 3.15, "elapsed_time": "1:35:31", "remaining_time": "2 days, 0:52:47"}
|
||||
{"current_steps": 85, "total_steps": 2536, "loss": 0.507, "lr": 9.980472840966614e-05, "epoch": 0.26813880126182965, "percentage": 3.35, "elapsed_time": "1:36:40", "remaining_time": "1 day, 22:27:31"}
|
||||
{"current_steps": 90, "total_steps": 2536, "loss": 0.5632, "lr": 9.977627820194082e-05, "epoch": 0.28391167192429023, "percentage": 3.55, "elapsed_time": "1:37:50", "remaining_time": "1 day, 20:19:17"}
|
||||
{"current_steps": 95, "total_steps": 2536, "loss": 0.5778, "lr": 9.974589857645802e-05, "epoch": 0.2996845425867508, "percentage": 3.75, "elapsed_time": "1:39:01", "remaining_time": "1 day, 18:24:20"}
|
||||
{"current_steps": 100, "total_steps": 2536, "loss": 0.5465, "lr": 9.97135907107865e-05, "epoch": 0.31545741324921134, "percentage": 3.94, "elapsed_time": "1:40:14", "remaining_time": "1 day, 16:41:56"}
|
||||
{"current_steps": 105, "total_steps": 2536, "loss": 0.5122, "lr": 9.967935585723706e-05, "epoch": 0.3312302839116719, "percentage": 4.14, "elapsed_time": "2:04:50", "remaining_time": "2 days, 0:10:11"}
|
||||
{"current_steps": 110, "total_steps": 2536, "loss": 0.5501, "lr": 9.964319534281397e-05, "epoch": 0.3470031545741325, "percentage": 4.34, "elapsed_time": "2:06:02", "remaining_time": "1 day, 22:19:42"}
|
||||
{"current_steps": 115, "total_steps": 2536, "loss": 0.5505, "lr": 9.960511056916357e-05, "epoch": 0.3627760252365931, "percentage": 4.53, "elapsed_time": "2:07:04", "remaining_time": "1 day, 20:35:08"}
|
||||
{"current_steps": 120, "total_steps": 2536, "loss": 0.5504, "lr": 9.956510301251995e-05, "epoch": 0.3785488958990536, "percentage": 4.73, "elapsed_time": "2:08:15", "remaining_time": "1 day, 19:02:09"}
|
||||
{"current_steps": 125, "total_steps": 2536, "loss": 0.5212, "lr": 9.952317422364772e-05, "epoch": 0.3943217665615142, "percentage": 4.93, "elapsed_time": "2:09:24", "remaining_time": "1 day, 17:36:00"}
|
||||
{"current_steps": 130, "total_steps": 2536, "loss": 0.5148, "lr": 9.947932582778188e-05, "epoch": 0.41009463722397477, "percentage": 5.13, "elapsed_time": "2:10:35", "remaining_time": "1 day, 16:16:51"}
|
||||
{"current_steps": 135, "total_steps": 2536, "loss": 0.543, "lr": 9.943355952456483e-05, "epoch": 0.42586750788643535, "percentage": 5.32, "elapsed_time": "2:11:50", "remaining_time": "1 day, 15:04:54"}
|
||||
{"current_steps": 140, "total_steps": 2536, "loss": 0.5367, "lr": 9.938587708798053e-05, "epoch": 0.4416403785488959, "percentage": 5.52, "elapsed_time": "2:13:03", "remaining_time": "1 day, 13:57:18"}
|
||||
{"current_steps": 145, "total_steps": 2536, "loss": 0.5431, "lr": 9.933628036628569e-05, "epoch": 0.45741324921135645, "percentage": 5.72, "elapsed_time": "2:14:15", "remaining_time": "1 day, 12:53:58"}
|
||||
{"current_steps": 150, "total_steps": 2536, "loss": 0.5111, "lr": 9.92847712819381e-05, "epoch": 0.47318611987381703, "percentage": 5.91, "elapsed_time": "2:15:28", "remaining_time": "1 day, 11:54:56"}
|
||||
{"current_steps": 155, "total_steps": 2536, "loss": 0.5423, "lr": 9.923135183152224e-05, "epoch": 0.4889589905362776, "percentage": 6.11, "elapsed_time": "2:16:40", "remaining_time": "1 day, 10:59:36"}
|
||||
{"current_steps": 160, "total_steps": 2536, "loss": 0.5404, "lr": 9.91760240856717e-05, "epoch": 0.5047318611987381, "percentage": 6.31, "elapsed_time": "2:17:55", "remaining_time": "1 day, 10:08:15"}
|
||||
{"current_steps": 165, "total_steps": 2536, "loss": 0.5203, "lr": 9.91187901889891e-05, "epoch": 0.5205047318611987, "percentage": 6.51, "elapsed_time": "2:19:09", "remaining_time": "1 day, 9:19:40"}
{"current_steps": 170, "total_steps": 2536, "loss": 0.5464, "lr": 9.905965235996286e-05, "epoch": 0.5362776025236593, "percentage": 6.7, "elapsed_time": "2:20:19", "remaining_time": "1 day, 8:33:06"}
{"current_steps": 175, "total_steps": 2536, "loss": 0.52, "lr": 9.899861289088121e-05, "epoch": 0.5520504731861199, "percentage": 6.9, "elapsed_time": "2:21:30", "remaining_time": "1 day, 7:49:07"}
{"current_steps": 180, "total_steps": 2536, "loss": 0.4798, "lr": 9.893567414774341e-05, "epoch": 0.5678233438485805, "percentage": 7.1, "elapsed_time": "2:22:40", "remaining_time": "1 day, 7:07:29"}
{"current_steps": 185, "total_steps": 2536, "loss": 0.5247, "lr": 9.88708385701679e-05, "epoch": 0.583596214511041, "percentage": 7.29, "elapsed_time": "2:23:56", "remaining_time": "1 day, 6:29:07"}
{"current_steps": 190, "total_steps": 2536, "loss": 0.5404, "lr": 9.88041086712979e-05, "epoch": 0.5993690851735016, "percentage": 7.49, "elapsed_time": "2:25:02", "remaining_time": "1 day, 5:50:51"}
{"current_steps": 195, "total_steps": 2536, "loss": 0.5269, "lr": 9.873548703770388e-05, "epoch": 0.6151419558359621, "percentage": 7.69, "elapsed_time": "2:26:55", "remaining_time": "1 day, 5:23:56"}
{"current_steps": 200, "total_steps": 2536, "loss": 0.5047, "lr": 9.866497632928336e-05, "epoch": 0.6309148264984227, "percentage": 7.89, "elapsed_time": "2:28:05", "remaining_time": "1 day, 4:49:40"}
{"current_steps": 105, "total_steps": 2536, "loss": 0.5122, "lr": 9.967935585723706e-05, "epoch": 0.3312302839116719, "percentage": 4.14, "elapsed_time": "0:02:03", "remaining_time": "0:47:49"}
{"current_steps": 110, "total_steps": 2536, "loss": 0.5501, "lr": 9.964319534281397e-05, "epoch": 0.3470031545741325, "percentage": 4.34, "elapsed_time": "0:03:19", "remaining_time": "1:13:14"}
{"current_steps": 115, "total_steps": 2536, "loss": 0.5506, "lr": 9.960511056916357e-05, "epoch": 0.3627760252365931, "percentage": 4.53, "elapsed_time": "0:04:22", "remaining_time": "1:31:56"}
{"current_steps": 120, "total_steps": 2536, "loss": 0.5503, "lr": 9.956510301251995e-05, "epoch": 0.3785488958990536, "percentage": 4.73, "elapsed_time": "0:05:33", "remaining_time": "1:51:59"}
{"current_steps": 125, "total_steps": 2536, "loss": 0.521, "lr": 9.952317422364772e-05, "epoch": 0.3943217665615142, "percentage": 4.93, "elapsed_time": "0:06:48", "remaining_time": "2:11:10"}
{"current_steps": 130, "total_steps": 2536, "loss": 0.5149, "lr": 9.947932582778188e-05, "epoch": 0.41009463722397477, "percentage": 5.13, "elapsed_time": "0:07:59", "remaining_time": "2:27:51"}
{"current_steps": 135, "total_steps": 2536, "loss": 0.5427, "lr": 9.943355952456483e-05, "epoch": 0.42586750788643535, "percentage": 5.32, "elapsed_time": "0:09:14", "remaining_time": "2:44:27"}
{"current_steps": 140, "total_steps": 2536, "loss": 0.5366, "lr": 9.938587708798053e-05, "epoch": 0.4416403785488959, "percentage": 5.52, "elapsed_time": "0:10:29", "remaining_time": "2:59:26"}
{"current_steps": 145, "total_steps": 2536, "loss": 0.5452, "lr": 9.933628036628569e-05, "epoch": 0.45741324921135645, "percentage": 5.72, "elapsed_time": "0:11:39", "remaining_time": "3:12:11"}
{"current_steps": 150, "total_steps": 2536, "loss": 0.5104, "lr": 9.92847712819381e-05, "epoch": 0.47318611987381703, "percentage": 5.91, "elapsed_time": "0:12:50", "remaining_time": "3:24:21"}
{"current_steps": 155, "total_steps": 2536, "loss": 0.5411, "lr": 9.923135183152224e-05, "epoch": 0.4889589905362776, "percentage": 6.11, "elapsed_time": "0:14:01", "remaining_time": "3:35:26"}
{"current_steps": 160, "total_steps": 2536, "loss": 0.5391, "lr": 9.91760240856717e-05, "epoch": 0.5047318611987381, "percentage": 6.31, "elapsed_time": "0:15:15", "remaining_time": "3:46:28"}
{"current_steps": 165, "total_steps": 2536, "loss": 0.5194, "lr": 9.91187901889891e-05, "epoch": 0.5205047318611987, "percentage": 6.51, "elapsed_time": "0:16:26", "remaining_time": "3:56:19"}
{"current_steps": 170, "total_steps": 2536, "loss": 0.5463, "lr": 9.905965235996286e-05, "epoch": 0.5362776025236593, "percentage": 6.7, "elapsed_time": "0:17:35", "remaining_time": "4:04:54"}
{"current_steps": 175, "total_steps": 2536, "loss": 0.52, "lr": 9.899861289088121e-05, "epoch": 0.5520504731861199, "percentage": 6.9, "elapsed_time": "0:18:43", "remaining_time": "4:12:39"}
{"current_steps": 180, "total_steps": 2536, "loss": 0.4807, "lr": 9.893567414774341e-05, "epoch": 0.5678233438485805, "percentage": 7.1, "elapsed_time": "0:19:52", "remaining_time": "4:20:08"}
{"current_steps": 185, "total_steps": 2536, "loss": 0.5249, "lr": 9.88708385701679e-05, "epoch": 0.583596214511041, "percentage": 7.29, "elapsed_time": "0:21:06", "remaining_time": "4:28:14"}
{"current_steps": 190, "total_steps": 2536, "loss": 0.5409, "lr": 9.88041086712979e-05, "epoch": 0.5993690851735016, "percentage": 7.49, "elapsed_time": "0:22:11", "remaining_time": "4:34:03"}
{"current_steps": 195, "total_steps": 2536, "loss": 0.5271, "lr": 9.873548703770388e-05, "epoch": 0.6151419558359621, "percentage": 7.69, "elapsed_time": "0:23:19", "remaining_time": "4:40:02"}
{"current_steps": 200, "total_steps": 2536, "loss": 0.5044, "lr": 9.866497632928336e-05, "epoch": 0.6309148264984227, "percentage": 7.89, "elapsed_time": "0:24:28", "remaining_time": "4:45:50"}
{"current_steps": 105, "total_steps": 2536, "loss": 0.5122, "lr": 9.967935585723706e-05, "epoch": 0.3312302839116719, "percentage": 4.14, "elapsed_time": "0:02:02", "remaining_time": "0:47:25"}
{"current_steps": 110, "total_steps": 2536, "loss": 0.5502, "lr": 9.964319534281397e-05, "epoch": 0.3470031545741325, "percentage": 4.34, "elapsed_time": "0:03:18", "remaining_time": "1:12:49"}
{"current_steps": 115, "total_steps": 2536, "loss": 0.5506, "lr": 9.960511056916357e-05, "epoch": 0.3627760252365931, "percentage": 4.53, "elapsed_time": "0:04:20", "remaining_time": "1:31:27"}
{"current_steps": 120, "total_steps": 2536, "loss": 0.5505, "lr": 9.956510301251995e-05, "epoch": 0.3785488958990536, "percentage": 4.73, "elapsed_time": "0:05:32", "remaining_time": "1:51:32"}
{"current_steps": 125, "total_steps": 2536, "loss": 0.5214, "lr": 9.952317422364772e-05, "epoch": 0.3943217665615142, "percentage": 4.93, "elapsed_time": "0:06:42", "remaining_time": "2:09:24"}
{"current_steps": 130, "total_steps": 2536, "loss": 0.515, "lr": 9.947932582778188e-05, "epoch": 0.41009463722397477, "percentage": 5.13, "elapsed_time": "0:07:53", "remaining_time": "2:26:06"}
{"current_steps": 135, "total_steps": 2536, "loss": 0.5429, "lr": 9.943355952456483e-05, "epoch": 0.42586750788643535, "percentage": 5.32, "elapsed_time": "0:09:09", "remaining_time": "2:42:55"}
{"current_steps": 140, "total_steps": 2536, "loss": 0.5364, "lr": 9.938587708798053e-05, "epoch": 0.4416403785488959, "percentage": 5.52, "elapsed_time": "0:10:21", "remaining_time": "2:57:21"}
{"current_steps": 145, "total_steps": 2536, "loss": 0.5434, "lr": 9.933628036628569e-05, "epoch": 0.45741324921135645, "percentage": 5.72, "elapsed_time": "0:11:32", "remaining_time": "3:10:21"}
{"current_steps": 150, "total_steps": 2536, "loss": 0.5111, "lr": 9.92847712819381e-05, "epoch": 0.47318611987381703, "percentage": 5.91, "elapsed_time": "0:12:44", "remaining_time": "3:22:33"}
{"current_steps": 155, "total_steps": 2536, "loss": 0.542, "lr": 9.923135183152224e-05, "epoch": 0.4889589905362776, "percentage": 6.11, "elapsed_time": "0:13:55", "remaining_time": "3:33:47"}
{"current_steps": 160, "total_steps": 2536, "loss": 0.5408, "lr": 9.91760240856717e-05, "epoch": 0.5047318611987381, "percentage": 6.31, "elapsed_time": "0:15:08", "remaining_time": "3:44:51"}
{"current_steps": 165, "total_steps": 2536, "loss": 0.5203, "lr": 9.91187901889891e-05, "epoch": 0.5205047318611987, "percentage": 6.51, "elapsed_time": "0:16:20", "remaining_time": "3:54:54"}
{"current_steps": 170, "total_steps": 2536, "loss": 0.5465, "lr": 9.905965235996286e-05, "epoch": 0.5362776025236593, "percentage": 6.7, "elapsed_time": "0:17:29", "remaining_time": "4:03:30"}
{"current_steps": 175, "total_steps": 2536, "loss": 0.5197, "lr": 9.899861289088121e-05, "epoch": 0.5520504731861199, "percentage": 6.9, "elapsed_time": "0:18:37", "remaining_time": "4:11:13"}
{"current_steps": 180, "total_steps": 2536, "loss": 0.4799, "lr": 9.893567414774341e-05, "epoch": 0.5678233438485805, "percentage": 7.1, "elapsed_time": "0:19:45", "remaining_time": "4:18:42"}
{"current_steps": 185, "total_steps": 2536, "loss": 0.5248, "lr": 9.88708385701679e-05, "epoch": 0.583596214511041, "percentage": 7.29, "elapsed_time": "0:20:59", "remaining_time": "4:26:45"}
{"current_steps": 190, "total_steps": 2536, "loss": 0.5407, "lr": 9.88041086712979e-05, "epoch": 0.5993690851735016, "percentage": 7.49, "elapsed_time": "0:22:04", "remaining_time": "4:32:35"}
{"current_steps": 195, "total_steps": 2536, "loss": 0.5267, "lr": 9.873548703770388e-05, "epoch": 0.6151419558359621, "percentage": 7.69, "elapsed_time": "0:23:13", "remaining_time": "4:38:46"}
{"current_steps": 200, "total_steps": 2536, "loss": 0.504, "lr": 9.866497632928336e-05, "epoch": 0.6309148264984227, "percentage": 7.89, "elapsed_time": "0:24:21", "remaining_time": "4:44:29"}
{"current_steps": 205, "total_steps": 2536, "loss": 0.5286, "lr": 9.859257927915774e-05, "epoch": 0.6466876971608833, "percentage": 8.08, "elapsed_time": "0:29:22", "remaining_time": "5:33:58"}
{"current_steps": 210, "total_steps": 2536, "loss": 0.5219, "lr": 9.851829869356651e-05, "epoch": 0.6624605678233438, "percentage": 8.28, "elapsed_time": "0:30:31", "remaining_time": "5:38:02"}
{"current_steps": 215, "total_steps": 2536, "loss": 0.5191, "lr": 9.844213745175826e-05, "epoch": 0.6782334384858044, "percentage": 8.48, "elapsed_time": "0:31:42", "remaining_time": "5:42:16"}
{"current_steps": 220, "total_steps": 2536, "loss": 0.5118, "lr": 9.83640985058792e-05, "epoch": 0.694006309148265, "percentage": 8.68, "elapsed_time": "0:32:52", "remaining_time": "5:46:03"}
{"current_steps": 225, "total_steps": 2536, "loss": 0.5221, "lr": 9.828418488085877e-05, "epoch": 0.7097791798107256, "percentage": 8.87, "elapsed_time": "0:34:04", "remaining_time": "5:49:57"}
{"current_steps": 230, "total_steps": 2536, "loss": 0.5113, "lr": 9.820239967429233e-05, "epoch": 0.7255520504731862, "percentage": 9.07, "elapsed_time": "0:35:13", "remaining_time": "5:53:11"}
{"current_steps": 235, "total_steps": 2536, "loss": 0.5304, "lr": 9.811874605632104e-05, "epoch": 0.7413249211356467, "percentage": 9.27, "elapsed_time": "0:36:24", "remaining_time": "5:56:30"}
{"current_steps": 240, "total_steps": 2536, "loss": 0.5083, "lr": 9.803322726950905e-05, "epoch": 0.7570977917981072, "percentage": 9.46, "elapsed_time": "0:37:28", "remaining_time": "5:58:27"}
{"current_steps": 245, "total_steps": 2536, "loss": 0.52, "lr": 9.794584662871787e-05, "epoch": 0.7728706624605678, "percentage": 9.66, "elapsed_time": "0:38:34", "remaining_time": "6:00:46"}
{"current_steps": 250, "total_steps": 2536, "loss": 0.4882, "lr": 9.785660752097768e-05, "epoch": 0.7886435331230284, "percentage": 9.86, "elapsed_time": "0:39:44", "remaining_time": "6:03:27"}
{"current_steps": 255, "total_steps": 2536, "loss": 0.5434, "lr": 9.77655134053563e-05, "epoch": 0.804416403785489, "percentage": 10.06, "elapsed_time": "0:40:54", "remaining_time": "6:05:54"}
{"current_steps": 260, "total_steps": 2536, "loss": 0.5356, "lr": 9.767256781282486e-05, "epoch": 0.8201892744479495, "percentage": 10.25, "elapsed_time": "0:42:04", "remaining_time": "6:08:16"}
{"current_steps": 265, "total_steps": 2536, "loss": 0.5247, "lr": 9.757777434612116e-05, "epoch": 0.8359621451104101, "percentage": 10.45, "elapsed_time": "0:43:13", "remaining_time": "6:10:25"}
{"current_steps": 270, "total_steps": 2536, "loss": 0.5046, "lr": 9.748113667960987e-05, "epoch": 0.8517350157728707, "percentage": 10.65, "elapsed_time": "0:44:15", "remaining_time": "6:11:25"}
{"current_steps": 275, "total_steps": 2536, "loss": 0.5147, "lr": 9.738265855914013e-05, "epoch": 0.8675078864353313, "percentage": 10.84, "elapsed_time": "0:45:22", "remaining_time": "6:13:06"}
{"current_steps": 280, "total_steps": 2536, "loss": 0.5112, "lr": 9.728234380190038e-05, "epoch": 0.8832807570977917, "percentage": 11.04, "elapsed_time": "0:46:32", "remaining_time": "6:15:00"}
{"current_steps": 285, "total_steps": 2536, "loss": 0.5077, "lr": 9.718019629627045e-05, "epoch": 0.8990536277602523, "percentage": 11.24, "elapsed_time": "0:47:41", "remaining_time": "6:16:37"}
{"current_steps": 290, "total_steps": 2536, "loss": 0.5297, "lr": 9.70762200016707e-05, "epoch": 0.9148264984227129, "percentage": 11.44, "elapsed_time": "0:48:49", "remaining_time": "6:18:05"}
{"current_steps": 295, "total_steps": 2536, "loss": 0.5364, "lr": 9.697041894840865e-05, "epoch": 0.9305993690851735, "percentage": 11.63, "elapsed_time": "0:49:54", "remaining_time": "6:19:11"}
{"current_steps": 300, "total_steps": 2536, "loss": 0.5107, "lr": 9.68627972375228e-05, "epoch": 0.9463722397476341, "percentage": 11.83, "elapsed_time": "0:51:05", "remaining_time": "6:20:50"}
{"current_steps": 305, "total_steps": 2536, "loss": 0.4958, "lr": 9.675335904062353e-05, "epoch": 0.9621451104100947, "percentage": 12.03, "elapsed_time": "1:03:54", "remaining_time": "7:47:30"}
{"current_steps": 310, "total_steps": 2536, "loss": 0.5171, "lr": 9.66421085997315e-05, "epoch": 0.9779179810725552, "percentage": 12.22, "elapsed_time": "1:05:05", "remaining_time": "7:47:23"}
{"current_steps": 315, "total_steps": 2536, "loss": 0.5318, "lr": 9.65290502271132e-05, "epoch": 0.9936908517350158, "percentage": 12.42, "elapsed_time": "1:06:23", "remaining_time": "7:48:04"}
{"current_steps": 320, "total_steps": 2536, "loss": 0.4546, "lr": 9.641418830511377e-05, "epoch": 1.0094637223974763, "percentage": 12.62, "elapsed_time": "1:07:40", "remaining_time": "7:48:36"}
{"current_steps": 325, "total_steps": 2536, "loss": 0.4463, "lr": 9.62975272859872e-05, "epoch": 1.025236593059937, "percentage": 12.82, "elapsed_time": "1:08:46", "remaining_time": "7:47:55"}
{"current_steps": 330, "total_steps": 2536, "loss": 0.4202, "lr": 9.617907169172367e-05, "epoch": 1.0410094637223974, "percentage": 13.01, "elapsed_time": "1:09:52", "remaining_time": "7:47:04"}
{"current_steps": 335, "total_steps": 2536, "loss": 0.4191, "lr": 9.605882611387432e-05, "epoch": 1.0567823343848581, "percentage": 13.21, "elapsed_time": "1:10:53", "remaining_time": "7:45:46"}
{"current_steps": 340, "total_steps": 2536, "loss": 0.4242, "lr": 9.593679521337327e-05, "epoch": 1.0725552050473186, "percentage": 13.41, "elapsed_time": "1:12:05", "remaining_time": "7:45:36"}
{"current_steps": 345, "total_steps": 2536, "loss": 0.4375, "lr": 9.581298372035695e-05, "epoch": 1.088328075709779, "percentage": 13.6, "elapsed_time": "1:13:12", "remaining_time": "7:44:57"}
{"current_steps": 350, "total_steps": 2536, "loss": 0.4139, "lr": 9.56873964339807e-05, "epoch": 1.1041009463722398, "percentage": 13.8, "elapsed_time": "1:14:20", "remaining_time": "7:44:19"}
{"current_steps": 355, "total_steps": 2536, "loss": 0.4362, "lr": 9.556003822223287e-05, "epoch": 1.1198738170347002, "percentage": 14.0, "elapsed_time": "1:15:28", "remaining_time": "7:43:40"}
{"current_steps": 360, "total_steps": 2536, "loss": 0.4258, "lr": 9.5430914021746e-05, "epoch": 1.135646687697161, "percentage": 14.2, "elapsed_time": "1:16:41", "remaining_time": "7:43:32"}
{"current_steps": 365, "total_steps": 2536, "loss": 0.4447, "lr": 9.530002883760552e-05, "epoch": 1.1514195583596214, "percentage": 14.39, "elapsed_time": "1:17:46", "remaining_time": "7:42:38"}
{"current_steps": 370, "total_steps": 2536, "loss": 0.4143, "lr": 9.516738774315577e-05, "epoch": 1.167192429022082, "percentage": 14.59, "elapsed_time": "1:18:59", "remaining_time": "7:42:27"}
{"current_steps": 375, "total_steps": 2536, "loss": 0.4281, "lr": 9.503299587980331e-05, "epoch": 1.1829652996845426, "percentage": 14.79, "elapsed_time": "1:20:10", "remaining_time": "7:42:01"}
{"current_steps": 380, "total_steps": 2536, "loss": 0.4347, "lr": 9.489685845681762e-05, "epoch": 1.1987381703470033, "percentage": 14.98, "elapsed_time": "1:21:20", "remaining_time": "7:41:32"}
{"current_steps": 385, "total_steps": 2536, "loss": 0.4368, "lr": 9.47589807511292e-05, "epoch": 1.2145110410094637, "percentage": 15.18, "elapsed_time": "1:22:29", "remaining_time": "7:40:52"}
{"current_steps": 390, "total_steps": 2536, "loss": 0.4168, "lr": 9.461936810712507e-05, "epoch": 1.2302839116719242, "percentage": 15.38, "elapsed_time": "1:23:33", "remaining_time": "7:39:45"}
{"current_steps": 395, "total_steps": 2536, "loss": 0.4415, "lr": 9.447802593644152e-05, "epoch": 1.2460567823343849, "percentage": 15.58, "elapsed_time": "1:24:40", "remaining_time": "7:38:59"}
{"current_steps": 400, "total_steps": 2536, "loss": 0.419, "lr": 9.433495971775444e-05, "epoch": 1.2618296529968454, "percentage": 15.77, "elapsed_time": "1:25:52", "remaining_time": "7:38:32"}
{"current_steps": 405, "total_steps": 2536, "loss": 0.4336, "lr": 9.419017499656686e-05, "epoch": 1.277602523659306, "percentage": 15.97, "elapsed_time": "1:37:20", "remaining_time": "8:32:13"}
{"current_steps": 410, "total_steps": 2536, "loss": 0.4441, "lr": 9.404367738499409e-05, "epoch": 1.2933753943217665, "percentage": 16.17, "elapsed_time": "1:38:31", "remaining_time": "8:30:51"}
{"current_steps": 415, "total_steps": 2536, "loss": 0.4359, "lr": 9.38954725615461e-05, "epoch": 1.3091482649842272, "percentage": 16.36, "elapsed_time": "1:39:37", "remaining_time": "8:29:12"}
{"current_steps": 420, "total_steps": 2536, "loss": 0.4434, "lr": 9.374556627090749e-05, "epoch": 1.3249211356466877, "percentage": 16.56, "elapsed_time": "1:40:44", "remaining_time": "8:27:32"}
{"current_steps": 425, "total_steps": 2536, "loss": 0.4405, "lr": 9.359396432371476e-05, "epoch": 1.3406940063091484, "percentage": 16.76, "elapsed_time": "1:41:54", "remaining_time": "8:26:09"}
{"current_steps": 430, "total_steps": 2536, "loss": 0.4582, "lr": 9.344067259633112e-05, "epoch": 1.3564668769716088, "percentage": 16.96, "elapsed_time": "1:43:09", "remaining_time": "8:25:13"}
{"current_steps": 435, "total_steps": 2536, "loss": 0.4309, "lr": 9.328569703061862e-05, "epoch": 1.3722397476340693, "percentage": 17.15, "elapsed_time": "1:44:14", "remaining_time": "8:23:27"}
{"current_steps": 440, "total_steps": 2536, "loss": 0.4341, "lr": 9.3129043633708e-05, "epoch": 1.38801261829653, "percentage": 17.35, "elapsed_time": "1:45:17", "remaining_time": "8:21:33"}
{"current_steps": 445, "total_steps": 2536, "loss": 0.4132, "lr": 9.297071847776568e-05, "epoch": 1.4037854889589905, "percentage": 17.55, "elapsed_time": "1:46:29", "remaining_time": "8:20:21"}
{"current_steps": 450, "total_steps": 2536, "loss": 0.4408, "lr": 9.281072769975847e-05, "epoch": 1.4195583596214512, "percentage": 17.74, "elapsed_time": "1:47:40", "remaining_time": "8:19:07"}
{"current_steps": 455, "total_steps": 2536, "loss": 0.4422, "lr": 9.264907750121568e-05, "epoch": 1.4353312302839116, "percentage": 17.94, "elapsed_time": "1:48:49", "remaining_time": "8:17:44"}
{"current_steps": 460, "total_steps": 2536, "loss": 0.4453, "lr": 9.248577414798871e-05, "epoch": 1.4511041009463723, "percentage": 18.14, "elapsed_time": "1:49:51", "remaining_time": "8:15:48"}
{"current_steps": 465, "total_steps": 2536, "loss": 0.4358, "lr": 9.232082397000826e-05, "epoch": 1.4668769716088328, "percentage": 18.34, "elapsed_time": "1:51:00", "remaining_time": "8:14:26"}
{"current_steps": 470, "total_steps": 2536, "loss": 0.4281, "lr": 9.215423336103884e-05, "epoch": 1.4826498422712935, "percentage": 18.53, "elapsed_time": "1:52:10", "remaining_time": "8:13:04"}
{"current_steps": 475, "total_steps": 2536, "loss": 0.4424, "lr": 9.198600877843105e-05, "epoch": 1.498422712933754, "percentage": 18.73, "elapsed_time": "1:53:11", "remaining_time": "8:11:06"}
{"current_steps": 480, "total_steps": 2536, "loss": 0.4507, "lr": 9.181615674287121e-05, "epoch": 1.5141955835962144, "percentage": 18.93, "elapsed_time": "1:54:24", "remaining_time": "8:10:02"}
{"current_steps": 485, "total_steps": 2536, "loss": 0.4405, "lr": 9.164468383812864e-05, "epoch": 1.5299684542586751, "percentage": 19.12, "elapsed_time": "1:55:27", "remaining_time": "8:08:15"}
{"current_steps": 490, "total_steps": 2536, "loss": 0.4294, "lr": 9.147159671080049e-05, "epoch": 1.5457413249211358, "percentage": 19.32, "elapsed_time": "1:56:37", "remaining_time": "8:06:57"}
{"current_steps": 495, "total_steps": 2536, "loss": 0.4239, "lr": 9.129690207005402e-05, "epoch": 1.5615141955835963, "percentage": 19.52, "elapsed_time": "1:57:50", "remaining_time": "8:05:53"}
{"current_steps": 500, "total_steps": 2536, "loss": 0.4347, "lr": 9.11206066873666e-05, "epoch": 1.5772870662460567, "percentage": 19.72, "elapsed_time": "1:58:53", "remaining_time": "8:04:09"}
{"current_steps": 505, "total_steps": 2536, "loss": 0.4593, "lr": 9.094271739626326e-05, "epoch": 1.5930599369085172, "percentage": 19.91, "elapsed_time": "2:11:33", "remaining_time": "8:49:06"}
{"current_steps": 510, "total_steps": 2536, "loss": 0.4157, "lr": 9.076324109205174e-05, "epoch": 1.608832807570978, "percentage": 20.11, "elapsed_time": "2:12:45", "remaining_time": "8:47:23"}
{"current_steps": 515, "total_steps": 2536, "loss": 0.4525, "lr": 9.058218473155528e-05, "epoch": 1.6246056782334386, "percentage": 20.31, "elapsed_time": "2:13:55", "remaining_time": "8:45:33"}
{"current_steps": 520, "total_steps": 2536, "loss": 0.4214, "lr": 9.039955533284292e-05, "epoch": 1.640378548895899, "percentage": 20.5, "elapsed_time": "2:15:04", "remaining_time": "8:43:41"}
{"current_steps": 525, "total_steps": 2536, "loss": 0.4461, "lr": 9.021535997495749e-05, "epoch": 1.6561514195583595, "percentage": 20.7, "elapsed_time": "2:16:10", "remaining_time": "8:41:35"}
{"current_steps": 530, "total_steps": 2536, "loss": 0.4407, "lr": 9.002960579764116e-05, "epoch": 1.6719242902208202, "percentage": 20.9, "elapsed_time": "2:17:22", "remaining_time": "8:39:56"}
{"current_steps": 535, "total_steps": 2536, "loss": 0.4314, "lr": 8.984230000105882e-05, "epoch": 1.687697160883281, "percentage": 21.1, "elapsed_time": "2:18:27", "remaining_time": "8:37:52"}
{"current_steps": 540, "total_steps": 2536, "loss": 0.4398, "lr": 8.965344984551882e-05, "epoch": 1.7034700315457414, "percentage": 21.29, "elapsed_time": "2:19:29", "remaining_time": "8:35:37"}
{"current_steps": 545, "total_steps": 2536, "loss": 0.4389, "lr": 8.946306265119167e-05, "epoch": 1.7192429022082019, "percentage": 21.49, "elapsed_time": "2:20:40", "remaining_time": "8:33:53"}
{"current_steps": 550, "total_steps": 2536, "loss": 0.4288, "lr": 8.927114579782625e-05, "epoch": 1.7350157728706623, "percentage": 21.69, "elapsed_time": "2:21:51", "remaining_time": "8:32:14"}
{"current_steps": 555, "total_steps": 2536, "loss": 0.4424, "lr": 8.907770672446381e-05, "epoch": 1.750788643533123, "percentage": 21.88, "elapsed_time": "2:23:02", "remaining_time": "8:30:35"}
{"current_steps": 560, "total_steps": 2536, "loss": 0.4189, "lr": 8.888275292914948e-05, "epoch": 1.7665615141955837, "percentage": 22.08, "elapsed_time": "2:24:11", "remaining_time": "8:28:48"}
{"current_steps": 565, "total_steps": 2536, "loss": 0.4083, "lr": 8.868629196864182e-05, "epoch": 1.7823343848580442, "percentage": 22.28, "elapsed_time": "2:25:14", "remaining_time": "8:26:40"}
{"current_steps": 570, "total_steps": 2536, "loss": 0.4458, "lr": 8.848833145811976e-05, "epoch": 1.7981072555205047, "percentage": 22.48, "elapsed_time": "2:26:24", "remaining_time": "8:25:00"}
{"current_steps": 575, "total_steps": 2536, "loss": 0.4215, "lr": 8.828887907088753e-05, "epoch": 1.8138801261829653, "percentage": 22.67, "elapsed_time": "2:27:34", "remaining_time": "8:23:18"}
{"current_steps": 580, "total_steps": 2536, "loss": 0.439, "lr": 8.808794253807707e-05, "epoch": 1.8296529968454258, "percentage": 22.87, "elapsed_time": "2:28:37", "remaining_time": "8:21:14"}
{"current_steps": 585, "total_steps": 2536, "loss": 0.4216, "lr": 8.788552964834859e-05, "epoch": 1.8454258675078865, "percentage": 23.07, "elapsed_time": "2:29:45", "remaining_time": "8:19:28"}
{"current_steps": 590, "total_steps": 2536, "loss": 0.4411, "lr": 8.768164824758846e-05, "epoch": 1.861198738170347, "percentage": 23.26, "elapsed_time": "2:30:50", "remaining_time": "8:17:30"}
{"current_steps": 595, "total_steps": 2536, "loss": 0.4492, "lr": 8.747630623860521e-05, "epoch": 1.8769716088328074, "percentage": 23.46, "elapsed_time": "2:32:04", "remaining_time": "8:16:05"}
{"current_steps": 600, "total_steps": 2536, "loss": 0.4475, "lr": 8.726951158082311e-05, "epoch": 1.8927444794952681, "percentage": 23.66, "elapsed_time": "2:33:16", "remaining_time": "8:14:33"}
{"current_steps": 605, "total_steps": 2536, "loss": 0.4272, "lr": 8.706127228997376e-05, "epoch": 1.9085173501577288, "percentage": 23.86, "elapsed_time": "2:45:02", "remaining_time": "8:46:46"}
{"current_steps": 610, "total_steps": 2536, "loss": 0.4211, "lr": 8.685159643778528e-05, "epoch": 1.9242902208201893, "percentage": 24.05, "elapsed_time": "2:46:12", "remaining_time": "8:44:47"}
{"current_steps": 615, "total_steps": 2536, "loss": 0.4143, "lr": 8.664049215166955e-05, "epoch": 1.9400630914826498, "percentage": 24.25, "elapsed_time": "2:47:22", "remaining_time": "8:42:49"}
{"current_steps": 620, "total_steps": 2536, "loss": 0.4392, "lr": 8.6427967614407e-05, "epoch": 1.9558359621451105, "percentage": 24.45, "elapsed_time": "2:48:27", "remaining_time": "8:40:36"}
{"current_steps": 625, "total_steps": 2536, "loss": 0.4323, "lr": 8.621403106382968e-05, "epoch": 1.971608832807571, "percentage": 24.65, "elapsed_time": "2:49:36", "remaining_time": "8:38:34"}
{"current_steps": 630, "total_steps": 2536, "loss": 0.4511, "lr": 8.599869079250165e-05, "epoch": 1.9873817034700316, "percentage": 24.84, "elapsed_time": "2:50:44", "remaining_time": "8:36:33"}
{"current_steps": 635, "total_steps": 2536, "loss": 0.4148, "lr": 8.578195514739784e-05, "epoch": 2.003154574132492, "percentage": 25.04, "elapsed_time": "2:51:48", "remaining_time": "8:34:19"}
{"current_steps": 640, "total_steps": 2536, "loss": 0.3381, "lr": 8.556383252958026e-05, "epoch": 2.0189274447949526, "percentage": 25.24, "elapsed_time": "2:52:57", "remaining_time": "8:32:23"}
{"current_steps": 645, "total_steps": 2536, "loss": 0.3389, "lr": 8.534433139387259e-05, "epoch": 2.034700315457413, "percentage": 25.43, "elapsed_time": "2:54:04", "remaining_time": "8:30:21"}
{"current_steps": 650, "total_steps": 2536, "loss": 0.3169, "lr": 8.512346024853219e-05, "epoch": 2.050473186119874, "percentage": 25.63, "elapsed_time": "2:55:12", "remaining_time": "8:28:23"}
{"current_steps": 655, "total_steps": 2536, "loss": 0.3269, "lr": 8.490122765492057e-05, "epoch": 2.0662460567823344, "percentage": 25.83, "elapsed_time": "2:56:21", "remaining_time": "8:26:28"}
{"current_steps": 660, "total_steps": 2536, "loss": 0.325, "lr": 8.467764222717136e-05, "epoch": 2.082018927444795, "percentage": 26.03, "elapsed_time": "2:57:29", "remaining_time": "8:24:29"}
{"current_steps": 665, "total_steps": 2536, "loss": 0.333, "lr": 8.445271263185646e-05, "epoch": 2.0977917981072554, "percentage": 26.22, "elapsed_time": "2:58:34", "remaining_time": "8:22:26"}
{"current_steps": 670, "total_steps": 2536, "loss": 0.3071, "lr": 8.422644758765012e-05, "epoch": 2.1135646687697163, "percentage": 26.42, "elapsed_time": "2:59:41", "remaining_time": "8:20:26"}
{"current_steps": 675, "total_steps": 2536, "loss": 0.3367, "lr": 8.399885586499101e-05, "epoch": 2.1293375394321767, "percentage": 26.62, "elapsed_time": "3:00:51", "remaining_time": "8:18:38"}
{"current_steps": 680, "total_steps": 2536, "loss": 0.3413, "lr": 8.376994628574219e-05, "epoch": 2.145110410094637, "percentage": 26.81, "elapsed_time": "3:02:00", "remaining_time": "8:16:47"}
{"current_steps": 685, "total_steps": 2536, "loss": 0.3347, "lr": 8.353972772284927e-05, "epoch": 2.1608832807570977, "percentage": 27.01, "elapsed_time": "3:03:09", "remaining_time": "8:14:56"}
{"current_steps": 690, "total_steps": 2536, "loss": 0.3309, "lr": 8.330820909999633e-05, "epoch": 2.176656151419558, "percentage": 27.21, "elapsed_time": "3:04:20", "remaining_time": "8:13:11"}
{"current_steps": 695, "total_steps": 2536, "loss": 0.3455, "lr": 8.307539939126016e-05, "epoch": 2.192429022082019, "percentage": 27.41, "elapsed_time": "3:05:25", "remaining_time": "8:11:09"}
{"current_steps": 700, "total_steps": 2536, "loss": 0.3309, "lr": 8.284130762076235e-05, "epoch": 2.2082018927444795, "percentage": 27.6, "elapsed_time": "3:06:33", "remaining_time": "8:09:18"}
{"current_steps": 705, "total_steps": 2536, "loss": 0.3183, "lr": 8.260594286231947e-05, "epoch": 2.22397476340694, "percentage": 27.8, "elapsed_time": "3:15:23", "remaining_time": "8:27:28"}
{"current_steps": 710, "total_steps": 2536, "loss": 0.349, "lr": 8.236931423909138e-05, "epoch": 2.2397476340694005, "percentage": 28.0, "elapsed_time": "3:16:37", "remaining_time": "8:25:42"}
{"current_steps": 715, "total_steps": 2536, "loss": 0.3271, "lr": 8.213143092322769e-05, "epoch": 2.2555205047318614, "percentage": 28.19, "elapsed_time": "3:17:43", "remaining_time": "8:23:35"}
{"current_steps": 720, "total_steps": 2536, "loss": 0.3097, "lr": 8.189230213551202e-05, "epoch": 2.271293375394322, "percentage": 28.39, "elapsed_time": "3:18:50", "remaining_time": "8:21:32"}
{"current_steps": 725, "total_steps": 2536, "loss": 0.3464, "lr": 8.165193714500481e-05, "epoch": 2.2870662460567823, "percentage": 28.59, "elapsed_time": "3:19:54", "remaining_time": "8:19:20"}
{"current_steps": 730, "total_steps": 2536, "loss": 0.3422, "lr": 8.141034526868389e-05, "epoch": 2.302839116719243, "percentage": 28.79, "elapsed_time": "3:21:05", "remaining_time": "8:17:29"}
{"current_steps": 735, "total_steps": 2536, "loss": 0.3341, "lr": 8.116753587108339e-05, "epoch": 2.3186119873817033, "percentage": 28.98, "elapsed_time": "3:22:13", "remaining_time": "8:15:31"}
{"current_steps": 740, "total_steps": 2536, "loss": 0.3144, "lr": 8.092351836393076e-05, "epoch": 2.334384858044164, "percentage": 29.18, "elapsed_time": "3:23:24", "remaining_time": "8:13:41"}
{"current_steps": 745, "total_steps": 2536, "loss": 0.344, "lr": 8.067830220578191e-05, "epoch": 2.3501577287066246, "percentage": 29.38, "elapsed_time": "3:24:35", "remaining_time": "8:11:50"}
{"current_steps": 750, "total_steps": 2536, "loss": 0.336, "lr": 8.043189690165467e-05, "epoch": 2.365930599369085, "percentage": 29.57, "elapsed_time": "3:25:46", "remaining_time": "8:10:01"}
{"current_steps": 755, "total_steps": 2536, "loss": 0.3277, "lr": 8.018431200266023e-05, "epoch": 2.3817034700315456, "percentage": 29.77, "elapsed_time": "3:26:56", "remaining_time": "8:08:10"}
{"current_steps": 760, "total_steps": 2536, "loss": 0.3289, "lr": 7.993555710563303e-05, "epoch": 2.3974763406940065, "percentage": 29.97, "elapsed_time": "3:28:02", "remaining_time": "8:06:08"}
{"current_steps": 765, "total_steps": 2536, "loss": 0.3337, "lr": 7.968564185275873e-05, "epoch": 2.413249211356467, "percentage": 30.17, "elapsed_time": "3:29:13", "remaining_time": "8:04:21"}
{"current_steps": 770, "total_steps": 2536, "loss": 0.3592, "lr": 7.943457593120045e-05, "epoch": 2.4290220820189274, "percentage": 30.36, "elapsed_time": "3:30:22", "remaining_time": "8:02:29"}
{"current_steps": 775, "total_steps": 2536, "loss": 0.33, "lr": 7.918236907272327e-05, "epoch": 2.444794952681388, "percentage": 30.56, "elapsed_time": "3:31:34", "remaining_time": "8:00:44"}
{"current_steps": 780, "total_steps": 2536, "loss": 0.3429, "lr": 7.892903105331712e-05, "epoch": 2.4605678233438484, "percentage": 30.76, "elapsed_time": "3:32:40", "remaining_time": "7:58:47"}
{"current_steps": 785, "total_steps": 2536, "loss": 0.354, "lr": 7.867457169281765e-05, "epoch": 2.4763406940063093, "percentage": 30.95, "elapsed_time": "3:33:46", "remaining_time": "7:56:50"}
{"current_steps": 790, "total_steps": 2536, "loss": 0.3407, "lr": 7.841900085452574e-05, "epoch": 2.4921135646687698, "percentage": 31.15, "elapsed_time": "3:34:55", "remaining_time": "7:55:00"}
{"current_steps": 795, "total_steps": 2536, "loss": 0.3323, "lr": 7.816232844482516e-05, "epoch": 2.5078864353312302, "percentage": 31.35, "elapsed_time": "3:35:59", "remaining_time": "7:53:00"}
{"current_steps": 800, "total_steps": 2536, "loss": 0.3453, "lr": 7.790456441279853e-05, "epoch": 2.5236593059936907, "percentage": 31.55, "elapsed_time": "3:37:09", "remaining_time": "7:51:12"}
{"current_steps": 805, "total_steps": 2536, "loss": 0.3405, "lr": 7.764571874984174e-05, "epoch": 2.5394321766561516, "percentage": 31.74, "elapsed_time": "3:50:33", "remaining_time": "8:15:45"}
{"current_steps": 810, "total_steps": 2536, "loss": 0.3383, "lr": 7.73858014892766e-05, "epoch": 2.555205047318612, "percentage": 31.94, "elapsed_time": "3:51:42", "remaining_time": "8:13:44"}
{"current_steps": 815, "total_steps": 2536, "loss": 0.3684, "lr": 7.712482270596199e-05, "epoch": 2.5709779179810726, "percentage": 32.14, "elapsed_time": "3:52:50", "remaining_time": "8:11:41"}
{"current_steps": 820, "total_steps": 2536, "loss": 0.3348, "lr": 7.686279251590331e-05, "epoch": 2.586750788643533, "percentage": 32.33, "elapsed_time": "3:53:46", "remaining_time": "8:09:12"}
{"current_steps": 825, "total_steps": 2536, "loss": 0.3327, "lr": 7.659972107586035e-05, "epoch": 2.6025236593059935, "percentage": 32.53, "elapsed_time": "3:54:52", "remaining_time": "8:07:07"}
{"current_steps": 830, "total_steps": 2536, "loss": 0.3234, "lr": 7.633561858295364e-05, "epoch": 2.6182965299684544, "percentage": 32.73, "elapsed_time": "3:56:04", "remaining_time": "8:05:14"}
{"current_steps": 835, "total_steps": 2536, "loss": 0.3435, "lr": 7.607049527426916e-05, "epoch": 2.634069400630915, "percentage": 32.93, "elapsed_time": "3:57:13", "remaining_time": "8:03:14"}
{"current_steps": 840, "total_steps": 2536, "loss": 0.3437, "lr": 7.580436142646155e-05, "epoch": 2.6498422712933754, "percentage": 33.12, "elapsed_time": "3:58:17", "remaining_time": "8:01:06"}
{"current_steps": 845, "total_steps": 2536, "loss": 0.3291, "lr": 7.55372273553557e-05, "epoch": 2.665615141955836, "percentage": 33.32, "elapsed_time": "3:59:24", "remaining_time": "7:59:05"}
{"current_steps": 850, "total_steps": 2536, "loss": 0.345, "lr": 7.526910341554703e-05, "epoch": 2.6813880126182967, "percentage": 33.52, "elapsed_time": "4:00:29", "remaining_time": "7:57:00"}
{"current_steps": 855, "total_steps": 2536, "loss": 0.3484, "lr": 7.500000000000001e-05, "epoch": 2.697160883280757, "percentage": 33.71, "elapsed_time": "4:01:41", "remaining_time": "7:55:10"}
{"current_steps": 860, "total_steps": 2536, "loss": 0.3473, "lr": 7.472992753964532e-05, "epoch": 2.7129337539432177, "percentage": 33.91, "elapsed_time": "4:02:49", "remaining_time": "7:53:12"}
{"current_steps": 865, "total_steps": 2536, "loss": 0.3463, "lr": 7.445889650297559e-05, "epoch": 2.728706624605678, "percentage": 34.11, "elapsed_time": "4:03:55", "remaining_time": "7:51:13"}
{"current_steps": 870, "total_steps": 2536, "loss": 0.3287, "lr": 7.418691739563957e-05, "epoch": 2.7444794952681386, "percentage": 34.31, "elapsed_time": "4:05:04", "remaining_time": "7:49:18"}
{"current_steps": 875, "total_steps": 2536, "loss": 0.3552, "lr": 7.391400076003492e-05, "epoch": 2.7602523659305995, "percentage": 34.5, "elapsed_time": "4:06:14", "remaining_time": "7:47:26"}
{"current_steps": 880, "total_steps": 2536, "loss": 0.3412, "lr": 7.36401571748996e-05, "epoch": 2.77602523659306, "percentage": 34.7, "elapsed_time": "4:07:22", "remaining_time": "7:45:29"}
{"current_steps": 885, "total_steps": 2536, "loss": 0.3486, "lr": 7.336539725490178e-05, "epoch": 2.7917981072555205, "percentage": 34.9, "elapsed_time": "4:08:24", "remaining_time": "7:43:24"}
{"current_steps": 890, "total_steps": 2536, "loss": 0.3511, "lr": 7.30897316502284e-05, "epoch": 2.807570977917981, "percentage": 35.09, "elapsed_time": "4:09:29", "remaining_time": "7:41:24"}
{"current_steps": 895, "total_steps": 2536, "loss": 0.3477, "lr": 7.281317104617239e-05, "epoch": 2.823343848580442, "percentage": 35.29, "elapsed_time": "4:10:37", "remaining_time": "7:39:30"}
{"current_steps": 900, "total_steps": 2536, "loss": 0.3471, "lr": 7.253572616271844e-05, "epoch": 2.8391167192429023, "percentage": 35.49, "elapsed_time": "4:11:47", "remaining_time": "7:37:42"}
{"current_steps": 905, "total_steps": 2536, "loss": 0.3324, "lr": 7.225740775412751e-05, "epoch": 2.854889589905363, "percentage": 35.69, "elapsed_time": "4:23:31", "remaining_time": "7:54:55"}
{"current_steps": 910, "total_steps": 2536, "loss": 0.3266, "lr": 7.197822660851991e-05, "epoch": 2.8706624605678233, "percentage": 35.88, "elapsed_time": "4:24:38", "remaining_time": "7:52:52"}
{"current_steps": 915, "total_steps": 2536, "loss": 0.3351, "lr": 7.169819354745725e-05, "epoch": 2.8864353312302837, "percentage": 36.08, "elapsed_time": "4:25:50", "remaining_time": "7:50:56"}
{"current_steps": 920, "total_steps": 2536, "loss": 0.3322, "lr": 7.141731942552288e-05, "epoch": 2.9022082018927446, "percentage": 36.28, "elapsed_time": "4:26:58", "remaining_time": "7:48:56"}
{"current_steps": 925, "total_steps": 2536, "loss": 0.3419, "lr": 7.113561512990119e-05, "epoch": 2.917981072555205, "percentage": 36.47, "elapsed_time": "4:28:05", "remaining_time": "7:46:55"}
{"current_steps": 930, "total_steps": 2536, "loss": 0.3696, "lr": 7.085309157995557e-05, "epoch": 2.9337539432176656, "percentage": 36.67, "elapsed_time": "4:29:15", "remaining_time": "7:44:57"}
{"current_steps": 935, "total_steps": 2536, "loss": 0.3571, "lr": 7.056975972680517e-05, "epoch": 2.949526813880126, "percentage": 36.87, "elapsed_time": "4:30:26", "remaining_time": "7:43:04"}
{"current_steps": 940, "total_steps": 2536, "loss": 0.3197, "lr": 7.028563055290044e-05, "epoch": 2.965299684542587, "percentage": 37.07, "elapsed_time": "4:31:36", "remaining_time": "7:41:09"}
{"current_steps": 945, "total_steps": 2536, "loss": 0.355, "lr": 7.000071507159744e-05, "epoch": 2.9810725552050474, "percentage": 37.26, "elapsed_time": "4:32:37", "remaining_time": "7:38:59"}
{"current_steps": 950, "total_steps": 2536, "loss": 0.3426, "lr": 6.971502432673085e-05, "epoch": 2.996845425867508, "percentage": 37.46, "elapsed_time": "4:33:44", "remaining_time": "7:37:00"}
{"current_steps": 955, "total_steps": 2536, "loss": 0.2364, "lr": 6.942856939218599e-05, "epoch": 3.0126182965299684, "percentage": 37.66, "elapsed_time": "4:34:51", "remaining_time": "7:35:02"}
{"current_steps": 960, "total_steps": 2536, "loss": 0.233, "lr": 6.914136137146951e-05, "epoch": 3.028391167192429, "percentage": 37.85, "elapsed_time": "4:35:53", "remaining_time": "7:32:55"}
{"current_steps": 965, "total_steps": 2536, "loss": 0.2151, "lr": 6.885341139727912e-05, "epoch": 3.0441640378548898, "percentage": 38.05, "elapsed_time": "4:37:07", "remaining_time": "7:31:09"}
{"current_steps": 970, "total_steps": 2536, "loss": 0.214, "lr": 6.856473063107187e-05, "epoch": 3.0599369085173502, "percentage": 38.25, "elapsed_time": "4:38:17", "remaining_time": "7:29:17"}
{"current_steps": 975, "total_steps": 2536, "loss": 0.2324, "lr": 6.827533026263169e-05, "epoch": 3.0757097791798107, "percentage": 38.45, "elapsed_time": "4:39:21", "remaining_time": "7:27:15"}
{"current_steps": 980, "total_steps": 2536, "loss": 0.2165, "lr": 6.798522150963552e-05, "epoch": 3.091482649842271, "percentage": 38.64, "elapsed_time": "4:40:32", "remaining_time": "7:25:25"}
{"current_steps": 985, "total_steps": 2536, "loss": 0.2348, "lr": 6.769441561721863e-05, "epoch": 3.107255520504732, "percentage": 38.84, "elapsed_time": "4:41:41", "remaining_time": "7:23:33"}
{"current_steps": 990, "total_steps": 2536, "loss": 0.2303, "lr": 6.740292385753858e-05, "epoch": 3.1230283911671926, "percentage": 39.04, "elapsed_time": "4:42:55", "remaining_time": "7:21:48"}
{"current_steps": 995, "total_steps": 2536, "loss": 0.2271, "lr": 6.711075752933847e-05, "epoch": 3.138801261829653, "percentage": 39.24, "elapsed_time": "4:44:00", "remaining_time": "7:19:51"}
{"current_steps": 1000, "total_steps": 2536, "loss": 0.2236, "lr": 6.681792795750875e-05, "epoch": 3.1545741324921135, "percentage": 39.43, "elapsed_time": "4:45:10", "remaining_time": "7:18:02"}
{"current_steps": 1005, "total_steps": 2536, "loss": 0.2405, "lr": 6.652444649264856e-05, "epoch": 3.170347003154574, "percentage": 39.63, "elapsed_time": "4:50:52", "remaining_time": "7:23:06"}
{"current_steps": 1010, "total_steps": 2536, "loss": 0.2386, "lr": 6.623032451062542e-05, "epoch": 3.186119873817035, "percentage": 39.83, "elapsed_time": "4:52:00", "remaining_time": "7:21:11"}
{"current_steps": 1015, "total_steps": 2536, "loss": 0.2375, "lr": 6.593557341213457e-05, "epoch": 3.2018927444794953, "percentage": 40.02, "elapsed_time": "4:53:02", "remaining_time": "7:19:07"}
{"current_steps": 1020, "total_steps": 2536, "loss": 0.2403, "lr": 6.564020462225679e-05, "epoch": 3.217665615141956, "percentage": 40.22, "elapsed_time": "4:54:04", "remaining_time": "7:17:03"}
{"current_steps": 1025, "total_steps": 2536, "loss": 0.2277, "lr": 6.534422959001585e-05, "epoch": 3.2334384858044163, "percentage": 40.42, "elapsed_time": "4:55:11", "remaining_time": "7:15:09"}
{"current_steps": 1030, "total_steps": 2536, "loss": 0.23, "lr": 6.504765978793443e-05, "epoch": 3.249211356466877, "percentage": 40.62, "elapsed_time": "4:56:17", "remaining_time": "7:13:13"}
{"current_steps": 1035, "total_steps": 2536, "loss": 0.2298, "lr": 6.475050671158961e-05, "epoch": 3.2649842271293377, "percentage": 40.81, "elapsed_time": "4:57:24", "remaining_time": "7:11:18"}
{"current_steps": 1040, "total_steps": 2536, "loss": 0.2231, "lr": 6.445278187916722e-05, "epoch": 3.280757097791798, "percentage": 41.01, "elapsed_time": "4:58:29", "remaining_time": "7:09:22"}
{"current_steps": 1045, "total_steps": 2536, "loss": 0.2406, "lr": 6.415449683101537e-05, "epoch": 3.2965299684542586, "percentage": 41.21, "elapsed_time": "4:59:39", "remaining_time": "7:07:32"}
{"current_steps": 1050, "total_steps": 2536, "loss": 0.2343, "lr": 6.385566312919716e-05, "epoch": 3.312302839116719, "percentage": 41.4, "elapsed_time": "5:00:48", "remaining_time": "7:05:42"}
{"current_steps": 1055, "total_steps": 2536, "loss": 0.221, "lr": 6.355629235704248e-05, "epoch": 3.32807570977918, "percentage": 41.6, "elapsed_time": "5:01:58", "remaining_time": "7:03:55"}
{"current_steps": 1060, "total_steps": 2536, "loss": 0.2393, "lr": 6.3256396118699e-05, "epoch": 3.3438485804416405, "percentage": 41.8, "elapsed_time": "5:03:05", "remaining_time": "7:02:02"}
{"current_steps": 1065, "total_steps": 2536, "loss": 0.2323, "lr": 6.295598603868246e-05, "epoch": 3.359621451104101, "percentage": 42.0, "elapsed_time": "5:04:09", "remaining_time": "7:00:07"}
{"current_steps": 1070, "total_steps": 2536, "loss": 0.2252, "lr": 6.265507376142594e-05, "epoch": 3.3753943217665614, "percentage": 42.19, "elapsed_time": "5:05:16", "remaining_time": "6:58:15"}
{"current_steps": 1075, "total_steps": 2536, "loss": 0.2266, "lr": 6.235367095082867e-05, "epoch": 3.3911671924290223, "percentage": 42.39, "elapsed_time": "5:06:23", "remaining_time": "6:56:24"}
{"current_steps": 1080, "total_steps": 2536, "loss": 0.239, "lr": 6.205178928980377e-05, "epoch": 3.406940063091483, "percentage": 42.59, "elapsed_time": "5:07:32", "remaining_time": "6:54:36"}
{"current_steps": 1085, "total_steps": 2536, "loss": 0.2496, "lr": 6.174944047982549e-05, "epoch": 3.4227129337539433, "percentage": 42.78, "elapsed_time": "5:08:42", "remaining_time": "6:52:50"}
{"current_steps": 1090, "total_steps": 2536, "loss": 0.2247, "lr": 6.144663624047564e-05, "epoch": 3.4384858044164037, "percentage": 42.98, "elapsed_time": "5:09:53", "remaining_time": "6:51:06"}
{"current_steps": 1095, "total_steps": 2536, "loss": 0.2368, "lr": 6.114338830898922e-05, "epoch": 3.454258675078864, "percentage": 43.18, "elapsed_time": "5:11:02", "remaining_time": "6:49:19"}
{"current_steps": 1100, "total_steps": 2536, "loss": 0.2427, "lr": 6.083970843979957e-05, "epoch": 3.470031545741325, "percentage": 43.38, "elapsed_time": "5:12:14", "remaining_time": "6:47:36"}
{"current_steps": 1105, "total_steps": 2536, "loss": 0.2328, "lr": 6.0535608404082724e-05, "epoch": 3.4858044164037856, "percentage": 43.57, "elapsed_time": "5:20:38", "remaining_time": "6:55:14"}
{"current_steps": 1110, "total_steps": 2536, "loss": 0.2434, "lr": 6.0231099989301086e-05, "epoch": 3.501577287066246, "percentage": 43.77, "elapsed_time": "5:21:47", "remaining_time": "6:53:24"}
{"current_steps": 1115, "total_steps": 2536, "loss": 0.2365, "lr": 5.9926194998746624e-05, "epoch": 3.5173501577287065, "percentage": 43.97, "elapsed_time": "5:22:57", "remaining_time": "6:51:36"}
{"current_steps": 1120, "total_steps": 2536, "loss": 0.226, "lr": 5.9620905251083196e-05, "epoch": 3.5331230283911674, "percentage": 44.16, "elapsed_time": "5:24:05", "remaining_time": "6:49:44"}
{"current_steps": 1125, "total_steps": 2536, "loss": 0.2384, "lr": 5.931524257988864e-05, "epoch": 3.548895899053628, "percentage": 44.36, "elapsed_time": "5:25:14", "remaining_time": "6:47:55"}
{"current_steps": 1130, "total_steps": 2536, "loss": 0.2344, "lr": 5.900921883319591e-05, "epoch": 3.5646687697160884, "percentage": 44.56, "elapsed_time": "5:26:24", "remaining_time": "6:46:08"}
{"current_steps": 1135, "total_steps": 2536, "loss": 0.2393, "lr": 5.870284587303394e-05, "epoch": 3.580441640378549, "percentage": 44.76, "elapsed_time": "5:27:28", "remaining_time": "6:44:13"}
{"current_steps": 1140, "total_steps": 2536, "loss": 0.2234, "lr": 5.839613557496776e-05, "epoch": 3.5962145110410093, "percentage": 44.95, "elapsed_time": "5:28:38", "remaining_time": "6:42:26"}
{"current_steps": 1145, "total_steps": 2536, "loss": 0.2269, "lr": 5.808909982763825e-05, "epoch": 3.61198738170347, "percentage": 45.15, "elapsed_time": "5:29:48", "remaining_time": "6:40:40"}
{"current_steps": 1150, "total_steps": 2536, "loss": 0.2335, "lr": 5.778175053230126e-05, "epoch": 3.6277602523659307, "percentage": 45.35, "elapsed_time": "5:30:55", "remaining_time": "6:38:50"}
{"current_steps": 1155, "total_steps": 2536, "loss": 0.2214, "lr": 5.747409960236637e-05, "epoch": 3.643533123028391, "percentage": 45.54, "elapsed_time": "5:32:02", "remaining_time": "6:37:00"}
{"current_steps": 1160, "total_steps": 2536, "loss": 0.2414, "lr": 5.716615896293501e-05, "epoch": 3.6593059936908516, "percentage": 45.74, "elapsed_time": "5:33:10", "remaining_time": "6:35:12"}
{"current_steps": 1165, "total_steps": 2536, "loss": 0.2357, "lr": 5.68579405503383e-05, "epoch": 3.6750788643533125, "percentage": 45.94, "elapsed_time": "5:34:17", "remaining_time": "6:33:24"}
{"current_steps": 1170, "total_steps": 2536, "loss": 0.2354, "lr": 5.654945631167433e-05, "epoch": 3.690851735015773, "percentage": 46.14, "elapsed_time": "5:35:30", "remaining_time": "6:31:42"}
{"current_steps": 1175, "total_steps": 2536, "loss": 0.2393, "lr": 5.624071820434508e-05, "epoch": 3.7066246056782335, "percentage": 46.33, "elapsed_time": "5:36:36", "remaining_time": "6:29:53"}
{"current_steps": 1180, "total_steps": 2536, "loss": 0.2374, "lr": 5.593173819559294e-05, "epoch": 3.722397476340694, "percentage": 46.53, "elapsed_time": "5:37:42", "remaining_time": "6:28:04"}
{"current_steps": 1185, "total_steps": 2536, "loss": 0.2301, "lr": 5.562252826203687e-05, "epoch": 3.7381703470031544, "percentage": 46.73, "elapsed_time": "5:38:45", "remaining_time": "6:26:12"}
{"current_steps": 1190, "total_steps": 2536, "loss": 0.2463, "lr": 5.531310038920805e-05, "epoch": 3.753943217665615, "percentage": 46.92, "elapsed_time": "5:39:53", "remaining_time": "6:24:26"}
{"current_steps": 1195, "total_steps": 2536, "loss": 0.2305, "lr": 5.500346657108545e-05, "epoch": 3.769716088328076, "percentage": 47.12, "elapsed_time": "5:41:03", "remaining_time": "6:22:44"}
{"current_steps": 1200, "total_steps": 2536, "loss": 0.2337, "lr": 5.469363880963082e-05, "epoch": 3.7854889589905363, "percentage": 47.32, "elapsed_time": "5:42:11", "remaining_time": "6:20:58"}
{"current_steps": 1205, "total_steps": 2536, "loss": 0.2386, "lr": 5.438362911432347e-05, "epoch": 3.8012618296529967, "percentage": 47.52, "elapsed_time": "5:54:05", "remaining_time": "6:31:06"}
{"current_steps": 1210, "total_steps": 2536, "loss": 0.2424, "lr": 5.407344950169486e-05, "epoch": 3.8170347003154577, "percentage": 47.71, "elapsed_time": "5:55:10", "remaining_time": "6:29:13"}
{"current_steps": 1215, "total_steps": 2536, "loss": 0.2444, "lr": 5.376311199486268e-05, "epoch": 3.832807570977918, "percentage": 47.91, "elapsed_time": "5:56:13", "remaining_time": "6:27:18"}
{"current_steps": 1220, "total_steps": 2536, "loss": 0.2337, "lr": 5.3452628623064934e-05, "epoch": 3.8485804416403786, "percentage": 48.11, "elapsed_time": "5:57:19", "remaining_time": "6:25:27"}
{"current_steps": 1225, "total_steps": 2536, "loss": 0.2377, "lr": 5.31420114211936e-05, "epoch": 3.864353312302839, "percentage": 48.3, "elapsed_time": "5:58:31", "remaining_time": "6:23:41"}
{"current_steps": 1230, "total_steps": 2536, "loss": 0.2263, "lr": 5.2831272429328116e-05, "epoch": 3.8801261829652995, "percentage": 48.5, "elapsed_time": "5:59:40", "remaining_time": "6:21:54"}
{"current_steps": 1235, "total_steps": 2536, "loss": 0.2235, "lr": 5.2520423692268775e-05, "epoch": 3.89589905362776, "percentage": 48.7, "elapsed_time": "6:00:53", "remaining_time": "6:20:10"}
{"current_steps": 1240, "total_steps": 2536, "loss": 0.2333, "lr": 5.220947725906975e-05, "epoch": 3.911671924290221, "percentage": 48.9, "elapsed_time": "6:02:04", "remaining_time": "6:18:25"}
{"current_steps": 1245, "total_steps": 2536, "loss": 0.2501, "lr": 5.18984451825721e-05, "epoch": 3.9274447949526814, "percentage": 49.09, "elapsed_time": "6:03:04", "remaining_time": "6:16:29"}
{"current_steps": 1250, "total_steps": 2536, "loss": 0.2365, "lr": 5.1587339518936585e-05, "epoch": 3.943217665615142, "percentage": 49.29, "elapsed_time": "6:04:10", "remaining_time": "6:14:39"}
{"current_steps": 1255, "total_steps": 2536, "loss": 0.2332, "lr": 5.127617232717631e-05, "epoch": 3.958990536277603, "percentage": 49.49, "elapsed_time": "6:05:20", "remaining_time": "6:12:54"}
{"current_steps": 1260, "total_steps": 2536, "loss": 0.2336, "lr": 5.096495566868935e-05, "epoch": 3.9747634069400632, "percentage": 49.68, "elapsed_time": "6:06:27", "remaining_time": "6:11:07"}
{"current_steps": 1265, "total_steps": 2536, "loss": 0.2438, "lr": 5.065370160679115e-05, "epoch": 3.9905362776025237, "percentage": 49.88, "elapsed_time": "6:07:36", "remaining_time": "6:09:21"}
{"current_steps": 1270, "total_steps": 2536, "loss": 0.1921, "lr": 5.034242220624706e-05, "epoch": 4.006309148264984, "percentage": 50.08, "elapsed_time": "6:08:40", "remaining_time": "6:07:31"}
{"current_steps": 1275, "total_steps": 2536, "loss": 0.1417, "lr": 5.003112953280452e-05, "epoch": 4.022082018927445, "percentage": 50.28, "elapsed_time": "6:09:50", "remaining_time": "6:05:46"}
{"current_steps": 1280, "total_steps": 2536, "loss": 0.1399, "lr": 4.971983565272553e-05, "epoch": 4.037854889589905, "percentage": 50.47, "elapsed_time": "6:11:01", "remaining_time": "6:04:04"}
{"current_steps": 1285, "total_steps": 2536, "loss": 0.1313, "lr": 4.940855263231873e-05, "epoch": 4.053627760252366, "percentage": 50.67, "elapsed_time": "6:12:07", "remaining_time": "6:02:16"}
{"current_steps": 1290, "total_steps": 2536, "loss": 0.141, "lr": 4.909729253747197e-05, "epoch": 4.069400630914826, "percentage": 50.87, "elapsed_time": "6:13:14", "remaining_time": "6:00:31"}
{"current_steps": 1295, "total_steps": 2536, "loss": 0.1351, "lr": 4.878606743318439e-05, "epoch": 4.085173501577287, "percentage": 51.06, "elapsed_time": "6:14:20", "remaining_time": "5:58:43"}
{"current_steps": 1300, "total_steps": 2536, "loss": 0.1441, "lr": 4.8474889383098855e-05, "epoch": 4.100946372239748, "percentage": 51.26, "elapsed_time": "6:15:25", "remaining_time": "5:56:57"}
{"current_steps": 1305, "total_steps": 2536, "loss": 0.1386, "lr": 4.816377044903428e-05, "epoch": 4.116719242902208, "percentage": 51.46, "elapsed_time": "6:30:14", "remaining_time": "6:08:06"}
{"current_steps": 1310, "total_steps": 2536, "loss": 0.1345, "lr": 4.7852722690518196e-05, "epoch": 4.132492113564669, "percentage": 51.66, "elapsed_time": "6:31:21", "remaining_time": "6:06:15"}
{"current_steps": 1315, "total_steps": 2536, "loss": 0.1315, "lr": 4.75417581643192e-05, "epoch": 4.148264984227129, "percentage": 51.85, "elapsed_time": "6:32:32", "remaining_time": "6:04:28"}
{"current_steps": 1320, "total_steps": 2536, "loss": 0.15, "lr": 4.723088892397968e-05, "epoch": 4.16403785488959, "percentage": 52.05, "elapsed_time": "6:33:36", "remaining_time": "6:02:35"}
{"current_steps": 1325, "total_steps": 2536, "loss": 0.1272, "lr": 4.6920127019348556e-05, "epoch": 4.17981072555205, "percentage": 52.25, "elapsed_time": "6:34:45", "remaining_time": "6:00:47"}
{"current_steps": 1330, "total_steps": 2536, "loss": 0.1358, "lr": 4.6609484496114256e-05, "epoch": 4.195583596214511, "percentage": 52.44, "elapsed_time": "6:35:53", "remaining_time": "5:58:59"}
{"current_steps": 1335, "total_steps": 2536, "loss": 0.1522, "lr": 4.629897339533771e-05, "epoch": 4.211356466876971, "percentage": 52.64, "elapsed_time": "6:37:06", "remaining_time": "5:57:14"}
{"current_steps": 1340, "total_steps": 2536, "loss": 0.1386, "lr": 4.598860575298575e-05, "epoch": 4.2271293375394325, "percentage": 52.84, "elapsed_time": "6:38:19", "remaining_time": "5:55:31"}
{"current_steps": 1345, "total_steps": 2536, "loss": 0.1443, "lr": 4.5678393599464435e-05, "epoch": 4.242902208201893, "percentage": 53.04, "elapsed_time": "6:39:28", "remaining_time": "5:53:44"}
{"current_steps": 1350, "total_steps": 2536, "loss": 0.1485, "lr": 4.5368348959152864e-05, "epoch": 4.2586750788643535, "percentage": 53.23, "elapsed_time": "6:40:39", "remaining_time": "5:51:58"}
{"current_steps": 1355, "total_steps": 2536, "loss": 0.1396, "lr": 4.505848384993696e-05, "epoch": 4.274447949526814, "percentage": 53.43, "elapsed_time": "6:41:50", "remaining_time": "5:50:14"}
{"current_steps": 1360, "total_steps": 2536, "loss": 0.1456, "lr": 4.474881028274375e-05, "epoch": 4.290220820189274, "percentage": 53.63, "elapsed_time": "6:42:50", "remaining_time": "5:48:20"}
{"current_steps": 1365, "total_steps": 2536, "loss": 0.1363, "lr": 4.4439340261075716e-05, "epoch": 4.305993690851735, "percentage": 53.82, "elapsed_time": "6:43:59", "remaining_time": "5:46:34"}
{"current_steps": 1370, "total_steps": 2536, "loss": 0.1435, "lr": 4.413008578054558e-05, "epoch": 4.321766561514195, "percentage": 54.02, "elapsed_time": "6:45:08", "remaining_time": "5:44:48"}
{"current_steps": 1375, "total_steps": 2536, "loss": 0.1398, "lr": 4.3821058828411244e-05, "epoch": 4.337539432176656, "percentage": 54.22, "elapsed_time": "6:46:14", "remaining_time": "5:43:01"}
{"current_steps": 1380, "total_steps": 2536, "loss": 0.1375, "lr": 4.35122713831113e-05, "epoch": 4.353312302839116, "percentage": 54.42, "elapsed_time": "6:47:20", "remaining_time": "5:41:13"}
{"current_steps": 1385, "total_steps": 2536, "loss": 0.1387, "lr": 4.320373541380054e-05, "epoch": 4.369085173501578, "percentage": 54.61, "elapsed_time": "6:48:26", "remaining_time": "5:39:26"}
{"current_steps": 1390, "total_steps": 2536, "loss": 0.1442, "lr": 4.289546287988614e-05, "epoch": 4.384858044164038, "percentage": 54.81, "elapsed_time": "6:49:31", "remaining_time": "5:37:38"}
{"current_steps": 1395, "total_steps": 2536, "loss": 0.1442, "lr": 4.258746573056401e-05, "epoch": 4.400630914826499, "percentage": 55.01, "elapsed_time": "6:50:40", "remaining_time": "5:35:54"}
{"current_steps": 1400, "total_steps": 2536, "loss": 0.1509, "lr": 4.2279755904355704e-05, "epoch": 4.416403785488959, "percentage": 55.21, "elapsed_time": "6:51:44", "remaining_time": "5:34:06"}
{"current_steps": 1405, "total_steps": 2536, "loss": 0.1363, "lr": 4.197234532864558e-05, "epoch": 4.4321766561514195, "percentage": 55.4, "elapsed_time": "7:03:26", "remaining_time": "5:40:52"}
{"current_steps": 1410, "total_steps": 2536, "loss": 0.138, "lr": 4.1665245919218544e-05, "epoch": 4.44794952681388, "percentage": 55.6, "elapsed_time": "7:04:33", "remaining_time": "5:39:02"}
{"current_steps": 1415, "total_steps": 2536, "loss": 0.1423, "lr": 4.135846957979811e-05, "epoch": 4.4637223974763405, "percentage": 55.8, "elapsed_time": "7:05:36", "remaining_time": "5:37:10"}
{"current_steps": 1420, "total_steps": 2536, "loss": 0.1401, "lr": 4.105202820158503e-05, "epoch": 4.479495268138801, "percentage": 55.99, "elapsed_time": "7:06:44", "remaining_time": "5:35:22"}
{"current_steps": 1425, "total_steps": 2536, "loss": 0.1341, "lr": 4.074593366279636e-05, "epoch": 4.495268138801261, "percentage": 56.19, "elapsed_time": "7:07:48", "remaining_time": "5:33:32"}
{"current_steps": 1430, "total_steps": 2536, "loss": 0.1333, "lr": 4.044019782820505e-05, "epoch": 4.511041009463723, "percentage": 56.39, "elapsed_time": "7:08:58", "remaining_time": "5:31:46"}
{"current_steps": 1435, "total_steps": 2536, "loss": 0.1374, "lr": 4.0134832548680006e-05, "epoch": 4.526813880126183, "percentage": 56.59, "elapsed_time": "7:10:05", "remaining_time": "5:29:59"}
{"current_steps": 1440, "total_steps": 2536, "loss": 0.143, "lr": 3.982984966072677e-05, "epoch": 4.542586750788644, "percentage": 56.78, "elapsed_time": "7:11:18", "remaining_time": "5:28:16"}
{"current_steps": 1445, "total_steps": 2536, "loss": 0.1371, "lr": 3.952526098602873e-05, "epoch": 4.558359621451104, "percentage": 56.98, "elapsed_time": "7:12:27", "remaining_time": "5:26:31"}
{"current_steps": 1450, "total_steps": 2536, "loss": 0.1469, "lr": 3.9221078330988806e-05, "epoch": 4.574132492113565, "percentage": 57.18, "elapsed_time": "7:13:30", "remaining_time": "5:24:40"}
{"current_steps": 1455, "total_steps": 2536, "loss": 0.1431, "lr": 3.89173134862719e-05, "epoch": 4.589905362776025, "percentage": 57.37, "elapsed_time": "7:14:38", "remaining_time": "5:22:55"}
{"current_steps": 1460, "total_steps": 2536, "loss": 0.1411, "lr": 3.861397822634784e-05, "epoch": 4.605678233438486, "percentage": 57.57, "elapsed_time": "7:15:46", "remaining_time": "5:21:09"}
{"current_steps": 1465, "total_steps": 2536, "loss": 0.1399, "lr": 3.831108430903494e-05, "epoch": 4.621451104100946, "percentage": 57.77, "elapsed_time": "7:16:52", "remaining_time": "5:19:22"}
{"current_steps": 1470, "total_steps": 2536, "loss": 0.134, "lr": 3.800864347504437e-05, "epoch": 4.6372239747634065, "percentage": 57.97, "elapsed_time": "7:18:04", "remaining_time": "5:17:40"}
{"current_steps": 1475, "total_steps": 2536, "loss": 0.1411, "lr": 3.7706667447524876e-05, "epoch": 4.652996845425868, "percentage": 58.16, "elapsed_time": "7:19:09", "remaining_time": "5:15:54"}
{"current_steps": 1480, "total_steps": 2536, "loss": 0.1468, "lr": 3.740516793160855e-05, "epoch": 4.668769716088328, "percentage": 58.36, "elapsed_time": "7:20:12", "remaining_time": "5:14:05"}
{"current_steps": 1485, "total_steps": 2536, "loss": 0.1372, "lr": 3.710415661395699e-05, "epoch": 4.684542586750789, "percentage": 58.56, "elapsed_time": "7:21:17", "remaining_time": "5:12:19"}
{"current_steps": 1490, "total_steps": 2536, "loss": 0.1461, "lr": 3.6803645162308376e-05, "epoch": 4.700315457413249, "percentage": 58.75, "elapsed_time": "7:22:25", "remaining_time": "5:10:34"}
{"current_steps": 1495, "total_steps": 2536, "loss": 0.151, "lr": 3.6503645225025175e-05, "epoch": 4.71608832807571, "percentage": 58.95, "elapsed_time": "7:23:33", "remaining_time": "5:08:51"}
{"current_steps": 1500, "total_steps": 2536, "loss": 0.1392, "lr": 3.620416843064266e-05, "epoch": 4.73186119873817, "percentage": 59.15, "elapsed_time": "7:24:41", "remaining_time": "5:07:07"}
{"current_steps": 1505, "total_steps": 2536, "loss": 0.1513, "lr": 3.5905226387418126e-05, "epoch": 4.747634069400631, "percentage": 59.35, "elapsed_time": "7:37:23", "remaining_time": "5:13:19"}
{"current_steps": 1510, "total_steps": 2536, "loss": 0.1457, "lr": 3.5606830682880965e-05, "epoch": 4.763406940063091, "percentage": 59.54, "elapsed_time": "7:38:35", "remaining_time": "5:11:35"}
{"current_steps": 1515, "total_steps": 2536, "loss": 0.1396, "lr": 3.530899288338352e-05, "epoch": 4.779179810725552, "percentage": 59.74, "elapsed_time": "7:39:44", "remaining_time": "5:09:50"}
{"current_steps": 1520, "total_steps": 2536, "loss": 0.1465, "lr": 3.501172453365268e-05, "epoch": 4.794952681388013, "percentage": 59.94, "elapsed_time": "7:40:51", "remaining_time": "5:08:03"}
{"current_steps": 1525, "total_steps": 2536, "loss": 0.1377, "lr": 3.471503715634252e-05, "epoch": 4.8107255520504735, "percentage": 60.13, "elapsed_time": "7:41:58", "remaining_time": "5:06:15"}
{"current_steps": 1530, "total_steps": 2536, "loss": 0.1487, "lr": 3.44189422515875e-05, "epoch": 4.826498422712934, "percentage": 60.33, "elapsed_time": "7:43:08", "remaining_time": "5:04:31"}
{"current_steps": 1535, "total_steps": 2536, "loss": 0.1459, "lr": 3.4123451296556845e-05, "epoch": 4.842271293375394, "percentage": 60.53, "elapsed_time": "7:44:12", "remaining_time": "5:02:43"}
{"current_steps": 1540, "total_steps": 2536, "loss": 0.1479, "lr": 3.382857574500957e-05, "epoch": 4.858044164037855, "percentage": 60.73, "elapsed_time": "7:45:23", "remaining_time": "5:00:59"}
{"current_steps": 1545, "total_steps": 2536, "loss": 0.1522, "lr": 3.3534327026850574e-05, "epoch": 4.873817034700315, "percentage": 60.92, "elapsed_time": "7:46:33", "remaining_time": "4:59:15"}
{"current_steps": 1550, "total_steps": 2536, "loss": 0.1431, "lr": 3.324071654768754e-05, "epoch": 4.889589905362776, "percentage": 61.12, "elapsed_time": "7:47:43", "remaining_time": "4:57:31"}
{"current_steps": 1555, "total_steps": 2536, "loss": 0.1401, "lr": 3.2947755688388874e-05, "epoch": 4.905362776025236, "percentage": 61.32, "elapsed_time": "7:48:52", "remaining_time": "4:55:47"}
{"current_steps": 1560, "total_steps": 2536, "loss": 0.1463, "lr": 3.26554558046426e-05, "epoch": 4.921135646687697, "percentage": 61.51, "elapsed_time": "7:49:59", "remaining_time": "4:54:02"}
{"current_steps": 1565, "total_steps": 2536, "loss": 0.1308, "lr": 3.236382822651606e-05, "epoch": 4.936908517350158, "percentage": 61.71, "elapsed_time": "7:51:05", "remaining_time": "4:52:17"}
{"current_steps": 1570, "total_steps": 2536, "loss": 0.1426, "lr": 3.207288425801689e-05, "epoch": 4.952681388012619, "percentage": 61.91, "elapsed_time": "7:52:20", "remaining_time": "4:50:37"}
{"current_steps": 1575, "total_steps": 2536, "loss": 0.1435, "lr": 3.1782635176654764e-05, "epoch": 4.968454258675079, "percentage": 62.11, "elapsed_time": "7:53:28", "remaining_time": "4:48:53"}
{"current_steps": 1580, "total_steps": 2536, "loss": 0.1455, "lr": 3.149309223300428e-05, "epoch": 4.9842271293375395, "percentage": 62.3, "elapsed_time": "7:54:33", "remaining_time": "4:47:08"}
{"current_steps": 1585, "total_steps": 2536, "loss": 0.1392, "lr": 3.120426665026891e-05, "epoch": 5.0, "percentage": 62.5, "elapsed_time": "7:55:44", "remaining_time": "4:45:26"}
{"current_steps": 1590, "total_steps": 2536, "loss": 0.0784, "lr": 3.091616962384587e-05, "epoch": 5.0157728706624605, "percentage": 62.7, "elapsed_time": "7:56:54", "remaining_time": "4:43:44"}
{"current_steps": 1595, "total_steps": 2536, "loss": 0.079, "lr": 3.06288123208923e-05, "epoch": 5.031545741324921, "percentage": 62.89, "elapsed_time": "7:58:07", "remaining_time": "4:42:04"}
{"current_steps": 1600, "total_steps": 2536, "loss": 0.0682, "lr": 3.034220587989226e-05, "epoch": 5.047318611987381, "percentage": 63.09, "elapsed_time": "7:59:19", "remaining_time": "4:40:24"}
{"current_steps": 1605, "total_steps": 2536, "loss": 0.0742, "lr": 3.005636141022512e-05, "epoch": 5.063091482649842, "percentage": 63.29, "elapsed_time": "8:06:31", "remaining_time": "4:42:12"}
{"current_steps": 1610, "total_steps": 2536, "loss": 0.0725, "lr": 2.977128999173482e-05, "epoch": 5.078864353312303, "percentage": 63.49, "elapsed_time": "8:07:37", "remaining_time": "4:40:27"}
{"current_steps": 1615, "total_steps": 2536, "loss": 0.075, "lr": 2.948700267430049e-05, "epoch": 5.094637223974764, "percentage": 63.68, "elapsed_time": "8:08:38", "remaining_time": "4:38:39"}
{"current_steps": 1620, "total_steps": 2536, "loss": 0.0771, "lr": 2.920351047740808e-05, "epoch": 5.110410094637224, "percentage": 63.88, "elapsed_time": "8:09:44", "remaining_time": "4:36:55"}
{"current_steps": 1625, "total_steps": 2536, "loss": 0.0785, "lr": 2.892082438972325e-05, "epoch": 5.126182965299685, "percentage": 64.08, "elapsed_time": "8:10:55", "remaining_time": "4:35:13"}
{"current_steps": 1630, "total_steps": 2536, "loss": 0.0776, "lr": 2.863895536866541e-05, "epoch": 5.141955835962145, "percentage": 64.27, "elapsed_time": "8:12:04", "remaining_time": "4:33:30"}
{"current_steps": 1635, "total_steps": 2536, "loss": 0.0736, "lr": 2.835791433998301e-05, "epoch": 5.157728706624606, "percentage": 64.47, "elapsed_time": "8:13:13", "remaining_time": "4:31:47"}
{"current_steps": 1640, "total_steps": 2536, "loss": 0.077, "lr": 2.807771219733004e-05, "epoch": 5.173501577287066, "percentage": 64.67, "elapsed_time": "8:14:16", "remaining_time": "4:30:02"}
{"current_steps": 1645, "total_steps": 2536, "loss": 0.0807, "lr": 2.7798359801843766e-05, "epoch": 5.1892744479495265, "percentage": 64.87, "elapsed_time": "8:15:21", "remaining_time": "4:28:18"}
{"current_steps": 1650, "total_steps": 2536, "loss": 0.0753, "lr": 2.7519867981723712e-05, "epoch": 5.205047318611987, "percentage": 65.06, "elapsed_time": "8:16:32", "remaining_time": "4:26:37"}
{"current_steps": 1655, "total_steps": 2536, "loss": 0.0787, "lr": 2.724224753181197e-05, "epoch": 5.220820189274448, "percentage": 65.26, "elapsed_time": "8:17:43", "remaining_time": "4:24:57"}
{"current_steps": 1660, "total_steps": 2536, "loss": 0.0817, "lr": 2.6965509213174777e-05, "epoch": 5.236593059936909, "percentage": 65.46, "elapsed_time": "8:18:53", "remaining_time": "4:23:16"}
{"current_steps": 1665, "total_steps": 2536, "loss": 0.0762, "lr": 2.6689663752685334e-05, "epoch": 5.252365930599369, "percentage": 65.65, "elapsed_time": "8:20:01", "remaining_time": "4:21:34"}
{"current_steps": 1670, "total_steps": 2536, "loss": 0.0721, "lr": 2.641472184260809e-05, "epoch": 5.26813880126183, "percentage": 65.85, "elapsed_time": "8:21:13", "remaining_time": "4:19:55"}
{"current_steps": 1675, "total_steps": 2536, "loss": 0.0765, "lr": 2.614069414018428e-05, "epoch": 5.28391167192429, "percentage": 66.05, "elapsed_time": "8:22:22", "remaining_time": "4:18:13"}
{"current_steps": 1680, "total_steps": 2536, "loss": 0.0729, "lr": 2.5867591267218805e-05, "epoch": 5.299684542586751, "percentage": 66.25, "elapsed_time": "8:23:28", "remaining_time": "4:16:31"}
{"current_steps": 1685, "total_steps": 2536, "loss": 0.0753, "lr": 2.5595423809668452e-05, "epoch": 5.315457413249211, "percentage": 66.44, "elapsed_time": "8:24:38", "remaining_time": "4:14:52"}
{"current_steps": 1690, "total_steps": 2536, "loss": 0.0776, "lr": 2.532420231723172e-05, "epoch": 5.331230283911672, "percentage": 66.64, "elapsed_time": "8:25:50", "remaining_time": "4:13:13"}
{"current_steps": 1695, "total_steps": 2536, "loss": 0.0753, "lr": 2.5053937302939767e-05, "epoch": 5.347003154574132, "percentage": 66.84, "elapsed_time": "8:26:59", "remaining_time": "4:11:33"}
{"current_steps": 1700, "total_steps": 2536, "loss": 0.074, "lr": 2.4784639242748953e-05, "epoch": 5.3627760252365935, "percentage": 67.03, "elapsed_time": "8:28:06", "remaining_time": "4:09:52"}
{"current_steps": 1705, "total_steps": 2536, "loss": 0.0766, "lr": 2.451631857513472e-05, "epoch": 5.378548895899054, "percentage": 67.23, "elapsed_time": "8:41:24", "remaining_time": "4:14:07"}
{"current_steps": 1710, "total_steps": 2536, "loss": 0.0751, "lr": 2.4248985700687084e-05, "epoch": 5.394321766561514, "percentage": 67.43, "elapsed_time": "8:42:28", "remaining_time": "4:12:22"}
{"current_steps": 1715, "total_steps": 2536, "loss": 0.0762, "lr": 2.39826509817074e-05, "epoch": 5.410094637223975, "percentage": 67.63, "elapsed_time": "8:43:35", "remaining_time": "4:10:38"}
{"current_steps": 1720, "total_steps": 2536, "loss": 0.0802, "lr": 2.3717324741806718e-05, "epoch": 5.425867507886435, "percentage": 67.82, "elapsed_time": "8:44:44", "remaining_time": "4:08:56"}
{"current_steps": 1725, "total_steps": 2536, "loss": 0.0775, "lr": 2.3453017265505673e-05, "epoch": 5.441640378548896, "percentage": 68.02, "elapsed_time": "8:45:56", "remaining_time": "4:07:15"}
{"current_steps": 1730, "total_steps": 2536, "loss": 0.0759, "lr": 2.3189738797835708e-05, "epoch": 5.457413249211356, "percentage": 68.22, "elapsed_time": "8:46:57", "remaining_time": "4:05:30"}
{"current_steps": 1735, "total_steps": 2536, "loss": 0.0756, "lr": 2.292749954394216e-05, "epoch": 5.473186119873817, "percentage": 68.41, "elapsed_time": "8:48:03", "remaining_time": "4:03:47"}
{"current_steps": 1740, "total_steps": 2536, "loss": 0.0752, "lr": 2.266630966868852e-05, "epoch": 5.488958990536277, "percentage": 68.61, "elapsed_time": "8:49:11", "remaining_time": "4:02:05"}
{"current_steps": 1745, "total_steps": 2536, "loss": 0.0774, "lr": 2.2406179296262453e-05, "epoch": 5.504731861198739, "percentage": 68.81, "elapsed_time": "8:50:19", "remaining_time": "4:00:23"}
{"current_steps": 1750, "total_steps": 2536, "loss": 0.0722, "lr": 2.2147118509783445e-05, "epoch": 5.520504731861199, "percentage": 69.01, "elapsed_time": "8:51:28", "remaining_time": "3:58:42"}
{"current_steps": 1755, "total_steps": 2536, "loss": 0.0747, "lr": 2.1889137350911894e-05, "epoch": 5.5362776025236595, "percentage": 69.2, "elapsed_time": "8:52:32", "remaining_time": "3:56:59"}
{"current_steps": 1760, "total_steps": 2536, "loss": 0.0747, "lr": 2.1632245819459913e-05, "epoch": 5.55205047318612, "percentage": 69.4, "elapsed_time": "8:53:37", "remaining_time": "3:55:16"}
{"current_steps": 1765, "total_steps": 2536, "loss": 0.0788, "lr": 2.1376453873003664e-05, "epoch": 5.5678233438485805, "percentage": 69.6, "elapsed_time": "8:54:46", "remaining_time": "3:53:36"}
{"current_steps": 1770, "total_steps": 2536, "loss": 0.0817, "lr": 2.112177142649746e-05, "epoch": 5.583596214511041, "percentage": 69.79, "elapsed_time": "8:55:56", "remaining_time": "3:51:56"}
{"current_steps": 1775, "total_steps": 2536, "loss": 0.0799, "lr": 2.0868208351889402e-05, "epoch": 5.599369085173501, "percentage": 69.99, "elapsed_time": "8:57:07", "remaining_time": "3:50:16"}
{"current_steps": 1780, "total_steps": 2536, "loss": 0.0725, "lr": 2.0615774477738738e-05, "epoch": 5.615141955835962, "percentage": 70.19, "elapsed_time": "8:58:10", "remaining_time": "3:48:34"}
{"current_steps": 1785, "total_steps": 2536, "loss": 0.0791, "lr": 2.0364479588834835e-05, "epoch": 5.630914826498422, "percentage": 70.39, "elapsed_time": "8:59:19", "remaining_time": "3:46:54"}
{"current_steps": 1790, "total_steps": 2536, "loss": 0.0793, "lr": 2.0114333425817993e-05, "epoch": 5.646687697160884, "percentage": 70.58, "elapsed_time": "9:00:25", "remaining_time": "3:45:13"}
{"current_steps": 1795, "total_steps": 2536, "loss": 0.0734, "lr": 1.9865345684801846e-05, "epoch": 5.662460567823344, "percentage": 70.78, "elapsed_time": "9:01:32", "remaining_time": "3:43:33"}
{"current_steps": 1800, "total_steps": 2536, "loss": 0.0779, "lr": 1.9617526016997486e-05, "epoch": 5.678233438485805, "percentage": 70.98, "elapsed_time": "9:02:44", "remaining_time": "3:41:55"}
{"current_steps": 1805, "total_steps": 2536, "loss": 0.0785, "lr": 1.937088402833943e-05, "epoch": 5.694006309148265, "percentage": 71.18, "elapsed_time": "9:13:32", "remaining_time": "3:44:10"}
{"current_steps": 1810, "total_steps": 2536, "loss": 0.0784, "lr": 1.9125429279113173e-05, "epoch": 5.709779179810726, "percentage": 71.37, "elapsed_time": "9:14:43", "remaining_time": "3:42:30"}
{"current_steps": 1815, "total_steps": 2536, "loss": 0.0803, "lr": 1.8881171283584752e-05, "epoch": 5.725552050473186, "percentage": 71.57, "elapsed_time": "9:15:51", "remaining_time": "3:40:48"}
{"current_steps": 1820, "total_steps": 2536, "loss": 0.0797, "lr": 1.8638119509631853e-05, "epoch": 5.7413249211356465, "percentage": 71.77, "elapsed_time": "9:17:00", "remaining_time": "3:39:07"}
{"current_steps": 1825, "total_steps": 2536, "loss": 0.0799, "lr": 1.839628337837686e-05, "epoch": 5.757097791798107, "percentage": 71.96, "elapsed_time": "9:18:12", "remaining_time": "3:37:28"}
{"current_steps": 1830, "total_steps": 2536, "loss": 0.078, "lr": 1.8155672263821666e-05, "epoch": 5.7728706624605675, "percentage": 72.16, "elapsed_time": "9:19:21", "remaining_time": "3:35:47"}
{"current_steps": 1835, "total_steps": 2536, "loss": 0.0786, "lr": 1.7916295492484315e-05, "epoch": 5.788643533123029, "percentage": 72.36, "elapsed_time": "9:20:26", "remaining_time": "3:34:05"}
{"current_steps": 1840, "total_steps": 2536, "loss": 0.0739, "lr": 1.7678162343037524e-05, "epoch": 5.804416403785489, "percentage": 72.56, "elapsed_time": "9:21:36", "remaining_time": "3:32:25"}
{"current_steps": 1845, "total_steps": 2536, "loss": 0.0744, "lr": 1.744128204594893e-05, "epoch": 5.82018927444795, "percentage": 72.75, "elapsed_time": "9:22:45", "remaining_time": "3:30:45"}
{"current_steps": 1850, "total_steps": 2536, "loss": 0.078, "lr": 1.7205663783123436e-05, "epoch": 5.83596214511041, "percentage": 72.95, "elapsed_time": "9:23:52", "remaining_time": "3:29:05"}
{"current_steps": 1855, "total_steps": 2536, "loss": 0.0772, "lr": 1.6971316687547213e-05, "epoch": 5.851735015772871, "percentage": 73.15, "elapsed_time": "9:25:00", "remaining_time": "3:27:25"}
{"current_steps": 1860, "total_steps": 2536, "loss": 0.074, "lr": 1.6738249842933697e-05, "epoch": 5.867507886435331, "percentage": 73.34, "elapsed_time": "9:26:08", "remaining_time": "3:25:45"}
{"current_steps": 1865, "total_steps": 2536, "loss": 0.0787, "lr": 1.6506472283371527e-05, "epoch": 5.883280757097792, "percentage": 73.54, "elapsed_time": "9:27:15", "remaining_time": "3:24:05"}
{"current_steps": 1870, "total_steps": 2536, "loss": 0.0752, "lr": 1.6275992992974308e-05, "epoch": 5.899053627760252, "percentage": 73.74, "elapsed_time": "9:28:24", "remaining_time": "3:22:26"}
{"current_steps": 1875, "total_steps": 2536, "loss": 0.0749, "lr": 1.604682090553243e-05, "epoch": 5.914826498422713, "percentage": 73.94, "elapsed_time": "9:29:35", "remaining_time": "3:20:47"}
{"current_steps": 1880, "total_steps": 2536, "loss": 0.0753, "lr": 1.5818964904166756e-05, "epoch": 5.930599369085174, "percentage": 74.13, "elapsed_time": "9:30:43", "remaining_time": "3:19:08"}
{"current_steps": 1885, "total_steps": 2536, "loss": 0.0734, "lr": 1.55924338209843e-05, "epoch": 5.946372239747634, "percentage": 74.33, "elapsed_time": "9:31:51", "remaining_time": "3:17:29"}
{"current_steps": 1890, "total_steps": 2536, "loss": 0.0759, "lr": 1.536723643673582e-05, "epoch": 5.962145110410095, "percentage": 74.53, "elapsed_time": "9:33:01", "remaining_time": "3:15:51"}
{"current_steps": 1895, "total_steps": 2536, "loss": 0.0752, "lr": 1.5143381480475583e-05, "epoch": 5.977917981072555, "percentage": 74.72, "elapsed_time": "9:34:09", "remaining_time": "3:14:12"}
{"current_steps": 1900, "total_steps": 2536, "loss": 0.0704, "lr": 1.49208776292229e-05, "epoch": 5.993690851735016, "percentage": 74.92, "elapsed_time": "9:35:09", "remaining_time": "3:12:31"}
{"current_steps": 1905, "total_steps": 2536, "loss": 0.0483, "lr": 1.4699733507625862e-05, "epoch": 6.009463722397476, "percentage": 75.12, "elapsed_time": "9:41:59", "remaining_time": "3:12:46"}
{"current_steps": 1910, "total_steps": 2536, "loss": 0.0406, "lr": 1.4479957687626933e-05, "epoch": 6.025236593059937, "percentage": 75.32, "elapsed_time": "9:43:05", "remaining_time": "3:11:06"}
{"current_steps": 1915, "total_steps": 2536, "loss": 0.0401, "lr": 1.4261558688130838e-05, "epoch": 6.041009463722397, "percentage": 75.51, "elapsed_time": "9:44:14", "remaining_time": "3:09:27"}
{"current_steps": 1920, "total_steps": 2536, "loss": 0.0369, "lr": 1.4044544974674246e-05, "epoch": 6.056782334384858, "percentage": 75.71, "elapsed_time": "9:45:20", "remaining_time": "3:07:47"}
{"current_steps": 1925, "total_steps": 2536, "loss": 0.0374, "lr": 1.3828924959097612e-05, "epoch": 6.072555205047319, "percentage": 75.91, "elapsed_time": "9:46:33", "remaining_time": "3:06:10"}
{"current_steps": 1930, "total_steps": 2536, "loss": 0.0351, "lr": 1.3614706999219213e-05, "epoch": 6.0883280757097795, "percentage": 76.1, "elapsed_time": "9:47:43", "remaining_time": "3:04:32"}
{"current_steps": 1935, "total_steps": 2536, "loss": 0.0385, "lr": 1.340189939851112e-05, "epoch": 6.10410094637224, "percentage": 76.3, "elapsed_time": "9:48:50", "remaining_time": "3:02:53"}
{"current_steps": 1940, "total_steps": 2536, "loss": 0.0369, "lr": 1.3190510405777345e-05, "epoch": 6.1198738170347005, "percentage": 76.5, "elapsed_time": "9:50:04", "remaining_time": "3:01:16"}
{"current_steps": 1945, "total_steps": 2536, "loss": 0.037, "lr": 1.2980548214834142e-05, "epoch": 6.135646687697161, "percentage": 76.7, "elapsed_time": "9:51:11", "remaining_time": "2:59:38"}
{"current_steps": 1950, "total_steps": 2536, "loss": 0.0363, "lr": 1.2772020964192316e-05, "epoch": 6.151419558359621, "percentage": 76.89, "elapsed_time": "9:52:14", "remaining_time": "2:57:58"}
{"current_steps": 1955, "total_steps": 2536, "loss": 0.0392, "lr": 1.2564936736741867e-05, "epoch": 6.167192429022082, "percentage": 77.09, "elapsed_time": "9:53:19", "remaining_time": "2:56:19"}
{"current_steps": 1960, "total_steps": 2536, "loss": 0.0384, "lr": 1.23593035594386e-05, "epoch": 6.182965299684542, "percentage": 77.29, "elapsed_time": "9:54:22", "remaining_time": "2:54:40"}
{"current_steps": 1965, "total_steps": 2536, "loss": 0.0382, "lr": 1.215512940299305e-05, "epoch": 6.198738170347003, "percentage": 77.48, "elapsed_time": "9:55:20", "remaining_time": "2:52:59"}
{"current_steps": 1970, "total_steps": 2536, "loss": 0.0376, "lr": 1.1952422181561424e-05, "epoch": 6.214511041009464, "percentage": 77.68, "elapsed_time": "9:56:33", "remaining_time": "2:51:23"}
{"current_steps": 1975, "total_steps": 2536, "loss": 0.0374, "lr": 1.1751189752438957e-05, "epoch": 6.230283911671925, "percentage": 77.88, "elapsed_time": "9:57:43", "remaining_time": "2:49:47"}
{"current_steps": 1980, "total_steps": 2536, "loss": 0.0378, "lr": 1.1551439915755274e-05, "epoch": 6.246056782334385, "percentage": 78.08, "elapsed_time": "9:58:52", "remaining_time": "2:48:10"}
{"current_steps": 1985, "total_steps": 2536, "loss": 0.0374, "lr": 1.135318041417207e-05, "epoch": 6.261829652996846, "percentage": 78.27, "elapsed_time": "10:00:00", "remaining_time": "2:46:33"}
{"current_steps": 1990, "total_steps": 2536, "loss": 0.0365, "lr": 1.1156418932582941e-05, "epoch": 6.277602523659306, "percentage": 78.47, "elapsed_time": "10:01:05", "remaining_time": "2:44:55"}
{"current_steps": 1995, "total_steps": 2536, "loss": 0.0411, "lr": 1.096116309781558e-05, "epoch": 6.2933753943217665, "percentage": 78.67, "elapsed_time": "10:02:16", "remaining_time": "2:43:19"}
{"current_steps": 2000, "total_steps": 2536, "loss": 0.0378, "lr": 1.0767420478336093e-05, "epoch": 6.309148264984227, "percentage": 78.86, "elapsed_time": "10:03:28", "remaining_time": "2:41:43"}
{"current_steps": 2005, "total_steps": 2536, "loss": 0.0384, "lr": 1.0575198583955698e-05, "epoch": 6.3249211356466875, "percentage": 79.06, "elapsed_time": "10:11:55", "remaining_time": "2:42:03"}
{"current_steps": 2010, "total_steps": 2536, "loss": 0.0352, "lr": 1.0384504865539497e-05, "epoch": 6.340694006309148, "percentage": 79.26, "elapsed_time": "10:13:04", "remaining_time": "2:40:26"}
{"current_steps": 2015, "total_steps": 2536, "loss": 0.0387, "lr": 1.0195346714717813e-05, "epoch": 6.356466876971609, "percentage": 79.46, "elapsed_time": "10:14:14", "remaining_time": "2:38:49"}
{"current_steps": 2020, "total_steps": 2536, "loss": 0.0396, "lr": 1.0007731463599601e-05, "epoch": 6.37223974763407, "percentage": 79.65, "elapsed_time": "10:15:21", "remaining_time": "2:37:11"}
{"current_steps": 2025, "total_steps": 2536, "loss": 0.0367, "lr": 9.82166638448827e-06, "epoch": 6.38801261829653, "percentage": 79.85, "elapsed_time": "10:16:27", "remaining_time": "2:35:33"}
{"current_steps": 2030, "total_steps": 2536, "loss": 0.0365, "lr": 9.637158689599746e-06, "epoch": 6.403785488958991, "percentage": 80.05, "elapsed_time": "10:17:39", "remaining_time": "2:33:57"}
{"current_steps": 2035, "total_steps": 2536, "loss": 0.0422, "lr": 9.454215530782994e-06, "epoch": 6.419558359621451, "percentage": 80.24, "elapsed_time": "10:18:47", "remaining_time": "2:32:20"}
{"current_steps": 2040, "total_steps": 2536, "loss": 0.0409, "lr": 9.272843999242736e-06, "epoch": 6.435331230283912, "percentage": 80.44, "elapsed_time": "10:19:58", "remaining_time": "2:30:44"}
{"current_steps": 2045, "total_steps": 2536, "loss": 0.0382, "lr": 9.093051125264623e-06, "epoch": 6.451104100946372, "percentage": 80.64, "elapsed_time": "10:21:01", "remaining_time": "2:29:06"}
{"current_steps": 2050, "total_steps": 2536, "loss": 0.0399, "lr": 8.91484387794267e-06, "epoch": 6.466876971608833, "percentage": 80.84, "elapsed_time": "10:22:08", "remaining_time": "2:27:29"}
{"current_steps": 2055, "total_steps": 2536, "loss": 0.0357, "lr": 8.73822916490919e-06, "epoch": 6.482649842271293, "percentage": 81.03, "elapsed_time": "10:23:18", "remaining_time": "2:25:53"}
{"current_steps": 2060, "total_steps": 2536, "loss": 0.0371, "lr": 8.563213832067014e-06, "epoch": 6.498422712933754, "percentage": 81.23, "elapsed_time": "10:24:25", "remaining_time": "2:24:17"}
{"current_steps": 2065, "total_steps": 2536, "loss": 0.0394, "lr": 8.389804663324142e-06, "epoch": 6.514195583596215, "percentage": 81.43, "elapsed_time": "10:25:32", "remaining_time": "2:22:40"}
{"current_steps": 2070, "total_steps": 2536, "loss": 0.037, "lr": 8.218008380330723e-06, "epoch": 6.529968454258675, "percentage": 81.62, "elapsed_time": "10:26:34", "remaining_time": "2:21:03"}
{"current_steps": 2075, "total_steps": 2536, "loss": 0.0393, "lr": 8.047831642218611e-06, "epoch": 6.545741324921136, "percentage": 81.82, "elapsed_time": "10:27:40", "remaining_time": "2:19:27"}
{"current_steps": 2080, "total_steps": 2536, "loss": 0.037, "lr": 7.879281045343184e-06, "epoch": 6.561514195583596, "percentage": 82.02, "elapsed_time": "10:28:48", "remaining_time": "2:17:51"}
{"current_steps": 2085, "total_steps": 2536, "loss": 0.0374, "lr": 7.712363123027678e-06, "epoch": 6.577287066246057, "percentage": 82.22, "elapsed_time": "10:29:53", "remaining_time": "2:16:14"}
{"current_steps": 2090, "total_steps": 2536, "loss": 0.036, "lr": 7.547084345309924e-06, "epoch": 6.593059936908517, "percentage": 82.41, "elapsed_time": "10:31:00", "remaining_time": "2:14:39"}
{"current_steps": 2095, "total_steps": 2536, "loss": 0.0382, "lr": 7.383451118691576e-06, "epoch": 6.608832807570978, "percentage": 82.61, "elapsed_time": "10:32:08", "remaining_time": "2:13:04"}
{"current_steps": 2100, "total_steps": 2536, "loss": 0.036, "lr": 7.221469785889784e-06, "epoch": 6.624605678233438, "percentage": 82.81, "elapsed_time": "10:33:17", "remaining_time": "2:11:28"}
{"current_steps": 2105, "total_steps": 2536, "loss": 0.0369, "lr": 7.061146625591331e-06, "epoch": 6.6403785488958995, "percentage": 83.0, "elapsed_time": "10:38:55", "remaining_time": "2:10:49"}
{"current_steps": 2110, "total_steps": 2536, "loss": 0.0376, "lr": 6.902487852209238e-06, "epoch": 6.65615141955836, "percentage": 83.2, "elapsed_time": "10:40:00", "remaining_time": "2:09:12"}
{"current_steps": 2115, "total_steps": 2536, "loss": 0.0399, "lr": 6.7454996156419485e-06, "epoch": 6.6719242902208205, "percentage": 83.4, "elapsed_time": "10:41:12", "remaining_time": "2:07:38"}
{"current_steps": 2120, "total_steps": 2536, "loss": 0.0377, "lr": 6.590188001034864e-06, "epoch": 6.687697160883281, "percentage": 83.6, "elapsed_time": "10:42:23", "remaining_time": "2:06:03"}
{"current_steps": 2125, "total_steps": 2536, "loss": 0.0353, "lr": 6.436559028544559e-06, "epoch": 6.703470031545741, "percentage": 83.79, "elapsed_time": "10:43:32", "remaining_time": "2:04:28"}
{"current_steps": 2130, "total_steps": 2536, "loss": 0.0375, "lr": 6.284618653105328e-06, "epoch": 6.719242902208202, "percentage": 83.99, "elapsed_time": "10:44:40", "remaining_time": "2:02:52"}
{"current_steps": 2135, "total_steps": 2536, "loss": 0.0353, "lr": 6.134372764198465e-06, "epoch": 6.735015772870662, "percentage": 84.19, "elapsed_time": "10:45:47", "remaining_time": "2:01:17"}
{"current_steps": 2140, "total_steps": 2536, "loss": 0.038, "lr": 5.985827185623899e-06, "epoch": 6.750788643533123, "percentage": 84.38, "elapsed_time": "10:46:51", "remaining_time": "1:59:41"}
{"current_steps": 2145, "total_steps": 2536, "loss": 0.0382, "lr": 5.8389876752745045e-06, "epoch": 6.766561514195583, "percentage": 84.58, "elapsed_time": "10:48:01", "remaining_time": "1:58:07"}
{"current_steps": 2150, "total_steps": 2536, "loss": 0.0355, "lr": 5.693859924912892e-06, "epoch": 6.782334384858045, "percentage": 84.78, "elapsed_time": "10:49:12", "remaining_time": "1:56:33"}
{"current_steps": 2155, "total_steps": 2536, "loss": 0.039, "lr": 5.550449559950755e-06, "epoch": 6.798107255520505, "percentage": 84.98, "elapsed_time": "10:51:22", "remaining_time": "1:55:09"}
{"current_steps": 2160, "total_steps": 2536, "loss": 0.0391, "lr": 5.408762139230888e-06, "epoch": 6.813880126182966, "percentage": 85.17, "elapsed_time": "10:52:33", "remaining_time": "1:53:35"}
{"current_steps": 2165, "total_steps": 2536, "loss": 0.0365, "lr": 5.268803154811669e-06, "epoch": 6.829652996845426, "percentage": 85.37, "elapsed_time": "10:53:43", "remaining_time": "1:52:01"}
{"current_steps": 2170, "total_steps": 2536, "loss": 0.0381, "lr": 5.1305780317541855e-06, "epoch": 6.8454258675078865, "percentage": 85.57, "elapsed_time": "10:54:49", "remaining_time": "1:50:26"}
{"current_steps": 2175, "total_steps": 2536, "loss": 0.0388, "lr": 4.99409212791192e-06, "epoch": 6.861198738170347, "percentage": 85.76, "elapsed_time": "10:56:00", "remaining_time": "1:48:52"}
{"current_steps": 2180, "total_steps": 2536, "loss": 0.0365, "lr": 4.8593507337231666e-06, "epoch": 6.8769716088328074, "percentage": 85.96, "elapsed_time": "10:57:11", "remaining_time": "1:47:19"}
{"current_steps": 2185, "total_steps": 2536, "loss": 0.0375, "lr": 4.726359072005859e-06, "epoch": 6.892744479495268, "percentage": 86.16, "elapsed_time": "10:58:20", "remaining_time": "1:45:45"}
{"current_steps": 2190, "total_steps": 2536, "loss": 0.0347, "lr": 4.5951222977551444e-06, "epoch": 6.908517350157728, "percentage": 86.36, "elapsed_time": "10:59:27", "remaining_time": "1:44:11"}
{"current_steps": 2195, "total_steps": 2536, "loss": 0.0382, "lr": 4.465645497943621e-06, "epoch": 6.92429022082019, "percentage": 86.55, "elapsed_time": "11:00:37", "remaining_time": "1:42:37"}
{"current_steps": 2200, "total_steps": 2536, "loss": 0.0359, "lr": 4.337933691324109e-06, "epoch": 6.94006309148265, "percentage": 86.75, "elapsed_time": "11:01:43", "remaining_time": "1:41:03"}
{"current_steps": 2205, "total_steps": 2536, "loss": 0.0378, "lr": 4.21199182823514e-06, "epoch": 6.955835962145111, "percentage": 86.95, "elapsed_time": "11:10:14", "remaining_time": "1:40:36"}
{"current_steps": 2210, "total_steps": 2536, "loss": 0.039, "lr": 4.08782479040905e-06, "epoch": 6.971608832807571, "percentage": 87.15, "elapsed_time": "11:11:23", "remaining_time": "1:39:02"}
{"current_steps": 2215, "total_steps": 2536, "loss": 0.0362, "lr": 3.9654373907827665e-06, "epoch": 6.987381703470032, "percentage": 87.34, "elapsed_time": "11:12:35", "remaining_time": "1:37:28"}
{"current_steps": 2220, "total_steps": 2536, "loss": 0.0306, "lr": 3.844834373311257e-06, "epoch": 7.003154574132492, "percentage": 87.54, "elapsed_time": "11:13:39", "remaining_time": "1:35:53"}
{"current_steps": 2225, "total_steps": 2536, "loss": 0.0226, "lr": 3.7260204127836316e-06, "epoch": 7.018927444794953, "percentage": 87.74, "elapsed_time": "11:14:43", "remaining_time": "1:34:18"}
{"current_steps": 2230, "total_steps": 2536, "loss": 0.0214, "lr": 3.609000114641964e-06, "epoch": 7.034700315457413, "percentage": 87.93, "elapsed_time": "11:15:52", "remaining_time": "1:32:44"}
{"current_steps": 2235, "total_steps": 2536, "loss": 0.0202, "lr": 3.4937780148027344e-06, "epoch": 7.0504731861198735, "percentage": 88.13, "elapsed_time": "11:17:01", "remaining_time": "1:31:10"}
{"current_steps": 2240, "total_steps": 2536, "loss": 0.0204, "lr": 3.3803585794810466e-06, "epoch": 7.066246056782334, "percentage": 88.33, "elapsed_time": "11:18:05", "remaining_time": "1:29:36"}
{"current_steps": 2245, "total_steps": 2536, "loss": 0.0213, "lr": 3.2687462050175034e-06, "epoch": 7.082018927444795, "percentage": 88.53, "elapsed_time": "11:19:13", "remaining_time": "1:28:02"}
{"current_steps": 2250, "total_steps": 2536, "loss": 0.0197, "lr": 3.1589452177077815e-06, "epoch": 7.097791798107256, "percentage": 88.72, "elapsed_time": "11:20:23", "remaining_time": "1:26:29"}
{"current_steps": 2255, "total_steps": 2536, "loss": 0.0213, "lr": 3.0509598736349343e-06, "epoch": 7.113564668769716, "percentage": 88.92, "elapsed_time": "11:21:32", "remaining_time": "1:24:55"}
{"current_steps": 2260, "total_steps": 2536, "loss": 0.0218, "lr": 2.9447943585044545e-06, "epoch": 7.129337539432177, "percentage": 89.12, "elapsed_time": "11:22:44", "remaining_time": "1:23:22"}
{"current_steps": 2265, "total_steps": 2536, "loss": 0.021, "lr": 2.840452787481979e-06, "epoch": 7.145110410094637, "percentage": 89.31, "elapsed_time": "11:23:50", "remaining_time": "1:21:49"}
{"current_steps": 2270, "total_steps": 2536, "loss": 0.0211, "lr": 2.7379392050338236e-06, "epoch": 7.160883280757098, "percentage": 89.51, "elapsed_time": "11:25:00", "remaining_time": "1:20:16"}
{"current_steps": 2275, "total_steps": 2536, "loss": 0.0202, "lr": 2.63725758477017e-06, "epoch": 7.176656151419558, "percentage": 89.71, "elapsed_time": "11:26:04", "remaining_time": "1:18:42"}
{"current_steps": 2280, "total_steps": 2536, "loss": 0.0214, "lr": 2.5384118292910818e-06, "epoch": 7.192429022082019, "percentage": 89.91, "elapsed_time": "11:27:08", "remaining_time": "1:17:09"}
{"current_steps": 2285, "total_steps": 2536, "loss": 0.0211, "lr": 2.4414057700351934e-06, "epoch": 7.208201892744479, "percentage": 90.1, "elapsed_time": "11:28:12", "remaining_time": "1:15:35"}
{"current_steps": 2290, "total_steps": 2536, "loss": 0.0232, "lr": 2.34624316713124e-06, "epoch": 7.2239747634069404, "percentage": 90.3, "elapsed_time": "11:29:23", "remaining_time": "1:14:03"}
{"current_steps": 2295, "total_steps": 2536, "loss": 0.0203, "lr": 2.2529277092522503e-06, "epoch": 7.239747634069401, "percentage": 90.5, "elapsed_time": "11:30:32", "remaining_time": "1:12:30"}
{"current_steps": 2300, "total_steps": 2536, "loss": 0.0209, "lr": 2.1614630134726367e-06, "epoch": 7.255520504731861, "percentage": 90.69, "elapsed_time": "11:31:37", "remaining_time": "1:10:58"}
{"current_steps": 2305, "total_steps": 2536, "loss": 0.0193, "lr": 2.0718526251279346e-06, "epoch": 7.271293375394322, "percentage": 90.89, "elapsed_time": "11:36:32", "remaining_time": "1:09:48"}
{"current_steps": 2310, "total_steps": 2536, "loss": 0.0191, "lr": 1.9841000176774148e-06, "epoch": 7.287066246056782, "percentage": 91.09, "elapsed_time": "11:37:38", "remaining_time": "1:08:15"}
{"current_steps": 2315, "total_steps": 2536, "loss": 0.0196, "lr": 1.898208592569406e-06, "epoch": 7.302839116719243, "percentage": 91.29, "elapsed_time": "11:39:29", "remaining_time": "1:06:46"}
{"current_steps": 2320, "total_steps": 2536, "loss": 0.0206, "lr": 1.8141816791095e-06, "epoch": 7.318611987381703, "percentage": 91.48, "elapsed_time": "11:40:34", "remaining_time": "1:05:13"}
{"current_steps": 2325, "total_steps": 2536, "loss": 0.0225, "lr": 1.7320225343314566e-06, "epoch": 7.334384858044164, "percentage": 91.68, "elapsed_time": "11:41:41", "remaining_time": "1:03:40"}
{"current_steps": 2330, "total_steps": 2536, "loss": 0.0216, "lr": 1.6517343428709975e-06, "epoch": 7.350157728706624, "percentage": 91.88, "elapsed_time": "11:42:51", "remaining_time": "1:02:08"}
{"current_steps": 2335, "total_steps": 2536, "loss": 0.02, "lr": 1.5733202168423055e-06, "epoch": 7.365930599369086, "percentage": 92.07, "elapsed_time": "11:44:03", "remaining_time": "1:00:36"}
{"current_steps": 2340, "total_steps": 2536, "loss": 0.022, "lr": 1.4967831957174606e-06, "epoch": 7.381703470031546, "percentage": 92.27, "elapsed_time": "11:45:10", "remaining_time": "0:59:03"}
{"current_steps": 2345, "total_steps": 2536, "loss": 0.0201, "lr": 1.4221262462085715e-06, "epoch": 7.3974763406940065, "percentage": 92.47, "elapsed_time": "11:46:16", "remaining_time": "0:57:31"}
{"current_steps": 2350, "total_steps": 2536, "loss": 0.0213, "lr": 1.3493522621528088e-06, "epoch": 7.413249211356467, "percentage": 92.67, "elapsed_time": "11:47:22", "remaining_time": "0:55:59"}
{"current_steps": 2355, "total_steps": 2536, "loss": 0.0232, "lr": 1.2784640644002366e-06, "epoch": 7.429022082018927, "percentage": 92.86, "elapsed_time": "11:48:34", "remaining_time": "0:54:27"}
{"current_steps": 2360, "total_steps": 2536, "loss": 0.0215, "lr": 1.209464400704452e-06, "epoch": 7.444794952681388, "percentage": 93.06, "elapsed_time": "11:49:40", "remaining_time": "0:52:55"}
{"current_steps": 2365, "total_steps": 2536, "loss": 0.0202, "lr": 1.1423559456160803e-06, "epoch": 7.460567823343848, "percentage": 93.26, "elapsed_time": "11:50:49", "remaining_time": "0:51:23"}
{"current_steps": 2370, "total_steps": 2536, "loss": 0.0203, "lr": 1.0771413003791253e-06, "epoch": 7.476340694006309, "percentage": 93.45, "elapsed_time": "11:51:56", "remaining_time": "0:49:51"}
{"current_steps": 2375, "total_steps": 2536, "loss": 0.019, "lr": 1.0138229928301212e-06, "epoch": 7.492113564668769, "percentage": 93.65, "elapsed_time": "11:53:06", "remaining_time": "0:48:20"}
{"current_steps": 2380, "total_steps": 2536, "loss": 0.0209, "lr": 9.524034773001511e-07, "epoch": 7.50788643533123, "percentage": 93.85, "elapsed_time": "11:54:12", "remaining_time": "0:46:48"}
{"current_steps": 2385, "total_steps": 2536, "loss": 0.0212, "lr": 8.928851345197165e-07, "epoch": 7.523659305993691, "percentage": 94.05, "elapsed_time": "11:55:17", "remaining_time": "0:45:17"}
{"current_steps": 2390, "total_steps": 2536, "loss": 0.0211, "lr": 8.352702715264726e-07, "epoch": 7.539432176656152, "percentage": 94.24, "elapsed_time": "11:56:21", "remaining_time": "0:43:45"}
{"current_steps": 2395, "total_steps": 2536, "loss": 0.0207, "lr": 7.795611215757615e-07, "epoch": 7.555205047318612, "percentage": 94.44, "elapsed_time": "11:57:28", "remaining_time": "0:42:14"}
{"current_steps": 2400, "total_steps": 2536, "loss": 0.019, "lr": 7.257598440540802e-07, "epoch": 7.570977917981073, "percentage": 94.64, "elapsed_time": "11:58:34", "remaining_time": "0:40:43"}
{"current_steps": 2405, "total_steps": 2536, "loss": 0.022, "lr": 6.738685243953769e-07, "epoch": 7.586750788643533, "percentage": 94.83, "elapsed_time": "12:06:03", "remaining_time": "0:39:32"}
{"current_steps": 2410, "total_steps": 2536, "loss": 0.0181, "lr": 6.238891740002195e-07, "epoch": 7.6025236593059935, "percentage": 95.03, "elapsed_time": "12:07:12", "remaining_time": "0:38:01"}
{"current_steps": 2415, "total_steps": 2536, "loss": 0.0224, "lr": 5.758237301577874e-07, "epoch": 7.618296529968454, "percentage": 95.23, "elapsed_time": "12:08:22", "remaining_time": "0:36:29"}
{"current_steps": 2420, "total_steps": 2536, "loss": 0.0198, "lr": 5.296740559708413e-07, "epoch": 7.634069400630915, "percentage": 95.43, "elapsed_time": "12:09:32", "remaining_time": "0:34:58"}
{"current_steps": 2425, "total_steps": 2536, "loss": 0.0188, "lr": 4.854419402834709e-07, "epoch": 7.649842271293375, "percentage": 95.62, "elapsed_time": "12:10:42", "remaining_time": "0:33:26"}
{"current_steps": 2430, "total_steps": 2536, "loss": 0.0211, "lr": 4.431290976117497e-07, "epoch": 7.665615141955836, "percentage": 95.82, "elapsed_time": "12:11:46", "remaining_time": "0:31:55"}
{"current_steps": 2435, "total_steps": 2536, "loss": 0.0212, "lr": 4.0273716807731067e-07, "epoch": 7.681388012618297, "percentage": 96.02, "elapsed_time": "12:12:58", "remaining_time": "0:30:24"}
{"current_steps": 2440, "total_steps": 2536, "loss": 0.0221, "lr": 3.642677173437137e-07, "epoch": 7.697160883280757, "percentage": 96.21, "elapsed_time": "12:14:12", "remaining_time": "0:28:53"}
{"current_steps": 2445, "total_steps": 2536, "loss": 0.0201, "lr": 3.2772223655583857e-07, "epoch": 7.712933753943218, "percentage": 96.41, "elapsed_time": "12:15:15", "remaining_time": "0:27:21"}
{"current_steps": 2450, "total_steps": 2536, "loss": 0.0198, "lr": 2.9310214228202013e-07, "epoch": 7.728706624605678, "percentage": 96.61, "elapsed_time": "12:16:24", "remaining_time": "0:25:50"}
{"current_steps": 2455, "total_steps": 2536, "loss": 0.0208, "lr": 2.604087764591534e-07, "epoch": 7.744479495268139, "percentage": 96.81, "elapsed_time": "12:17:29", "remaining_time": "0:24:19"}
{"current_steps": 2460, "total_steps": 2536, "loss": 0.0203, "lr": 2.2964340634069603e-07, "epoch": 7.760252365930599, "percentage": 97.0, "elapsed_time": "12:18:34", "remaining_time": "0:22:49"}
{"current_steps": 2465, "total_steps": 2536, "loss": 0.0195, "lr": 2.0080722444754118e-07, "epoch": 7.7760252365930596, "percentage": 97.2, "elapsed_time": "12:19:44", "remaining_time": "0:21:18"}
{"current_steps": 2470, "total_steps": 2536, "loss": 0.0209, "lr": 1.7390134852177664e-07, "epoch": 7.79179810725552, "percentage": 97.4, "elapsed_time": "12:20:51", "remaining_time": "0:19:47"}
{"current_steps": 2475, "total_steps": 2536, "loss": 0.0222, "lr": 1.48926821483375e-07, "epoch": 7.807570977917981, "percentage": 97.59, "elapsed_time": "12:22:02", "remaining_time": "0:18:17"}
{"current_steps": 2480, "total_steps": 2536, "loss": 0.0194, "lr": 1.2588461138977604e-07, "epoch": 7.823343848580442, "percentage": 97.79, "elapsed_time": "12:23:11", "remaining_time": "0:16:46"}
{"current_steps": 2485, "total_steps": 2536, "loss": 0.0216, "lr": 1.0477561139832781e-07, "epoch": 7.839116719242902, "percentage": 97.99, "elapsed_time": "12:24:17", "remaining_time": "0:15:16"}
{"current_steps": 2490, "total_steps": 2536, "loss": 0.02, "lr": 8.560063973171439e-08, "epoch": 7.854889589905363, "percentage": 98.19, "elapsed_time": "12:25:22", "remaining_time": "0:13:46"}
{"current_steps": 2495, "total_steps": 2536, "loss": 0.0211, "lr": 6.836043964620342e-08, "epoch": 7.870662460567823, "percentage": 98.38, "elapsed_time": "12:26:28", "remaining_time": "0:12:16"}
{"current_steps": 2500, "total_steps": 2536, "loss": 0.0191, "lr": 5.3055679402846946e-08, "epoch": 7.886435331230284, "percentage": 98.58, "elapsed_time": "12:27:29", "remaining_time": "0:10:45"}
{"current_steps": 2505, "total_steps": 2536, "loss": 0.0216, "lr": 3.968695224158547e-08, "epoch": 7.902208201892744, "percentage": 98.78, "elapsed_time": "12:32:11", "remaining_time": "0:09:18"}
{"current_steps": 2510, "total_steps": 2536, "loss": 0.0198, "lr": 2.8254776358238588e-08, "epoch": 7.917981072555205, "percentage": 98.97, "elapsed_time": "12:33:19", "remaining_time": "0:07:48"}
{"current_steps": 2515, "total_steps": 2536, "loss": 0.0225, "lr": 1.8759594884443233e-08, "epoch": 7.933753943217665, "percentage": 99.17, "elapsed_time": "12:34:27", "remaining_time": "0:06:17"}
{"current_steps": 2520, "total_steps": 2536, "loss": 0.0215, "lr": 1.1201775870445242e-08, "epoch": 7.9495268138801265, "percentage": 99.37, "elapsed_time": "12:35:35", "remaining_time": "0:04:47"}
{"current_steps": 2525, "total_steps": 2536, "loss": 0.0197, "lr": 5.581612270855186e-09, "epoch": 7.965299684542587, "percentage": 99.57, "elapsed_time": "12:36:45", "remaining_time": "0:03:17"}
{"current_steps": 2530, "total_steps": 2536, "loss": 0.022, "lr": 1.8993219332907877e-09, "epoch": 7.981072555205047, "percentage": 99.76, "elapsed_time": "12:37:56", "remaining_time": "0:01:47"}
{"current_steps": 2535, "total_steps": 2536, "loss": 0.0198, "lr": 1.5504758992257451e-10, "epoch": 7.996845425867508, "percentage": 99.96, "elapsed_time": "12:39:03", "remaining_time": "0:00:17"}
{"current_steps": 2536, "total_steps": 2536, "epoch": 8.0, "percentage": 100.0, "elapsed_time": "12:43:06", "remaining_time": "0:00:00"}
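The log above is in JSON Lines form: one JSON object per logging interval (every 5 steps here), followed by a final record at step 2536 that carries no `loss` or `lr` key. The Python sketch below shows one minimal way to read such a log and summarize its tail. It assumes that every non-empty line is a standalone JSON object and that `elapsed_time` uses the `H:MM:SS` form seen in the excerpt. `SAMPLE_LOG`, `load_records`, `summarize` and `parse_hms` are hypothetical names chosen for illustration and are not part of any training library.

```python
import json

# Four lines copied verbatim from the end of the log above (hypothetical sample).
SAMPLE_LOG = """\
{"current_steps": 2525, "total_steps": 2536, "loss": 0.0197, "lr": 5.581612270855186e-09, "epoch": 7.965299684542587, "percentage": 99.57, "elapsed_time": "12:36:45", "remaining_time": "0:03:17"}
{"current_steps": 2530, "total_steps": 2536, "loss": 0.022, "lr": 1.8993219332907877e-09, "epoch": 7.981072555205047, "percentage": 99.76, "elapsed_time": "12:37:56", "remaining_time": "0:01:47"}
{"current_steps": 2535, "total_steps": 2536, "loss": 0.0198, "lr": 1.5504758992257451e-10, "epoch": 7.996845425867508, "percentage": 99.96, "elapsed_time": "12:39:03", "remaining_time": "0:00:17"}
{"current_steps": 2536, "total_steps": 2536, "epoch": 8.0, "percentage": 100.0, "elapsed_time": "12:43:06", "remaining_time": "0:00:00"}
"""


def parse_hms(text):
    """Convert an "H:MM:SS" duration string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def load_records(text):
    """Parse JSON Lines: one object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(records):
    """Summarize a run, skipping records (like the final one) that lack a loss."""
    logged = [r for r in records if "loss" in r]
    losses = [r["loss"] for r in logged]
    lrs = [r["lr"] for r in logged]
    last = records[-1]
    return {
        "final_step": last["current_steps"],
        "final_epoch": last["epoch"],
        "mean_loss": sum(losses) / len(losses),
        # True when every logged learning rate is strictly below the previous one.
        "lr_monotone_decreasing": all(a > b for a, b in zip(lrs, lrs[1:])),
        "elapsed_seconds": parse_hms(last["elapsed_time"]),
    }


if __name__ == "__main__":
    for key, value in summarize(load_records(SAMPLE_LOG)).items():
        print(f"{key}: {value}")
```

To summarize a whole run, pass the full file contents to `load_records`. Filtering on the presence of `loss` keeps the sketch from failing on the closing summary record.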
5624
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:66e6ab08343afbf77e05fe0528ed063ee0c8ca8501080fcb2c048c8db69ef713
size 8721
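The `training_args.bin` entry above is committed as a Git LFS pointer, not as the binary itself. The pointer consists of three `key value` lines: the spec version, the object id (a SHA-256 digest), and the size of the real file in bytes. The Python sketch below parses a pointer of this v1 form and checks a downloaded blob against it. It assumes only the three keys shown here. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for illustration and are not part of the `git-lfs` tooling.

```python
import hashlib

# The pointer text exactly as committed for training_args.bin above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:66e6ab08343afbf77e05fe0528ed063ee0c8ca8501080fcb2c048c8db69ef713
size 8721
"""


def parse_lfs_pointer(text):
    """Parse a v1 LFS pointer into {"oid": hex digest, "size": bytes}."""
    fields = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("unsupported or missing LFS pointer version")
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError("expected a sha256 object id")
    # Store only the hex digest; the "sha256:" prefix was checked above.
    return {"oid": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """Return True if the bytes have the pointer's size and SHA-256 digest."""
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
```

The sketch compares the size before computing the hash, so an obviously wrong download is rejected without hashing its contents.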
BIN
training_loss.png
Normal file
Binary file not shown.
Size: 39 KiB
BIN
vocab.json
(Stored with Git LFS)
Normal file
Binary file not shown.