Initialize project; model provided by the ModelHub XC community
Model: laion/Kimi-K2T-neulab-agenttuning-webshop-sandboxes-maxeps-32k Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
61
README.md
Normal file
@@ -0,0 +1,61 @@
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: Kimi-K2T-neulab-agenttuning-webshop-sandboxes-maxeps-32k
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Kimi-K2T-neulab-agenttuning-webshop-sandboxes-maxeps-32k

This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the penfever/Kimi-K2T-neulab-agenttuning-webshop-sandboxes-maxeps-32k dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 64
- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.98) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 7.0
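
The two `total_*` batch sizes in the list follow from the per-device values. A minimal sketch of that arithmetic, assuming the usual convention that the training total is per-device batch size times device count times accumulation steps, and that evaluation uses no accumulation (the variable names are illustrative and not taken from the training code):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1             # per-device training batch size
eval_batch_size = 8              # per-device evaluation batch size
num_devices = 8
gradient_accumulation_steps = 2

# Assumed convention: samples consumed per optimizer step across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Assumed convention: evaluation does not accumulate gradients.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 16
print(total_eval_batch_size)   # 64
```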

### Training results



### Framework versions

- Transformers 4.57.3
- Pytorch 2.9.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.2
28
added_tokens.json
Normal file
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
16
all_results.json
Normal file
@@ -0,0 +1,16 @@
{
  "achieved_tflops_per_gpu": 6.196733100702532,
  "achieved_tflops_per_gpu_theoretical": 426.8759749157192,
  "epoch": 7.0,
  "loss_nan_ranks": 0,
  "loss_rank_avg": 0.26545536518096924,
  "mfu_percent": 0.626565530910266,
  "mfu_percent_theoretical": 43.16238371240841,
  "total_flos": 9.890960502181724e+17,
  "train_loss": 0.3139860285869714,
  "train_runtime": 19951.9657,
  "train_samples_per_second": 2.007,
  "train_steps_per_second": 0.126,
  "valid_targets_mean": 4094.0,
  "valid_targets_min": 3070
}
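The throughput fields in all_results.json can be cross-checked against each other. The sketch below assumes that each `mfu_percent*` value is the matching `achieved_tflops_per_gpu*` value divided by a per-GPU peak, and that samples per second divided by steps per second gives samples per optimizer step; neither definition is stated in the source, and the GPU model is not stated either.

```python
# Values copied from all_results.json above.
results = {
    "achieved_tflops_per_gpu": 6.196733100702532,
    "achieved_tflops_per_gpu_theoretical": 426.8759749157192,
    "mfu_percent": 0.626565530910266,
    "mfu_percent_theoretical": 43.16238371240841,
    "train_samples_per_second": 2.007,
    "train_steps_per_second": 0.126,
}

# Assumption: MFU (%) = 100 * achieved TFLOPS / peak TFLOPS, so the peak can be recovered.
implied_peak = results["achieved_tflops_per_gpu"] / (results["mfu_percent"] / 100)
implied_peak_theoretical = (
    results["achieved_tflops_per_gpu_theoretical"]
    / (results["mfu_percent_theoretical"] / 100)
)

# Assumption: one "step" is one optimizer step, so this ratio is the effective batch size.
samples_per_step = results["train_samples_per_second"] / results["train_steps_per_second"]

print(f"implied peak: {implied_peak:.1f} and {implied_peak_theoretical:.1f} TFLOPS per GPU")
print(f"samples per step: {samples_per_step:.2f}")
```

Under these assumptions both pairs imply the same peak of roughly 989 TFLOPS per GPU, and the samples-per-step ratio rounds to 16, which agrees with the `total_train_batch_size` reported in README.md.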
89
chat_template.jinja
Normal file
@@ -0,0 +1,89 @@
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
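To make the template's output concrete, here is a plain-Python sketch of the simplest path through chat_template.jinja: no tools, only system and user turns with string content, followed by the generation prompt. The function `render_simple_prompt` is a hypothetical helper written for illustration; it does not cover assistant turns, tool calls, or tool responses, and it is not a replacement for rendering the real template.

```python
def render_simple_prompt(messages, add_generation_prompt=True, enable_thinking=None):
    """Approximate the template output for plain system/user turns only."""
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role not in ("system", "user"):
            raise ValueError("this sketch only covers system and user turns")
        # Each turn is wrapped in the <|im_start|> / <|im_end|> markers.
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        # The template emits an empty think block only when enable_thinking
        # is explicitly false, not when it is left undefined.
        if enable_thinking is False:
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


prompt = render_simple_prompt(
    [
        {"role": "system", "content": "You are a shopping assistant."},
        {"role": "user", "content": "Find a red mug."},
    ],
    enable_thinking=False,
)
print(prompt)
```

Passing `enable_thinking=False` pre-fills an empty `<think>` block so the model starts directly on its answer; leaving it unset ends the prompt at the assistant header.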
68
config.json
Normal file
@@ -0,0 +1,68 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "dtype": "bfloat16",
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "transformers_version": "4.57.3",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}
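The dimensions in config.json are enough to estimate the parameter count and the key/value cache footprint. The sketch below assumes the weight layout suggested by the tensor names in model.safetensors.index.json (bias-free q/k/v/o projections, per-head `q_norm` and `k_norm`, a three-matrix gated MLP, two layer norms per block) plus one final norm, which is not visible in the truncated index shown in this commit view. The helper names are illustrative.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 4096,
    "intermediate_size": 12288,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
    "tie_word_embeddings": False,
}
BYTES_PER_BF16 = 2  # "dtype": "bfloat16"


def count_parameters(cfg):
    """Estimate the parameter count under the layout assumptions stated above."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    qk_norm = 2 * cfg["head_dim"]                       # q_norm and k_norm
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down projections
    layer_norms = 2 * h                                 # input and post-attention norms
    per_layer = attention + qk_norm + mlp + layer_norms
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = h                                      # assumed final norm
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


def kv_cache_bytes_per_token(cfg):
    """Bytes of K and V cached per token across all layers, in bfloat16."""
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    return 2 * cfg["num_hidden_layers"] * kv_dim * BYTES_PER_BF16


params = count_parameters(config)
print(f"parameters: {params:,}")
print(f"bf16 weight bytes: {params * BYTES_PER_BF16:,}")
print(f"KV cache bytes per token: {kv_cache_bytes_per_token(config):,}")
```

Under these assumptions the bfloat16 weight size comes out to 16,381,470,720 bytes, which equals the `total_size` recorded in model.safetensors.index.json.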
12
generation_config.json
Normal file
@@ -0,0 +1,12 @@
{
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.57.3"
}
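The sampling settings in generation_config.json (`temperature`, `top_k`, `top_p`) narrow the next-token distribution before a token is drawn. The following is a simplified sketch of that filtering on a plain list of logits; `filter_distribution` is a hypothetical helper, and the order of operations (temperature, then top-k, then top-p) is an assumption for illustration rather than the transformers implementation.

```python
import math


def filter_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} for the tokens that survive filtering."""
    # Temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token indices.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors, shifted by the max for numerical stability.
    peak = max(scaled[i] for i in ranked)
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for index, p in zip(ranked, probs):
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {index: p / norm for index, p in kept}


dist = filter_distribution([2.0, 1.0, 0.0, -5.0])
print(dist)
```

A sampler would then draw one index from the returned distribution, for example with `random.choices(list(dist), weights=list(dist.values()))`.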
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model-00001-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:77c07080409806c832c11ec6c139c2223816c96ee29cbfeedf61bc06e19fe81f
size 4902257696
3
model-00002-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7751f6c03178d383f1da147e2bed895a4487b761877d06540766dff07516540
size 4915960368
3
model-00003-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51ddbdf5113283752815bb054a7a8b1b5ffdb45f3bc2d5d0b14ed8acce8470bf
size 4983068496
3
model-00004-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a43d8a11b4d051ee805c69e3e4c3a6aa261aea7dc0aa3d6e5b9b0ad1489be194
size 1580230264
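Each `.safetensors` entry in this commit is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer and totalling the shard sizes follows; `parse_lfs_pointer` is a hypothetical helper, and the suggestion that the small surplus over the index's `total_size` comes from per-shard file headers is an assumption.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from model-00001-of-00004.safetensors above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:77c07080409806c832c11ec6c139c2223816c96ee29cbfeedf61bc06e19fe81f\n"
    "size 4902257696\n"
)

# Sizes of the four shards, copied from their pointers above.
shard_sizes = [4902257696, 4915960368, 4983068496, 1580230264]
total_on_disk = sum(shard_sizes)

# "total_size" from model.safetensors.index.json counts tensor bytes only.
index_total_size = 16381470720
overhead = total_on_disk - index_total_size

print(pointer["hash_algorithm"], pointer["size"])
print(total_on_disk, overhead)
```

The four shards add up to 16,381,516,824 bytes, which is 46,104 bytes more than the tensor total in the index.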
407
model.safetensors.index.json
Normal file
@@ -0,0 +1,407 @@
{
  "metadata": {
    "total_parameters": 308224,
    "total_size": 16381470720
  },
  "weight_map": {
    "lm_head.weight": "model-00004-of-00004.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
    "model.layers.24.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors"
  }
}
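The weight map above assigns every tensor name to the shard file that stores it. As a minimal sketch (not part of this repository), the snippet below groups a handful of entries copied from the map by layer. It illustrates that a layer sitting at a shard boundary, such as layer 35, can have its tensors split across two files. The helper name `shards_for_layer` and the inline `SAMPLE_INDEX` subset are illustrative assumptions; in practice the full map would be read from the index file itself.

```python
import json

# A few entries copied from the weight map above (illustrative subset only).
SAMPLE_INDEX = json.loads("""
{
  "weight_map": {
    "model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors"
  }
}
""")


def shards_for_layer(weight_map, layer):
    """Return the sorted shard files that hold tensors of one layer.

    Hypothetical helper: the trailing dot in the prefix keeps layer 3
    from matching layers 30 to 35.
    """
    prefix = f"model.layers.{layer}."
    return sorted({shard for name, shard in weight_map.items()
                   if name.startswith(prefix)})


layer_35_shards = shards_for_layer(SAMPLE_INDEX["weight_map"], 35)
print(layer_35_shards)
```

A loader that needs a single layer would open only the files returned for that layer rather than all four shards.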
12
run_summary.json
Normal file
@@ -0,0 +1,12 @@
{
  "agent_name": "Kimi-K2T-neulab-agenttuning-webshop-sandboxes-maxeps-32k",
  "training_start": null,
  "training_end": null,
  "created_by": "DCAgent",
  "base_model_name": "Qwen/Qwen3-8B",
  "dataset_name": "penfever/Kimi-K2T-neulab-agenttuning-webshop-sandboxes-maxeps-32k",
  "training_type": "SFT",
  "training_parameters": "https://huggingface.co/laion/Kimi-K2T-neulab-agenttuning-webshop-sandboxes-maxeps-32k/blob/main/config.json",
  "wandb_link": "https://wandb.ai/dogml/OpenThoughts-Agent/runs/sft_Kimi-K2T-neulab-agenttuning-webshop-sandboxes-maxeps-32k_Qwen3-8B",
  "traces_location_s3": null
}
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
BIN
tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
240
tokenizer_config.json
Normal file
@@ -0,0 +1,240 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 32768,
  "pad_token": "<|endoftext|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
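The `added_tokens_decoder` table above marks only some added tokens as `"special": true`; the tool-call, FIM and think tags are registered as ordinary added tokens. The sketch below is a stdlib-only illustration on a small subset copied from the file. It splits token ids by that flag and resolves the ids of the configured `eos_token` and `pad_token`. The helper names `split_by_special` and `token_id` are hypothetical and not part of any library.

```python
import json

# Illustrative subset of tokenizer_config.json, copied from the file above.
CONFIG_FRAGMENT = json.loads("""
{
  "added_tokens_decoder": {
    "151643": {"content": "<|endoftext|>", "special": true},
    "151645": {"content": "<|im_end|>", "special": true},
    "151657": {"content": "<tool_call>", "special": false},
    "151667": {"content": "<think>", "special": false}
  },
  "eos_token": "<|im_end|>",
  "pad_token": "<|endoftext|>"
}
""")


def split_by_special(decoder):
    """Split added-token ids into (special, non_special) sorted lists."""
    special = sorted(int(i) for i, t in decoder.items() if t["special"])
    regular = sorted(int(i) for i, t in decoder.items() if not t["special"])
    return special, regular


def token_id(decoder, content):
    """Look up the id of an added token by its text, or None if absent."""
    for raw_id, token in decoder.items():
        if token["content"] == content:
            return int(raw_id)
    return None


decoder = CONFIG_FRAGMENT["added_tokens_decoder"]
special_ids, regular_ids = split_by_special(decoder)
eos_id = token_id(decoder, CONFIG_FRAGMENT["eos_token"])
pad_id = token_id(decoder, CONFIG_FRAGMENT["pad_token"])
print(special_ids, regular_ids, eos_id, pad_id)
```

One practical consequence, assuming the usual Hugging Face decoding behaviour, is that tokens flagged as special are the ones dropped when decoding with special-token skipping enabled, while tags such as `<think>` remain in the decoded text.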
16
train_results.json
Normal file
@@ -0,0 +1,16 @@
{
  "achieved_tflops_per_gpu": 6.196733100702532,
  "achieved_tflops_per_gpu_theoretical": 426.8759749157192,
  "epoch": 7.0,
  "loss_nan_ranks": 0,
  "loss_rank_avg": 0.26545536518096924,
  "mfu_percent": 0.626565530910266,
  "mfu_percent_theoretical": 43.16238371240841,
  "total_flos": 9.890960502181724e+17,
  "train_loss": 0.3139860285869714,
  "train_runtime": 19951.9657,
  "train_samples_per_second": 2.007,
  "train_steps_per_second": 0.126,
  "valid_targets_mean": 4094.0,
  "valid_targets_min": 3070
}
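The figures in train_results.json can be cross-checked with a little arithmetic. The sketch below assumes that MFU is defined as achieved TFLOPS divided by a per-GPU peak, and that `total_steps` (2506) from trainer_log.jsonl counts optimizer steps over the whole run. Under those assumptions it derives the peak throughput implied by each MFU pair and the approximate number of samples consumed per step. The derived values are inferences, not numbers stated in the source.

```python
# Values copied from train_results.json above.
results = {
    "achieved_tflops_per_gpu": 6.196733100702532,
    "achieved_tflops_per_gpu_theoretical": 426.8759749157192,
    "mfu_percent": 0.626565530910266,
    "mfu_percent_theoretical": 43.16238371240841,
    "train_runtime": 19951.9657,        # seconds
    "train_samples_per_second": 2.007,  # rounded in the source
}
TOTAL_STEPS = 2506  # "total_steps" reported in trainer_log.jsonl

# Assumption: mfu_percent = 100 * achieved_tflops / peak_tflops.
implied_peak = results["achieved_tflops_per_gpu"] / (results["mfu_percent"] / 100.0)
implied_peak_theoretical = (
    results["achieved_tflops_per_gpu_theoretical"]
    / (results["mfu_percent_theoretical"] / 100.0)
)

# Samples processed over the run, then samples per optimizer step.
samples_seen = results["train_runtime"] * results["train_samples_per_second"]
samples_per_step = samples_seen / TOTAL_STEPS

print(f"implied peak TFLOPS per GPU: {implied_peak:.1f} / {implied_peak_theoretical:.1f}")
print(f"approx. samples per step: {samples_per_step:.2f}")
```

Both MFU pairs imply the same peak, which suggests the two rows differ only in how achieved FLOPs were counted, not in the hardware reference.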
502
trainer_log.jsonl
Normal file
@@ -0,0 +1,502 @@
{"current_steps": 5, "total_steps": 2506, "loss": 1.0535, "lr": 6.374501992031873e-07, "epoch": 0.013986013986013986, "percentage": 0.2, "elapsed_time": "0:00:53", "remaining_time": "7:26:22"}
{"current_steps": 10, "total_steps": 2506, "loss": 1.0403, "lr": 1.4342629482071716e-06, "epoch": 0.027972027972027972, "percentage": 0.4, "elapsed_time": "0:01:35", "remaining_time": "6:37:09"}
{"current_steps": 15, "total_steps": 2506, "loss": 1.0073, "lr": 2.2310756972111555e-06, "epoch": 0.04195804195804196, "percentage": 0.6, "elapsed_time": "0:02:17", "remaining_time": "6:19:36"}
{"current_steps": 20, "total_steps": 2506, "loss": 0.9661, "lr": 3.0278884462151397e-06, "epoch": 0.055944055944055944, "percentage": 0.8, "elapsed_time": "0:03:00", "remaining_time": "6:12:55"}
{"current_steps": 25, "total_steps": 2506, "loss": 0.9017, "lr": 3.824701195219123e-06, "epoch": 0.06993006993006994, "percentage": 1.0, "elapsed_time": "0:03:38", "remaining_time": "6:00:58"}
{"current_steps": 30, "total_steps": 2506, "loss": 0.8338, "lr": 4.621513944223108e-06, "epoch": 0.08391608391608392, "percentage": 1.2, "elapsed_time": "0:04:23", "remaining_time": "6:02:37"}
{"current_steps": 35, "total_steps": 2506, "loss": 0.7852, "lr": 5.418326693227092e-06, "epoch": 0.0979020979020979, "percentage": 1.4, "elapsed_time": "0:05:01", "remaining_time": "5:54:57"}
{"current_steps": 40, "total_steps": 2506, "loss": 0.7225, "lr": 6.215139442231076e-06, "epoch": 0.11188811188811189, "percentage": 1.6, "elapsed_time": "0:05:41", "remaining_time": "5:50:34"}
{"current_steps": 45, "total_steps": 2506, "loss": 0.686, "lr": 7.011952191235061e-06, "epoch": 0.1258741258741259, "percentage": 1.8, "elapsed_time": "0:06:21", "remaining_time": "5:47:58"}
{"current_steps": 50, "total_steps": 2506, "loss": 0.6457, "lr": 7.808764940239044e-06, "epoch": 0.13986013986013987, "percentage": 2.0, "elapsed_time": "0:07:03", "remaining_time": "5:46:55"}
{"current_steps": 55, "total_steps": 2506, "loss": 0.6121, "lr": 8.605577689243028e-06, "epoch": 0.15384615384615385, "percentage": 2.19, "elapsed_time": "0:07:48", "remaining_time": "5:47:55"}
{"current_steps": 60, "total_steps": 2506, "loss": 0.596, "lr": 9.402390438247013e-06, "epoch": 0.16783216783216784, "percentage": 2.39, "elapsed_time": "0:08:32", "remaining_time": "5:48:11"}
{"current_steps": 65, "total_steps": 2506, "loss": 0.5676, "lr": 1.0199203187250997e-05, "epoch": 0.18181818181818182, "percentage": 2.59, "elapsed_time": "0:09:14", "remaining_time": "5:46:48"}
{"current_steps": 70, "total_steps": 2506, "loss": 0.5405, "lr": 1.099601593625498e-05, "epoch": 0.1958041958041958, "percentage": 2.79, "elapsed_time": "0:09:57", "remaining_time": "5:46:39"}
{"current_steps": 75, "total_steps": 2506, "loss": 0.5166, "lr": 1.1792828685258967e-05, "epoch": 0.2097902097902098, "percentage": 2.99, "elapsed_time": "0:10:38", "remaining_time": "5:44:54"}
{"current_steps": 80, "total_steps": 2506, "loss": 0.5094, "lr": 1.258964143426295e-05, "epoch": 0.22377622377622378, "percentage": 3.19, "elapsed_time": "0:11:22", "remaining_time": "5:44:45"}
{"current_steps": 85, "total_steps": 2506, "loss": 0.4822, "lr": 1.3386454183266932e-05, "epoch": 0.23776223776223776, "percentage": 3.39, "elapsed_time": "0:12:08", "remaining_time": "5:45:41"}
{"current_steps": 90, "total_steps": 2506, "loss": 0.4835, "lr": 1.4183266932270919e-05, "epoch": 0.2517482517482518, "percentage": 3.59, "elapsed_time": "0:12:47", "remaining_time": "5:43:22"}
{"current_steps": 95, "total_steps": 2506, "loss": 0.4679, "lr": 1.4980079681274901e-05, "epoch": 0.26573426573426573, "percentage": 3.79, "elapsed_time": "0:13:30", "remaining_time": "5:42:55"}
{"current_steps": 100, "total_steps": 2506, "loss": 0.4653, "lr": 1.5776892430278886e-05, "epoch": 0.27972027972027974, "percentage": 3.99, "elapsed_time": "0:14:15", "remaining_time": "5:43:00"}
{"current_steps": 105, "total_steps": 2506, "loss": 0.4423, "lr": 1.6573705179282872e-05, "epoch": 0.2937062937062937, "percentage": 4.19, "elapsed_time": "0:14:57", "remaining_time": "5:42:11"}
{"current_steps": 110, "total_steps": 2506, "loss": 0.4317, "lr": 1.7370517928286855e-05, "epoch": 0.3076923076923077, "percentage": 4.39, "elapsed_time": "0:15:37", "remaining_time": "5:40:16"}
{"current_steps": 115, "total_steps": 2506, "loss": 0.4364, "lr": 1.8167330677290838e-05, "epoch": 0.32167832167832167, "percentage": 4.59, "elapsed_time": "0:16:17", "remaining_time": "5:38:43"}
{"current_steps": 120, "total_steps": 2506, "loss": 0.4368, "lr": 1.8964143426294824e-05, "epoch": 0.3356643356643357, "percentage": 4.79, "elapsed_time": "0:16:57", "remaining_time": "5:37:14"}
{"current_steps": 125, "total_steps": 2506, "loss": 0.4183, "lr": 1.9760956175298807e-05, "epoch": 0.34965034965034963, "percentage": 4.99, "elapsed_time": "0:17:43", "remaining_time": "5:37:30"}
{"current_steps": 130, "total_steps": 2506, "loss": 0.4161, "lr": 2.055776892430279e-05, "epoch": 0.36363636363636365, "percentage": 5.19, "elapsed_time": "0:18:21", "remaining_time": "5:35:26"}
{"current_steps": 135, "total_steps": 2506, "loss": 0.4174, "lr": 2.1354581673306773e-05, "epoch": 0.3776223776223776, "percentage": 5.39, "elapsed_time": "0:18:57", "remaining_time": "5:33:01"}
{"current_steps": 140, "total_steps": 2506, "loss": 0.4071, "lr": 2.2151394422310756e-05, "epoch": 0.3916083916083916, "percentage": 5.59, "elapsed_time": "0:19:36", "remaining_time": "5:31:14"}
{"current_steps": 145, "total_steps": 2506, "loss": 0.4142, "lr": 2.2948207171314745e-05, "epoch": 0.40559440559440557, "percentage": 5.79, "elapsed_time": "0:20:18", "remaining_time": "5:30:41"}
{"current_steps": 150, "total_steps": 2506, "loss": 0.4044, "lr": 2.3745019920318728e-05, "epoch": 0.4195804195804196, "percentage": 5.99, "elapsed_time": "0:21:04", "remaining_time": "5:30:55"}
{"current_steps": 155, "total_steps": 2506, "loss": 0.3983, "lr": 2.454183266932271e-05, "epoch": 0.43356643356643354, "percentage": 6.19, "elapsed_time": "0:21:44", "remaining_time": "5:29:50"}
{"current_steps": 160, "total_steps": 2506, "loss": 0.3926, "lr": 2.5338645418326694e-05, "epoch": 0.44755244755244755, "percentage": 6.38, "elapsed_time": "0:22:31", "remaining_time": "5:30:22"}
{"current_steps": 165, "total_steps": 2506, "loss": 0.3901, "lr": 2.6135458167330677e-05, "epoch": 0.46153846153846156, "percentage": 6.58, "elapsed_time": "0:23:10", "remaining_time": "5:28:48"}
{"current_steps": 170, "total_steps": 2506, "loss": 0.3933, "lr": 2.6932270916334663e-05, "epoch": 0.4755244755244755, "percentage": 6.78, "elapsed_time": "0:23:51", "remaining_time": "5:27:55"}
{"current_steps": 175, "total_steps": 2506, "loss": 0.3982, "lr": 2.772908366533865e-05, "epoch": 0.48951048951048953, "percentage": 6.98, "elapsed_time": "0:24:32", "remaining_time": "5:26:59"}
{"current_steps": 180, "total_steps": 2506, "loss": 0.3773, "lr": 2.8525896414342632e-05, "epoch": 0.5034965034965035, "percentage": 7.18, "elapsed_time": "0:25:18", "remaining_time": "5:26:57"}
{"current_steps": 185, "total_steps": 2506, "loss": 0.3861, "lr": 2.9322709163346615e-05, "epoch": 0.5174825174825175, "percentage": 7.38, "elapsed_time": "0:26:02", "remaining_time": "5:26:45"}
{"current_steps": 190, "total_steps": 2506, "loss": 0.3855, "lr": 3.01195219123506e-05, "epoch": 0.5314685314685315, "percentage": 7.58, "elapsed_time": "0:26:41", "remaining_time": "5:25:19"}
{"current_steps": 195, "total_steps": 2506, "loss": 0.3825, "lr": 3.0916334661354584e-05, "epoch": 0.5454545454545454, "percentage": 7.78, "elapsed_time": "0:27:23", "remaining_time": "5:24:32"}
{"current_steps": 200, "total_steps": 2506, "loss": 0.3744, "lr": 3.1713147410358564e-05, "epoch": 0.5594405594405595, "percentage": 7.98, "elapsed_time": "0:28:03", "remaining_time": "5:23:34"}
{"current_steps": 205, "total_steps": 2506, "loss": 0.3894, "lr": 3.250996015936256e-05, "epoch": 0.5734265734265734, "percentage": 8.18, "elapsed_time": "0:28:49", "remaining_time": "5:23:32"}
|
||||
{"current_steps": 210, "total_steps": 2506, "loss": 0.3757, "lr": 3.3306772908366536e-05, "epoch": 0.5874125874125874, "percentage": 8.38, "elapsed_time": "0:29:29", "remaining_time": "5:22:27"}
|
||||
{"current_steps": 215, "total_steps": 2506, "loss": 0.371, "lr": 3.410358565737052e-05, "epoch": 0.6013986013986014, "percentage": 8.58, "elapsed_time": "0:30:09", "remaining_time": "5:21:25"}
|
||||
{"current_steps": 220, "total_steps": 2506, "loss": 0.3841, "lr": 3.49003984063745e-05, "epoch": 0.6153846153846154, "percentage": 8.78, "elapsed_time": "0:30:50", "remaining_time": "5:20:29"}
|
||||
{"current_steps": 225, "total_steps": 2506, "loss": 0.3705, "lr": 3.569721115537849e-05, "epoch": 0.6293706293706294, "percentage": 8.98, "elapsed_time": "0:31:31", "remaining_time": "5:19:38"}
|
||||
{"current_steps": 230, "total_steps": 2506, "loss": 0.3643, "lr": 3.6494023904382475e-05, "epoch": 0.6433566433566433, "percentage": 9.18, "elapsed_time": "0:32:08", "remaining_time": "5:18:00"}
|
||||
{"current_steps": 235, "total_steps": 2506, "loss": 0.3619, "lr": 3.7290836653386454e-05, "epoch": 0.6573426573426573, "percentage": 9.38, "elapsed_time": "0:32:45", "remaining_time": "5:16:37"}
|
||||
{"current_steps": 240, "total_steps": 2506, "loss": 0.3613, "lr": 3.808764940239044e-05, "epoch": 0.6713286713286714, "percentage": 9.58, "elapsed_time": "0:33:28", "remaining_time": "5:16:01"}
|
||||
{"current_steps": 245, "total_steps": 2506, "loss": 0.356, "lr": 3.8884462151394427e-05, "epoch": 0.6853146853146853, "percentage": 9.78, "elapsed_time": "0:34:07", "remaining_time": "5:14:54"}
|
||||
{"current_steps": 250, "total_steps": 2506, "loss": 0.3674, "lr": 3.968127490039841e-05, "epoch": 0.6993006993006993, "percentage": 9.98, "elapsed_time": "0:34:43", "remaining_time": "5:13:23"}
|
||||
{"current_steps": 255, "total_steps": 2506, "loss": 0.3586, "lr": 3.999982531784936e-05, "epoch": 0.7132867132867133, "percentage": 10.18, "elapsed_time": "0:35:25", "remaining_time": "5:12:43"}
|
||||
{"current_steps": 260, "total_steps": 2506, "loss": 0.3638, "lr": 3.9998757826867935e-05, "epoch": 0.7272727272727273, "percentage": 10.38, "elapsed_time": "0:36:04", "remaining_time": "5:11:40"}
|
||||
{"current_steps": 265, "total_steps": 2506, "loss": 0.3542, "lr": 3.9996719942279066e-05, "epoch": 0.7412587412587412, "percentage": 10.57, "elapsed_time": "0:36:47", "remaining_time": "5:11:09"}
|
||||
{"current_steps": 270, "total_steps": 2506, "loss": 0.3611, "lr": 3.999371176296642e-05, "epoch": 0.7552447552447552, "percentage": 10.77, "elapsed_time": "0:37:25", "remaining_time": "5:09:54"}
|
||||
{"current_steps": 275, "total_steps": 2506, "loss": 0.3623, "lr": 3.998973343489495e-05, "epoch": 0.7692307692307693, "percentage": 10.97, "elapsed_time": "0:38:06", "remaining_time": "5:09:07"}
|
||||
{"current_steps": 280, "total_steps": 2506, "loss": 0.352, "lr": 3.998478515110385e-05, "epoch": 0.7832167832167832, "percentage": 11.17, "elapsed_time": "0:38:45", "remaining_time": "5:08:09"}
|
||||
{"current_steps": 285, "total_steps": 2506, "loss": 0.3534, "lr": 3.99788671516972e-05, "epoch": 0.7972027972027972, "percentage": 11.37, "elapsed_time": "0:39:24", "remaining_time": "5:07:09"}
|
||||
{"current_steps": 290, "total_steps": 2506, "loss": 0.3558, "lr": 3.9971979723832254e-05, "epoch": 0.8111888111888111, "percentage": 11.57, "elapsed_time": "0:40:02", "remaining_time": "5:05:59"}
|
||||
{"current_steps": 295, "total_steps": 2506, "loss": 0.3524, "lr": 3.9964123201705584e-05, "epoch": 0.8251748251748252, "percentage": 11.77, "elapsed_time": "0:40:43", "remaining_time": "5:05:16"}
|
||||
{"current_steps": 300, "total_steps": 2506, "loss": 0.3553, "lr": 3.995529796653679e-05, "epoch": 0.8391608391608392, "percentage": 11.97, "elapsed_time": "0:41:24", "remaining_time": "5:04:29"}
|
||||
{"current_steps": 305, "total_steps": 2506, "loss": 0.3641, "lr": 3.9945504446550074e-05, "epoch": 0.8531468531468531, "percentage": 12.17, "elapsed_time": "0:42:05", "remaining_time": "5:03:48"}
|
||||
{"current_steps": 310, "total_steps": 2506, "loss": 0.3461, "lr": 3.99347431169534e-05, "epoch": 0.8671328671328671, "percentage": 12.37, "elapsed_time": "0:42:49", "remaining_time": "5:03:24"}
|
||||
{"current_steps": 315, "total_steps": 2506, "loss": 0.3673, "lr": 3.992301449991548e-05, "epoch": 0.8811188811188811, "percentage": 12.57, "elapsed_time": "0:43:25", "remaining_time": "5:02:05"}
|
||||
{"current_steps": 320, "total_steps": 2506, "loss": 0.3366, "lr": 3.991031916454041e-05, "epoch": 0.8951048951048951, "percentage": 12.77, "elapsed_time": "0:44:09", "remaining_time": "5:01:39"}
|
||||
{"current_steps": 325, "total_steps": 2506, "loss": 0.3495, "lr": 3.989665772684006e-05, "epoch": 0.9090909090909091, "percentage": 12.97, "elapsed_time": "0:44:48", "remaining_time": "5:00:43"}
|
||||
{"current_steps": 330, "total_steps": 2506, "loss": 0.3434, "lr": 3.988203084970418e-05, "epoch": 0.9230769230769231, "percentage": 13.17, "elapsed_time": "0:45:31", "remaining_time": "5:00:14"}
|
||||
{"current_steps": 335, "total_steps": 2506, "loss": 0.3442, "lr": 3.9866439242868275e-05, "epoch": 0.9370629370629371, "percentage": 13.37, "elapsed_time": "0:46:09", "remaining_time": "4:59:09"}
|
||||
{"current_steps": 340, "total_steps": 2506, "loss": 0.3365, "lr": 3.98498836628791e-05, "epoch": 0.951048951048951, "percentage": 13.57, "elapsed_time": "0:46:51", "remaining_time": "4:58:28"}
|
||||
{"current_steps": 345, "total_steps": 2506, "loss": 0.3389, "lr": 3.983236491305801e-05, "epoch": 0.965034965034965, "percentage": 13.77, "elapsed_time": "0:47:27", "remaining_time": "4:57:17"}
|
||||
{"current_steps": 350, "total_steps": 2506, "loss": 0.3478, "lr": 3.981388384346193e-05, "epoch": 0.9790209790209791, "percentage": 13.97, "elapsed_time": "0:48:12", "remaining_time": "4:56:58"}
|
||||
{"current_steps": 355, "total_steps": 2506, "loss": 0.3346, "lr": 3.979444135084215e-05, "epoch": 0.993006993006993, "percentage": 14.17, "elapsed_time": "0:48:49", "remaining_time": "4:55:51"}
|
||||
{"current_steps": 360, "total_steps": 2506, "loss": 0.3392, "lr": 3.9774038378600796e-05, "epoch": 1.0055944055944055, "percentage": 14.37, "elapsed_time": "0:49:26", "remaining_time": "4:54:43"}
|
||||
{"current_steps": 365, "total_steps": 2506, "loss": 0.3389, "lr": 3.975267591674504e-05, "epoch": 1.0195804195804197, "percentage": 14.57, "elapsed_time": "0:50:08", "remaining_time": "4:54:06"}
|
||||
{"current_steps": 370, "total_steps": 2506, "loss": 0.3425, "lr": 3.973035500183909e-05, "epoch": 1.0335664335664336, "percentage": 14.76, "elapsed_time": "0:50:45", "remaining_time": "4:53:01"}
|
||||
{"current_steps": 375, "total_steps": 2506, "loss": 0.3303, "lr": 3.9707076716953866e-05, "epoch": 1.0475524475524476, "percentage": 14.96, "elapsed_time": "0:51:27", "remaining_time": "4:52:24"}
|
||||
{"current_steps": 380, "total_steps": 2506, "loss": 0.3413, "lr": 3.9682842191614466e-05, "epoch": 1.0615384615384615, "percentage": 15.16, "elapsed_time": "0:52:08", "remaining_time": "4:51:42"}
|
||||
{"current_steps": 385, "total_steps": 2506, "loss": 0.3344, "lr": 3.965765260174534e-05, "epoch": 1.0755244755244755, "percentage": 15.36, "elapsed_time": "0:52:46", "remaining_time": "4:50:43"}
|
||||
{"current_steps": 390, "total_steps": 2506, "loss": 0.3403, "lr": 3.9631509169613265e-05, "epoch": 1.0895104895104895, "percentage": 15.56, "elapsed_time": "0:53:22", "remaining_time": "4:49:33"}
|
||||
{"current_steps": 395, "total_steps": 2506, "loss": 0.3338, "lr": 3.9604413163767985e-05, "epoch": 1.1034965034965034, "percentage": 15.76, "elapsed_time": "0:54:03", "remaining_time": "4:48:53"}
|
||||
{"current_steps": 400, "total_steps": 2506, "loss": 0.3249, "lr": 3.957636589898072e-05, "epoch": 1.1174825174825176, "percentage": 15.96, "elapsed_time": "0:54:45", "remaining_time": "4:48:16"}
|
||||
{"current_steps": 405, "total_steps": 2506, "loss": 0.327, "lr": 3.95473687361803e-05, "epoch": 1.1314685314685315, "percentage": 16.16, "elapsed_time": "0:55:25", "remaining_time": "4:47:28"}
|
||||
{"current_steps": 410, "total_steps": 2506, "loss": 0.3364, "lr": 3.951742308238719e-05, "epoch": 1.1454545454545455, "percentage": 16.36, "elapsed_time": "0:56:07", "remaining_time": "4:46:54"}
|
||||
{"current_steps": 415, "total_steps": 2506, "loss": 0.3255, "lr": 3.948653039064519e-05, "epoch": 1.1594405594405595, "percentage": 16.56, "elapsed_time": "0:56:50", "remaining_time": "4:46:21"}
|
||||
{"current_steps": 420, "total_steps": 2506, "loss": 0.3275, "lr": 3.9454692159950935e-05, "epoch": 1.1734265734265734, "percentage": 16.76, "elapsed_time": "0:57:30", "remaining_time": "4:45:38"}
|
||||
{"current_steps": 425, "total_steps": 2506, "loss": 0.3274, "lr": 3.9421909935181146e-05, "epoch": 1.1874125874125874, "percentage": 16.96, "elapsed_time": "0:58:11", "remaining_time": "4:44:55"}
|
||||
{"current_steps": 430, "total_steps": 2506, "loss": 0.3322, "lr": 3.938818530701768e-05, "epoch": 1.2013986013986013, "percentage": 17.16, "elapsed_time": "0:58:53", "remaining_time": "4:44:18"}
|
||||
{"current_steps": 435, "total_steps": 2506, "loss": 0.3253, "lr": 3.935351991187035e-05, "epoch": 1.2153846153846155, "percentage": 17.36, "elapsed_time": "0:59:32", "remaining_time": "4:43:28"}
|
||||
{"current_steps": 440, "total_steps": 2506, "loss": 0.3276, "lr": 3.9317915431797535e-05, "epoch": 1.2293706293706295, "percentage": 17.56, "elapsed_time": "1:00:13", "remaining_time": "4:42:46"}
|
||||
{"current_steps": 445, "total_steps": 2506, "loss": 0.3276, "lr": 3.928137359442452e-05, "epoch": 1.2433566433566434, "percentage": 17.76, "elapsed_time": "1:00:55", "remaining_time": "4:42:11"}
|
||||
{"current_steps": 450, "total_steps": 2506, "loss": 0.3328, "lr": 3.924389617285969e-05, "epoch": 1.2573426573426574, "percentage": 17.96, "elapsed_time": "1:01:33", "remaining_time": "4:41:13"}
|
||||
{"current_steps": 455, "total_steps": 2506, "loss": 0.3278, "lr": 3.920548498560852e-05, "epoch": 1.2713286713286713, "percentage": 18.16, "elapsed_time": "1:02:13", "remaining_time": "4:40:28"}
|
||||
{"current_steps": 460, "total_steps": 2506, "loss": 0.3256, "lr": 3.9166141896485295e-05, "epoch": 1.2853146853146853, "percentage": 18.36, "elapsed_time": "1:02:50", "remaining_time": "4:39:31"}
|
||||
{"current_steps": 465, "total_steps": 2506, "loss": 0.319, "lr": 3.912586881452268e-05, "epoch": 1.2993006993006992, "percentage": 18.56, "elapsed_time": "1:03:29", "remaining_time": "4:38:40"}
|
||||
{"current_steps": 470, "total_steps": 2506, "loss": 0.3269, "lr": 3.9084667693879116e-05, "epoch": 1.3132867132867134, "percentage": 18.75, "elapsed_time": "1:04:10", "remaining_time": "4:38:00"}
|
||||
{"current_steps": 475, "total_steps": 2506, "loss": 0.3263, "lr": 3.904254053374398e-05, "epoch": 1.3272727272727272, "percentage": 18.95, "elapsed_time": "1:04:53", "remaining_time": "4:37:27"}
|
||||
{"current_steps": 480, "total_steps": 2506, "loss": 0.3255, "lr": 3.899948937824058e-05, "epoch": 1.3412587412587413, "percentage": 19.15, "elapsed_time": "1:05:35", "remaining_time": "4:36:49"}
|
||||
{"current_steps": 485, "total_steps": 2506, "loss": 0.3298, "lr": 3.895551631632694e-05, "epoch": 1.3552447552447553, "percentage": 19.35, "elapsed_time": "1:06:15", "remaining_time": "4:36:06"}
|
||||
{"current_steps": 490, "total_steps": 2506, "loss": 0.3334, "lr": 3.8910623481694514e-05, "epoch": 1.3692307692307693, "percentage": 19.55, "elapsed_time": "1:06:54", "remaining_time": "4:35:17"}
|
||||
{"current_steps": 495, "total_steps": 2506, "loss": 0.3249, "lr": 3.886481305266456e-05, "epoch": 1.3832167832167832, "percentage": 19.75, "elapsed_time": "1:07:36", "remaining_time": "4:34:40"}
|
||||
{"current_steps": 500, "total_steps": 2506, "loss": 0.33, "lr": 3.881808725208253e-05, "epoch": 1.3972027972027972, "percentage": 19.95, "elapsed_time": "1:08:18", "remaining_time": "4:34:04"}
|
||||
{"current_steps": 505, "total_steps": 2506, "loss": 0.3352, "lr": 3.8770448347210144e-05, "epoch": 1.4111888111888111, "percentage": 20.15, "elapsed_time": "1:08:55", "remaining_time": "4:33:05"}
|
||||
{"current_steps": 510, "total_steps": 2506, "loss": 0.3196, "lr": 3.87218986496154e-05, "epoch": 1.425174825174825, "percentage": 20.35, "elapsed_time": "1:09:32", "remaining_time": "4:32:11"}
|
||||
{"current_steps": 515, "total_steps": 2506, "loss": 0.3244, "lr": 3.867244051506042e-05, "epoch": 1.4391608391608393, "percentage": 20.55, "elapsed_time": "1:10:12", "remaining_time": "4:31:23"}
|
||||
{"current_steps": 520, "total_steps": 2506, "loss": 0.3285, "lr": 3.862207634338715e-05, "epoch": 1.4531468531468532, "percentage": 20.75, "elapsed_time": "1:10:52", "remaining_time": "4:30:39"}
|
||||
{"current_steps": 525, "total_steps": 2506, "loss": 0.3234, "lr": 3.857080857840087e-05, "epoch": 1.4671328671328672, "percentage": 20.95, "elapsed_time": "1:11:27", "remaining_time": "4:29:38"}
|
||||
{"current_steps": 530, "total_steps": 2506, "loss": 0.3267, "lr": 3.851863970775166e-05, "epoch": 1.4811188811188811, "percentage": 21.15, "elapsed_time": "1:12:05", "remaining_time": "4:28:48"}
|
||||
{"current_steps": 535, "total_steps": 2506, "loss": 0.3293, "lr": 3.846557226281367e-05, "epoch": 1.495104895104895, "percentage": 21.35, "elapsed_time": "1:12:51", "remaining_time": "4:28:26"}
|
||||
{"current_steps": 540, "total_steps": 2506, "loss": 0.3271, "lr": 3.84116088185623e-05, "epoch": 1.509090909090909, "percentage": 21.55, "elapsed_time": "1:13:29", "remaining_time": "4:27:32"}
|
||||
{"current_steps": 545, "total_steps": 2506, "loss": 0.3202, "lr": 3.835675199344923e-05, "epoch": 1.523076923076923, "percentage": 21.75, "elapsed_time": "1:14:08", "remaining_time": "4:26:45"}
|
||||
{"current_steps": 550, "total_steps": 2506, "loss": 0.3184, "lr": 3.830100444927542e-05, "epoch": 1.5370629370629372, "percentage": 21.95, "elapsed_time": "1:14:54", "remaining_time": "4:26:22"}
|
||||
{"current_steps": 555, "total_steps": 2506, "loss": 0.3118, "lr": 3.8244368891061884e-05, "epoch": 1.551048951048951, "percentage": 22.15, "elapsed_time": "1:15:33", "remaining_time": "4:25:37"}
|
||||
{"current_steps": 560, "total_steps": 2506, "loss": 0.3229, "lr": 3.81868480669185e-05, "epoch": 1.565034965034965, "percentage": 22.35, "elapsed_time": "1:16:10", "remaining_time": "4:24:42"}
|
||||
{"current_steps": 565, "total_steps": 2506, "loss": 0.3255, "lr": 3.812844476791061e-05, "epoch": 1.579020979020979, "percentage": 22.55, "elapsed_time": "1:16:45", "remaining_time": "4:23:43"}
|
||||
{"current_steps": 570, "total_steps": 2506, "loss": 0.3258, "lr": 3.8069161827923624e-05, "epoch": 1.593006993006993, "percentage": 22.75, "elapsed_time": "1:17:23", "remaining_time": "4:22:50"}
|
||||
{"current_steps": 575, "total_steps": 2506, "loss": 0.3301, "lr": 3.80090021235255e-05, "epoch": 1.606993006993007, "percentage": 22.94, "elapsed_time": "1:18:01", "remaining_time": "4:22:00"}
|
||||
{"current_steps": 580, "total_steps": 2506, "loss": 0.3256, "lr": 3.794796857382717e-05, "epoch": 1.620979020979021, "percentage": 23.14, "elapsed_time": "1:18:40", "remaining_time": "4:21:14"}
|
||||
{"current_steps": 585, "total_steps": 2506, "loss": 0.3277, "lr": 3.7886064140340896e-05, "epoch": 1.634965034965035, "percentage": 23.34, "elapsed_time": "1:19:18", "remaining_time": "4:20:27"}
|
||||
{"current_steps": 590, "total_steps": 2506, "loss": 0.3154, "lr": 3.782329182683657e-05, "epoch": 1.6489510489510488, "percentage": 23.54, "elapsed_time": "1:19:57", "remaining_time": "4:19:38"}
|
||||
{"current_steps": 595, "total_steps": 2506, "loss": 0.3219, "lr": 3.775965467919594e-05, "epoch": 1.662937062937063, "percentage": 23.74, "elapsed_time": "1:20:36", "remaining_time": "4:18:52"}
|
||||
{"current_steps": 600, "total_steps": 2506, "loss": 0.3237, "lr": 3.769515578526486e-05, "epoch": 1.676923076923077, "percentage": 23.94, "elapsed_time": "1:21:18", "remaining_time": "4:18:16"}
|
||||
{"current_steps": 605, "total_steps": 2506, "loss": 0.3209, "lr": 3.762979827470343e-05, "epoch": 1.690909090909091, "percentage": 24.14, "elapsed_time": "1:21:58", "remaining_time": "4:17:33"}
|
||||
{"current_steps": 610, "total_steps": 2506, "loss": 0.3158, "lr": 3.756358531883413e-05, "epoch": 1.7048951048951049, "percentage": 24.34, "elapsed_time": "1:22:36", "remaining_time": "4:16:47"}
|
||||
{"current_steps": 615, "total_steps": 2506, "loss": 0.3248, "lr": 3.749652013048797e-05, "epoch": 1.7188811188811188, "percentage": 24.54, "elapsed_time": "1:23:19", "remaining_time": "4:16:11"}
|
||||
{"current_steps": 620, "total_steps": 2506, "loss": 0.3259, "lr": 3.742860596384856e-05, "epoch": 1.732867132867133, "percentage": 24.74, "elapsed_time": "1:23:54", "remaining_time": "4:15:15"}
|
||||
{"current_steps": 625, "total_steps": 2506, "loss": 0.3144, "lr": 3.735984611429423e-05, "epoch": 1.7468531468531467, "percentage": 24.94, "elapsed_time": "1:24:29", "remaining_time": "4:14:17"}
|
||||
{"current_steps": 630, "total_steps": 2506, "loss": 0.3227, "lr": 3.7290243918238117e-05, "epoch": 1.760839160839161, "percentage": 25.14, "elapsed_time": "1:25:03", "remaining_time": "4:13:17"}
|
||||
{"current_steps": 635, "total_steps": 2506, "loss": 0.3298, "lr": 3.72198027529663e-05, "epoch": 1.7748251748251749, "percentage": 25.34, "elapsed_time": "1:25:45", "remaining_time": "4:12:41"}
|
||||
{"current_steps": 640, "total_steps": 2506, "loss": 0.3252, "lr": 3.714852603647387e-05, "epoch": 1.7888111888111888, "percentage": 25.54, "elapsed_time": "1:26:23", "remaining_time": "4:11:54"}
|
||||
{"current_steps": 645, "total_steps": 2506, "loss": 0.3218, "lr": 3.707641722729915e-05, "epoch": 1.8027972027972028, "percentage": 25.74, "elapsed_time": "1:27:06", "remaining_time": "4:11:20"}
|
||||
{"current_steps": 650, "total_steps": 2506, "loss": 0.3166, "lr": 3.700347982435583e-05, "epoch": 1.8167832167832167, "percentage": 25.94, "elapsed_time": "1:27:52", "remaining_time": "4:10:55"}
|
||||
{"current_steps": 655, "total_steps": 2506, "loss": 0.3233, "lr": 3.6929717366763186e-05, "epoch": 1.830769230769231, "percentage": 26.14, "elapsed_time": "1:28:30", "remaining_time": "4:10:05"}
|
||||
{"current_steps": 660, "total_steps": 2506, "loss": 0.3106, "lr": 3.685513343367438e-05, "epoch": 1.8447552447552447, "percentage": 26.34, "elapsed_time": "1:29:09", "remaining_time": "4:09:21"}
|
||||
{"current_steps": 665, "total_steps": 2506, "loss": 0.3192, "lr": 3.677973164410278e-05, "epoch": 1.8587412587412588, "percentage": 26.54, "elapsed_time": "1:29:47", "remaining_time": "4:08:35"}
|
||||
{"current_steps": 670, "total_steps": 2506, "loss": 0.3164, "lr": 3.6703515656746365e-05, "epoch": 1.8727272727272726, "percentage": 26.74, "elapsed_time": "1:30:24", "remaining_time": "4:07:45"}
|
||||
{"current_steps": 675, "total_steps": 2506, "loss": 0.3156, "lr": 3.662648916981015e-05, "epoch": 1.8867132867132868, "percentage": 26.94, "elapsed_time": "1:31:02", "remaining_time": "4:06:58"}
|
||||
{"current_steps": 680, "total_steps": 2506, "loss": 0.3198, "lr": 3.654865592082681e-05, "epoch": 1.9006993006993007, "percentage": 27.13, "elapsed_time": "1:31:41", "remaining_time": "4:06:12"}
|
||||
{"current_steps": 685, "total_steps": 2506, "loss": 0.3144, "lr": 3.647001968647527e-05, "epoch": 1.9146853146853147, "percentage": 27.33, "elapsed_time": "1:32:22", "remaining_time": "4:05:34"}
|
||||
{"current_steps": 690, "total_steps": 2506, "loss": 0.3216, "lr": 3.6390584282397464e-05, "epoch": 1.9286713286713286, "percentage": 27.53, "elapsed_time": "1:33:01", "remaining_time": "4:04:48"}
|
||||
{"current_steps": 695, "total_steps": 2506, "loss": 0.3267, "lr": 3.631035356301321e-05, "epoch": 1.9426573426573426, "percentage": 27.73, "elapsed_time": "1:33:37", "remaining_time": "4:03:56"}
|
||||
{"current_steps": 700, "total_steps": 2506, "loss": 0.317, "lr": 3.6229331421333155e-05, "epoch": 1.9566433566433568, "percentage": 27.93, "elapsed_time": "1:34:20", "remaining_time": "4:03:25"}
|
||||
{"current_steps": 705, "total_steps": 2506, "loss": 0.3146, "lr": 3.6147521788769884e-05, "epoch": 1.9706293706293705, "percentage": 28.13, "elapsed_time": "1:35:00", "remaining_time": "4:02:43"}
|
||||
{"current_steps": 710, "total_steps": 2506, "loss": 0.3136, "lr": 3.606492863494718e-05, "epoch": 1.9846153846153847, "percentage": 28.33, "elapsed_time": "1:35:38", "remaining_time": "4:01:56"}
|
||||
{"current_steps": 715, "total_steps": 2506, "loss": 0.3158, "lr": 3.598155596750736e-05, "epoch": 1.9986013986013986, "percentage": 28.53, "elapsed_time": "1:36:19", "remaining_time": "4:01:17"}
|
||||
{"current_steps": 720, "total_steps": 2506, "loss": 0.3132, "lr": 3.589740783191688e-05, "epoch": 2.011188811188811, "percentage": 28.73, "elapsed_time": "1:36:55", "remaining_time": "4:00:24"}
|
||||
{"current_steps": 725, "total_steps": 2506, "loss": 0.3026, "lr": 3.581248831126996e-05, "epoch": 2.025174825174825, "percentage": 28.93, "elapsed_time": "1:37:32", "remaining_time": "3:59:36"}
|
||||
{"current_steps": 730, "total_steps": 2506, "loss": 0.3088, "lr": 3.572680152609053e-05, "epoch": 2.0391608391608393, "percentage": 29.13, "elapsed_time": "1:38:07", "remaining_time": "3:58:42"}
|
||||
{"current_steps": 735, "total_steps": 2506, "loss": 0.3129, "lr": 3.564035163413225e-05, "epoch": 2.053146853146853, "percentage": 29.33, "elapsed_time": "1:38:46", "remaining_time": "3:57:59"}
|
||||
{"current_steps": 740, "total_steps": 2506, "loss": 0.3041, "lr": 3.555314283017677e-05, "epoch": 2.0671328671328673, "percentage": 29.53, "elapsed_time": "1:39:24", "remaining_time": "3:57:15"}
|
||||
{"current_steps": 745, "total_steps": 2506, "loss": 0.3097, "lr": 3.546517934583021e-05, "epoch": 2.081118881118881, "percentage": 29.73, "elapsed_time": "1:40:04", "remaining_time": "3:56:33"}
|
||||
{"current_steps": 750, "total_steps": 2506, "loss": 0.3057, "lr": 3.5376465449317816e-05, "epoch": 2.095104895104895, "percentage": 29.93, "elapsed_time": "1:40:44", "remaining_time": "3:55:52"}
|
||||
{"current_steps": 755, "total_steps": 2506, "loss": 0.2998, "lr": 3.5287005445276835e-05, "epoch": 2.109090909090909, "percentage": 30.13, "elapsed_time": "1:41:28", "remaining_time": "3:55:19"}
|
||||
{"current_steps": 760, "total_steps": 2506, "loss": 0.2969, "lr": 3.5196803674547674e-05, "epoch": 2.123076923076923, "percentage": 30.33, "elapsed_time": "1:42:04", "remaining_time": "3:54:31"}
|
||||
{"current_steps": 765, "total_steps": 2506, "loss": 0.3083, "lr": 3.510586451396326e-05, "epoch": 2.1370629370629373, "percentage": 30.53, "elapsed_time": "1:42:42", "remaining_time": "3:53:44"}
|
||||
{"current_steps": 770, "total_steps": 2506, "loss": 0.3125, "lr": 3.5014192376136655e-05, "epoch": 2.151048951048951, "percentage": 30.73, "elapsed_time": "1:43:24", "remaining_time": "3:53:08"}
|
||||
{"current_steps": 775, "total_steps": 2506, "loss": 0.3028, "lr": 3.492179170924696e-05, "epoch": 2.165034965034965, "percentage": 30.93, "elapsed_time": "1:44:02", "remaining_time": "3:52:22"}
|
||||
{"current_steps": 780, "total_steps": 2506, "loss": 0.3072, "lr": 3.482866699682347e-05, "epoch": 2.179020979020979, "percentage": 31.13, "elapsed_time": "1:44:46", "remaining_time": "3:51:51"}
|
||||
{"current_steps": 785, "total_steps": 2506, "loss": 0.3084, "lr": 3.47348227575281e-05, "epoch": 2.193006993006993, "percentage": 31.32, "elapsed_time": "1:45:26", "remaining_time": "3:51:09"}
|
||||
{"current_steps": 790, "total_steps": 2506, "loss": 0.3042, "lr": 3.464026354493617e-05, "epoch": 2.206993006993007, "percentage": 31.52, "elapsed_time": "1:46:05", "remaining_time": "3:50:26"}
|
||||
{"current_steps": 795, "total_steps": 2506, "loss": 0.3008, "lr": 3.454499394731543e-05, "epoch": 2.220979020979021, "percentage": 31.72, "elapsed_time": "1:46:44", "remaining_time": "3:49:42"}
|
||||
{"current_steps": 800, "total_steps": 2506, "loss": 0.3169, "lr": 3.4449018587403414e-05, "epoch": 2.234965034965035, "percentage": 31.92, "elapsed_time": "1:47:20", "remaining_time": "3:48:54"}
|
||||
{"current_steps": 805, "total_steps": 2506, "loss": 0.301, "lr": 3.435234212218313e-05, "epoch": 2.248951048951049, "percentage": 32.12, "elapsed_time": "1:47:57", "remaining_time": "3:48:08"}
|
||||
{"current_steps": 810, "total_steps": 2506, "loss": 0.3069, "lr": 3.425496924265714e-05, "epoch": 2.262937062937063, "percentage": 32.32, "elapsed_time": "1:48:37", "remaining_time": "3:47:26"}
|
||||
{"current_steps": 815, "total_steps": 2506, "loss": 0.3061, "lr": 3.415690467361989e-05, "epoch": 2.276923076923077, "percentage": 32.52, "elapsed_time": "1:49:12", "remaining_time": "3:46:34"}
|
||||
{"current_steps": 820, "total_steps": 2506, "loss": 0.3064, "lr": 3.405815317342844e-05, "epoch": 2.290909090909091, "percentage": 32.72, "elapsed_time": "1:49:53", "remaining_time": "3:45:56"}
|
||||
{"current_steps": 825, "total_steps": 2506, "loss": 0.2918, "lr": 3.395871953377164e-05, "epoch": 2.3048951048951047, "percentage": 32.92, "elapsed_time": "1:50:37", "remaining_time": "3:45:24"}
|
||||
{"current_steps": 830, "total_steps": 2506, "loss": 0.3025, "lr": 3.3858608579437556e-05, "epoch": 2.318881118881119, "percentage": 33.12, "elapsed_time": "1:51:14", "remaining_time": "3:44:38"}
|
||||
{"current_steps": 835, "total_steps": 2506, "loss": 0.3034, "lr": 3.3757825168079396e-05, "epoch": 2.3328671328671327, "percentage": 33.32, "elapsed_time": "1:51:55", "remaining_time": "3:43:59"}
|
||||
{"current_steps": 840, "total_steps": 2506, "loss": 0.3054, "lr": 3.365637418997981e-05, "epoch": 2.346853146853147, "percentage": 33.52, "elapsed_time": "1:52:33", "remaining_time": "3:43:13"}
|
||||
{"current_steps": 845, "total_steps": 2506, "loss": 0.2999, "lr": 3.3554260567813546e-05, "epoch": 2.360839160839161, "percentage": 33.72, "elapsed_time": "1:53:14", "remaining_time": "3:42:36"}
|
||||
{"current_steps": 850, "total_steps": 2506, "loss": 0.3011, "lr": 3.3451489256408664e-05, "epoch": 2.3748251748251747, "percentage": 33.92, "elapsed_time": "1:53:53", "remaining_time": "3:41:53"}
|
||||
{"current_steps": 855, "total_steps": 2506, "loss": 0.3045, "lr": 3.3348065242506066e-05, "epoch": 2.388811188811189, "percentage": 34.12, "elapsed_time": "1:54:36", "remaining_time": "3:41:18"}
|
||||
{"current_steps": 860, "total_steps": 2506, "loss": 0.3014, "lr": 3.3243993544517525e-05, "epoch": 2.4027972027972027, "percentage": 34.32, "elapsed_time": "1:55:16", "remaining_time": "3:40:38"}
|
||||
{"current_steps": 865, "total_steps": 2506, "loss": 0.304, "lr": 3.313927921228221e-05, "epoch": 2.416783216783217, "percentage": 34.52, "elapsed_time": "1:55:55", "remaining_time": "3:39:55"}
|
||||
{"current_steps": 870, "total_steps": 2506, "loss": 0.3004, "lr": 3.303392732682163e-05, "epoch": 2.430769230769231, "percentage": 34.72, "elapsed_time": "1:56:32", "remaining_time": "3:39:09"}
|
||||
{"current_steps": 875, "total_steps": 2506, "loss": 0.3092, "lr": 3.292794300009309e-05, "epoch": 2.4447552447552447, "percentage": 34.92, "elapsed_time": "1:57:09", "remaining_time": "3:38:23"}
|
||||
{"current_steps": 880, "total_steps": 2506, "loss": 0.3091, "lr": 3.282133137474164e-05, "epoch": 2.458741258741259, "percentage": 35.12, "elapsed_time": "1:57:53", "remaining_time": "3:37:49"}
|
||||
{"current_steps": 885, "total_steps": 2506, "loss": 0.2987, "lr": 3.271409762385057e-05, "epoch": 2.4727272727272727, "percentage": 35.32, "elapsed_time": "1:58:32", "remaining_time": "3:37:08"}
|
||||
{"current_steps": 890, "total_steps": 2506, "loss": 0.3052, "lr": 3.2606246950690365e-05, "epoch": 2.486713286713287, "percentage": 35.51, "elapsed_time": "1:59:13", "remaining_time": "3:36:28"}
|
||||
{"current_steps": 895, "total_steps": 2506, "loss": 0.3095, "lr": 3.2497784588466235e-05, "epoch": 2.5006993006993006, "percentage": 35.71, "elapsed_time": "1:59:49", "remaining_time": "3:35:41"}
|
||||
{"current_steps": 900, "total_steps": 2506, "loss": 0.3064, "lr": 3.23887158000642e-05, "epoch": 2.5146853146853148, "percentage": 35.91, "elapsed_time": "2:00:30", "remaining_time": "3:35:01"}
|
||||
{"current_steps": 905, "total_steps": 2506, "loss": 0.3007, "lr": 3.2279045877795724e-05, "epoch": 2.5286713286713285, "percentage": 36.11, "elapsed_time": "2:01:08", "remaining_time": "3:34:17"}
|
||||
{"current_steps": 910, "total_steps": 2506, "loss": 0.3016, "lr": 3.216878014314088e-05, "epoch": 2.5426573426573427, "percentage": 36.31, "elapsed_time": "2:01:48", "remaining_time": "3:33:37"}
|
||||
{"current_steps": 915, "total_steps": 2506, "loss": 0.2981, "lr": 3.205792394649017e-05, "epoch": 2.556643356643357, "percentage": 36.51, "elapsed_time": "2:02:26", "remaining_time": "3:32:54"}
|
||||
{"current_steps": 920, "total_steps": 2506, "loss": 0.3028, "lr": 3.194648266688492e-05, "epoch": 2.5706293706293706, "percentage": 36.71, "elapsed_time": "2:03:03", "remaining_time": "3:32:08"}
|
||||
{"current_steps": 925, "total_steps": 2506, "loss": 0.3003, "lr": 3.183446171175623e-05, "epoch": 2.5846153846153848, "percentage": 36.91, "elapsed_time": "2:03:39", "remaining_time": "3:31:22"}
|
||||
{"current_steps": 930, "total_steps": 2506, "loss": 0.296, "lr": 3.1721866516662646e-05, "epoch": 2.5986013986013985, "percentage": 37.11, "elapsed_time": "2:04:18", "remaining_time": "3:30:40"}
|
||||
{"current_steps": 935, "total_steps": 2506, "loss": 0.3065, "lr": 3.160870254502637e-05, "epoch": 2.6125874125874127, "percentage": 37.31, "elapsed_time": "2:05:00", "remaining_time": "3:30:01"}
|
||||
{"current_steps": 940, "total_steps": 2506, "loss": 0.3006, "lr": 3.1494975287868166e-05, "epoch": 2.626573426573427, "percentage": 37.51, "elapsed_time": "2:05:45", "remaining_time": "3:29:30"}
|
||||
{"current_steps": 945, "total_steps": 2506, "loss": 0.298, "lr": 3.138069026354095e-05, "epoch": 2.6405594405594406, "percentage": 37.71, "elapsed_time": "2:06:27", "remaining_time": "3:28:53"}
|
||||
{"current_steps": 950, "total_steps": 2506, "loss": 0.2966, "lr": 3.1265853017461984e-05, "epoch": 2.6545454545454543, "percentage": 37.91, "elapsed_time": "2:07:12", "remaining_time": "3:28:20"}
|
||||
{"current_steps": 955, "total_steps": 2506, "loss": 0.3065, "lr": 3.115046912184382e-05, "epoch": 2.6685314685314685, "percentage": 38.11, "elapsed_time": "2:07:48", "remaining_time": "3:27:34"}
|
||||
{"current_steps": 960, "total_steps": 2506, "loss": 0.3064, "lr": 3.103454417542394e-05, "epoch": 2.6825174825174827, "percentage": 38.31, "elapsed_time": "2:08:29", "remaining_time": "3:26:55"}
|
||||
{"current_steps": 965, "total_steps": 2506, "loss": 0.3062, "lr": 3.091808380319305e-05, "epoch": 2.6965034965034964, "percentage": 38.51, "elapsed_time": "2:09:08", "remaining_time": "3:26:13"}
|
||||
{"current_steps": 970, "total_steps": 2506, "loss": 0.2989, "lr": 3.0801093656122136e-05, "epoch": 2.7104895104895106, "percentage": 38.71, "elapsed_time": "2:09:45", "remaining_time": "3:25:28"}
|
||||
{"current_steps": 975, "total_steps": 2506, "loss": 0.3048, "lr": 3.0683579410888345e-05, "epoch": 2.7244755244755243, "percentage": 38.91, "elapsed_time": "2:10:20", "remaining_time": "3:24:40"}
|
||||
{"current_steps": 980, "total_steps": 2506, "loss": 0.299, "lr": 3.056554676959942e-05, "epoch": 2.7384615384615385, "percentage": 39.11, "elapsed_time": "2:11:02", "remaining_time": "3:24:03"}
|
||||
{"current_steps": 985, "total_steps": 2506, "loss": 0.3031, "lr": 3.0447001459517117e-05, "epoch": 2.7524475524475527, "percentage": 39.31, "elapsed_time": "2:11:45", "remaining_time": "3:23:27"}
|
||||
{"current_steps": 990, "total_steps": 2506, "loss": 0.3043, "lr": 3.0327949232779242e-05, "epoch": 2.7664335664335664, "percentage": 39.51, "elapsed_time": "2:12:24", "remaining_time": "3:22:46"}
|
||||
{"current_steps": 995, "total_steps": 2506, "loss": 0.3034, "lr": 3.020839586612057e-05, "epoch": 2.78041958041958, "percentage": 39.7, "elapsed_time": "2:13:07", "remaining_time": "3:22:09"}
|
||||
{"current_steps": 1000, "total_steps": 2506, "loss": 0.2919, "lr": 3.0088347160592534e-05, "epoch": 2.7944055944055943, "percentage": 39.9, "elapsed_time": "2:13:47", "remaining_time": "3:21:30"}
|
||||
{"current_steps": 1005, "total_steps": 2506, "loss": 0.3059, "lr": 2.996780894128174e-05, "epoch": 2.8083916083916085, "percentage": 40.1, "elapsed_time": "2:14:27", "remaining_time": "3:20:49"}
|
||||
{"current_steps": 1010, "total_steps": 2506, "loss": 0.3046, "lr": 2.9846787057027335e-05, "epoch": 2.8223776223776222, "percentage": 40.3, "elapsed_time": "2:15:10", "remaining_time": "3:20:13"}
|
||||
{"current_steps": 1015, "total_steps": 2506, "loss": 0.303, "lr": 2.972528738013717e-05, "epoch": 2.8363636363636364, "percentage": 40.5, "elapsed_time": "2:15:51", "remaining_time": "3:19:34"}
|
||||
{"current_steps": 1020, "total_steps": 2506, "loss": 0.2978, "lr": 2.960331580610291e-05, "epoch": 2.85034965034965, "percentage": 40.7, "elapsed_time": "2:16:28", "remaining_time": "3:18:50"}
|
||||
{"current_steps": 1025, "total_steps": 2506, "loss": 0.303, "lr": 2.9480878253313908e-05, "epoch": 2.8643356643356643, "percentage": 40.9, "elapsed_time": "2:17:07", "remaining_time": "3:18:07"}
|
||||
{"current_steps": 1030, "total_steps": 2506, "loss": 0.3073, "lr": 2.9357980662770082e-05, "epoch": 2.8783216783216785, "percentage": 41.1, "elapsed_time": "2:17:45", "remaining_time": "3:17:24"}
|
||||
{"current_steps": 1035, "total_steps": 2506, "loss": 0.2955, "lr": 2.923462899779363e-05, "epoch": 2.8923076923076922, "percentage": 41.3, "elapsed_time": "2:18:31", "remaining_time": "3:16:52"}
|
||||
{"current_steps": 1040, "total_steps": 2506, "loss": 0.2922, "lr": 2.9110829243739638e-05, "epoch": 2.9062937062937064, "percentage": 41.5, "elapsed_time": "2:19:13", "remaining_time": "3:16:15"}
|
||||
{"current_steps": 1045, "total_steps": 2506, "loss": 0.3146, "lr": 2.8986587407705698e-05, "epoch": 2.92027972027972, "percentage": 41.7, "elapsed_time": "2:19:51", "remaining_time": "3:15:32"}
|
||||
{"current_steps": 1050, "total_steps": 2506, "loss": 0.3047, "lr": 2.8861909518240412e-05, "epoch": 2.9342657342657343, "percentage": 41.9, "elapsed_time": "2:20:31", "remaining_time": "3:14:51"}
|
||||
{"current_steps": 1055, "total_steps": 2506, "loss": 0.2988, "lr": 2.873680162505087e-05, "epoch": 2.9482517482517485, "percentage": 42.1, "elapsed_time": "2:21:07", "remaining_time": "3:14:05"}
|
||||
{"current_steps": 1060, "total_steps": 2506, "loss": 0.2979, "lr": 2.8611269798709088e-05, "epoch": 2.9622377622377623, "percentage": 42.3, "elapsed_time": "2:21:45", "remaining_time": "3:13:22"}
|
||||
{"current_steps": 1065, "total_steps": 2506, "loss": 0.3048, "lr": 2.8485320130357467e-05, "epoch": 2.976223776223776, "percentage": 42.5, "elapsed_time": "2:22:24", "remaining_time": "3:12:41"}
|
||||
{"current_steps": 1070, "total_steps": 2506, "loss": 0.3053, "lr": 2.8358958731413237e-05, "epoch": 2.99020979020979, "percentage": 42.7, "elapsed_time": "2:23:05", "remaining_time": "3:12:01"}
|
||||
{"current_steps": 1075, "total_steps": 2506, "loss": 0.2893, "lr": 2.8232191733271902e-05, "epoch": 3.0027972027972027, "percentage": 42.9, "elapsed_time": "2:23:40", "remaining_time": "3:11:15"}
|
||||
{"current_steps": 1080, "total_steps": 2506, "loss": 0.289, "lr": 2.8105025287009722e-05, "epoch": 3.016783216783217, "percentage": 43.1, "elapsed_time": "2:24:19", "remaining_time": "3:10:34"}
|
||||
{"current_steps": 1085, "total_steps": 2506, "loss": 0.2849, "lr": 2.7977465563085266e-05, "epoch": 3.0307692307692307, "percentage": 43.3, "elapsed_time": "2:24:58", "remaining_time": "3:09:52"}
|
||||
{"current_steps": 1090, "total_steps": 2506, "loss": 0.2889, "lr": 2.7849518751039988e-05, "epoch": 3.044755244755245, "percentage": 43.5, "elapsed_time": "2:25:35", "remaining_time": "3:09:08"}
|
||||
{"current_steps": 1095, "total_steps": 2506, "loss": 0.2936, "lr": 2.7721191059197906e-05, "epoch": 3.0587412587412586, "percentage": 43.7, "elapsed_time": "2:26:13", "remaining_time": "3:08:25"}
|
||||
{"current_steps": 1100, "total_steps": 2506, "loss": 0.2839, "lr": 2.7592488714364346e-05, "epoch": 3.0727272727272728, "percentage": 43.89, "elapsed_time": "2:26:56", "remaining_time": "3:07:49"}
|
||||
{"current_steps": 1105, "total_steps": 2506, "loss": 0.2921, "lr": 2.7463417961523818e-05, "epoch": 3.0867132867132865, "percentage": 44.09, "elapsed_time": "2:27:35", "remaining_time": "3:07:07"}
|
||||
{"current_steps": 1110, "total_steps": 2506, "loss": 0.29, "lr": 2.7333985063536963e-05, "epoch": 3.1006993006993007, "percentage": 44.29, "elapsed_time": "2:28:17", "remaining_time": "3:06:29"}
|
||||
{"current_steps": 1115, "total_steps": 2506, "loss": 0.2882, "lr": 2.72041963008367e-05, "epoch": 3.114685314685315, "percentage": 44.49, "elapsed_time": "2:28:51", "remaining_time": "3:05:42"}
|
||||
{"current_steps": 1120, "total_steps": 2506, "loss": 0.2912, "lr": 2.707405797112344e-05, "epoch": 3.1286713286713286, "percentage": 44.69, "elapsed_time": "2:29:31", "remaining_time": "3:05:02"}
|
||||
{"current_steps": 1125, "total_steps": 2506, "loss": 0.286, "lr": 2.6943576389059555e-05, "epoch": 3.1426573426573428, "percentage": 44.89, "elapsed_time": "2:30:09", "remaining_time": "3:04:20"}
|
||||
{"current_steps": 1130, "total_steps": 2506, "loss": 0.2911, "lr": 2.6812757885962925e-05, "epoch": 3.1566433566433565, "percentage": 45.09, "elapsed_time": "2:30:53", "remaining_time": "3:03:44"}
|
||||
{"current_steps": 1135, "total_steps": 2506, "loss": 0.2859, "lr": 2.6681608809499742e-05, "epoch": 3.1706293706293707, "percentage": 45.29, "elapsed_time": "2:31:29", "remaining_time": "3:02:59"}
|
||||
{"current_steps": 1140, "total_steps": 2506, "loss": 0.2925, "lr": 2.6550135523376536e-05, "epoch": 3.184615384615385, "percentage": 45.49, "elapsed_time": "2:32:08", "remaining_time": "3:02:17"}
|
||||
{"current_steps": 1145, "total_steps": 2506, "loss": 0.2838, "lr": 2.641834440703133e-05, "epoch": 3.1986013986013986, "percentage": 45.69, "elapsed_time": "2:32:49", "remaining_time": "3:01:39"}
|
||||
{"current_steps": 1150, "total_steps": 2506, "loss": 0.2869, "lr": 2.6286241855324148e-05, "epoch": 3.2125874125874128, "percentage": 45.89, "elapsed_time": "2:33:30", "remaining_time": "3:01:00"}
|
||||
{"current_steps": 1155, "total_steps": 2506, "loss": 0.2945, "lr": 2.615383427822669e-05, "epoch": 3.2265734265734265, "percentage": 46.09, "elapsed_time": "2:34:13", "remaining_time": "3:00:23"}
|
||||
{"current_steps": 1160, "total_steps": 2506, "loss": 0.2948, "lr": 2.6021128100511312e-05, "epoch": 3.2405594405594407, "percentage": 46.29, "elapsed_time": "2:34:51", "remaining_time": "2:59:41"}
|
||||
{"current_steps": 1165, "total_steps": 2506, "loss": 0.2854, "lr": 2.5888129761439268e-05, "epoch": 3.2545454545454544, "percentage": 46.49, "elapsed_time": "2:35:31", "remaining_time": "2:59:00"}
|
||||
{"current_steps": 1170, "total_steps": 2506, "loss": 0.2947, "lr": 2.575484571444828e-05, "epoch": 3.2685314685314686, "percentage": 46.69, "elapsed_time": "2:36:10", "remaining_time": "2:58:20"}
|
||||
{"current_steps": 1175, "total_steps": 2506, "loss": 0.2966, "lr": 2.5621282426839376e-05, "epoch": 3.2825174825174823, "percentage": 46.89, "elapsed_time": "2:36:50", "remaining_time": "2:57:39"}
|
||||
{"current_steps": 1180, "total_steps": 2506, "loss": 0.2859, "lr": 2.5487446379463095e-05, "epoch": 3.2965034965034965, "percentage": 47.09, "elapsed_time": "2:37:37", "remaining_time": "2:57:07"}
|
||||
{"current_steps": 1185, "total_steps": 2506, "loss": 0.2888, "lr": 2.535334406640503e-05, "epoch": 3.3104895104895107, "percentage": 47.29, "elapsed_time": "2:38:13", "remaining_time": "2:56:23"}
|
||||
{"current_steps": 1190, "total_steps": 2506, "loss": 0.2918, "lr": 2.5218981994670683e-05, "epoch": 3.3244755244755244, "percentage": 47.49, "elapsed_time": "2:38:50", "remaining_time": "2:55:39"}
|
||||
{"current_steps": 1195, "total_steps": 2506, "loss": 0.2838, "lr": 2.5084366683869746e-05, "epoch": 3.3384615384615386, "percentage": 47.69, "elapsed_time": "2:39:27", "remaining_time": "2:54:56"}
|
||||
{"current_steps": 1200, "total_steps": 2506, "loss": 0.2944, "lr": 2.494950466589976e-05, "epoch": 3.3524475524475523, "percentage": 47.89, "elapsed_time": "2:40:07", "remaining_time": "2:54:16"}
|
||||
{"current_steps": 1205, "total_steps": 2506, "loss": 0.2874, "lr": 2.4814402484629172e-05, "epoch": 3.3664335664335665, "percentage": 48.08, "elapsed_time": "2:40:44", "remaining_time": "2:53:33"}
|
||||
{"current_steps": 1210, "total_steps": 2506, "loss": 0.2912, "lr": 2.4679066695579783e-05, "epoch": 3.3804195804195802, "percentage": 48.28, "elapsed_time": "2:41:23", "remaining_time": "2:52:52"}
|
||||
{"current_steps": 1215, "total_steps": 2506, "loss": 0.2958, "lr": 2.454350386560868e-05, "epoch": 3.3944055944055944, "percentage": 48.48, "elapsed_time": "2:42:09", "remaining_time": "2:52:18"}
|
||||
{"current_steps": 1220, "total_steps": 2506, "loss": 0.293, "lr": 2.440772057258958e-05, "epoch": 3.408391608391608, "percentage": 48.68, "elapsed_time": "2:42:48", "remaining_time": "2:51:36"}
|
||||
{"current_steps": 1225, "total_steps": 2506, "loss": 0.2905, "lr": 2.4271723405093683e-05, "epoch": 3.4223776223776223, "percentage": 48.88, "elapsed_time": "2:43:25", "remaining_time": "2:50:53"}
|
||||
{"current_steps": 1230, "total_steps": 2506, "loss": 0.2953, "lr": 2.4135518962069924e-05, "epoch": 3.4363636363636365, "percentage": 49.08, "elapsed_time": "2:44:03", "remaining_time": "2:50:11"}
|
||||
{"current_steps": 1235, "total_steps": 2506, "loss": 0.2874, "lr": 2.3999113852524825e-05, "epoch": 3.4503496503496502, "percentage": 49.28, "elapsed_time": "2:44:45", "remaining_time": "2:49:33"}
|
||||
{"current_steps": 1240, "total_steps": 2506, "loss": 0.2846, "lr": 2.386251469520179e-05, "epoch": 3.4643356643356644, "percentage": 49.48, "elapsed_time": "2:45:29", "remaining_time": "2:48:57"}
|
||||
{"current_steps": 1245, "total_steps": 2506, "loss": 0.2851, "lr": 2.3725728118259927e-05, "epoch": 3.478321678321678, "percentage": 49.68, "elapsed_time": "2:46:09", "remaining_time": "2:48:17"}
|
||||
{"current_steps": 1250, "total_steps": 2506, "loss": 0.2904, "lr": 2.358876075895247e-05, "epoch": 3.4923076923076923, "percentage": 49.88, "elapsed_time": "2:46:51", "remaining_time": "2:47:39"}
|
||||
{"current_steps": 1255, "total_steps": 2506, "loss": 0.2895, "lr": 2.345161926330468e-05, "epoch": 3.5062937062937065, "percentage": 50.08, "elapsed_time": "2:47:33", "remaining_time": "2:47:01"}
|
||||
{"current_steps": 1260, "total_steps": 2506, "loss": 0.2878, "lr": 2.3314310285791395e-05, "epoch": 3.5202797202797202, "percentage": 50.28, "elapsed_time": "2:48:09", "remaining_time": "2:46:17"}
|
||||
{"current_steps": 1265, "total_steps": 2506, "loss": 0.2851, "lr": 2.3176840489014127e-05, "epoch": 3.5342657342657344, "percentage": 50.48, "elapsed_time": "2:48:47", "remaining_time": "2:45:35"}
|
||||
{"current_steps": 1270, "total_steps": 2506, "loss": 0.29, "lr": 2.303921654337776e-05, "epoch": 3.548251748251748, "percentage": 50.68, "elapsed_time": "2:49:26", "remaining_time": "2:44:54"}
|
||||
{"current_steps": 1275, "total_steps": 2506, "loss": 0.2948, "lr": 2.29014451267669e-05, "epoch": 3.5622377622377623, "percentage": 50.88, "elapsed_time": "2:50:06", "remaining_time": "2:44:14"}
|
||||
{"current_steps": 1280, "total_steps": 2506, "loss": 0.2919, "lr": 2.276353292422185e-05, "epoch": 3.576223776223776, "percentage": 51.08, "elapsed_time": "2:50:44", "remaining_time": "2:43:32"}
|
||||
{"current_steps": 1285, "total_steps": 2506, "loss": 0.282, "lr": 2.2625486627614223e-05, "epoch": 3.5902097902097903, "percentage": 51.28, "elapsed_time": "2:51:29", "remaining_time": "2:42:57"}
|
||||
{"current_steps": 1290, "total_steps": 2506, "loss": 0.2799, "lr": 2.248731293532222e-05, "epoch": 3.604195804195804, "percentage": 51.48, "elapsed_time": "2:52:12", "remaining_time": "2:42:19"}
|
||||
{"current_steps": 1295, "total_steps": 2506, "loss": 0.2911, "lr": 2.2349018551905653e-05, "epoch": 3.618181818181818, "percentage": 51.68, "elapsed_time": "2:52:49", "remaining_time": "2:41:37"}
|
||||
{"current_steps": 1300, "total_steps": 2506, "loss": 0.2847, "lr": 2.221061018778058e-05, "epoch": 3.6321678321678323, "percentage": 51.88, "elapsed_time": "2:53:28", "remaining_time": "2:40:56"}
|
||||
{"current_steps": 1305, "total_steps": 2506, "loss": 0.2885, "lr": 2.207209455889368e-05, "epoch": 3.646153846153846, "percentage": 52.08, "elapsed_time": "2:54:08", "remaining_time": "2:40:15"}
|
||||
{"current_steps": 1310, "total_steps": 2506, "loss": 0.2906, "lr": 2.193347838639647e-05, "epoch": 3.6601398601398603, "percentage": 52.27, "elapsed_time": "2:54:48", "remaining_time": "2:39:35"}
|
||||
{"current_steps": 1315, "total_steps": 2506, "loss": 0.2825, "lr": 2.1794768396319058e-05, "epoch": 3.674125874125874, "percentage": 52.47, "elapsed_time": "2:55:28", "remaining_time": "2:38:55"}
|
||||
{"current_steps": 1320, "total_steps": 2506, "loss": 0.2924, "lr": 2.1655971319243853e-05, "epoch": 3.688111888111888, "percentage": 52.67, "elapsed_time": "2:56:08", "remaining_time": "2:38:15"}
|
||||
{"current_steps": 1325, "total_steps": 2506, "loss": 0.2817, "lr": 2.1517093889978966e-05, "epoch": 3.7020979020979023, "percentage": 52.87, "elapsed_time": "2:56:46", "remaining_time": "2:37:34"}
|
||||
{"current_steps": 1330, "total_steps": 2506, "loss": 0.284, "lr": 2.1378142847231417e-05, "epoch": 3.716083916083916, "percentage": 53.07, "elapsed_time": "2:57:26", "remaining_time": "2:36:54"}
|
||||
{"current_steps": 1335, "total_steps": 2506, "loss": 0.283, "lr": 2.123912493328013e-05, "epoch": 3.73006993006993, "percentage": 53.27, "elapsed_time": "2:58:02", "remaining_time": "2:36:10"}
|
||||
{"current_steps": 1340, "total_steps": 2506, "loss": 0.2929, "lr": 2.1100046893648813e-05, "epoch": 3.744055944055944, "percentage": 53.47, "elapsed_time": "2:58:46", "remaining_time": "2:35:33"}
|
||||
{"current_steps": 1345, "total_steps": 2506, "loss": 0.2932, "lr": 2.096091547677864e-05, "epoch": 3.758041958041958, "percentage": 53.67, "elapsed_time": "2:59:27", "remaining_time": "2:34:54"}
|
||||
{"current_steps": 1350, "total_steps": 2506, "loss": 0.2889, "lr": 2.0821737433700773e-05, "epoch": 3.772027972027972, "percentage": 53.87, "elapsed_time": "3:00:03", "remaining_time": "2:34:10"}
|
||||
{"current_steps": 1355, "total_steps": 2506, "loss": 0.2922, "lr": 2.068251951770882e-05, "epoch": 3.786013986013986, "percentage": 54.07, "elapsed_time": "3:00:42", "remaining_time": "2:33:30"}
|
||||
{"current_steps": 1360, "total_steps": 2506, "loss": 0.2847, "lr": 2.054326848403113e-05, "epoch": 3.8, "percentage": 54.27, "elapsed_time": "3:01:20", "remaining_time": "2:32:48"}
|
||||
{"current_steps": 1365, "total_steps": 2506, "loss": 0.2865, "lr": 2.0403991089502995e-05, "epoch": 3.813986013986014, "percentage": 54.47, "elapsed_time": "3:01:57", "remaining_time": "2:32:05"}
|
||||
{"current_steps": 1370, "total_steps": 2506, "loss": 0.291, "lr": 2.026469409223883e-05, "epoch": 3.827972027972028, "percentage": 54.67, "elapsed_time": "3:02:36", "remaining_time": "2:31:25"}
|
||||
{"current_steps": 1375, "total_steps": 2506, "loss": 0.2891, "lr": 2.012538425130421e-05, "epoch": 3.841958041958042, "percentage": 54.87, "elapsed_time": "3:03:15", "remaining_time": "2:30:43"}
|
||||
{"current_steps": 1380, "total_steps": 2506, "loss": 0.295, "lr": 1.998606832638792e-05, "epoch": 3.855944055944056, "percentage": 55.07, "elapsed_time": "3:03:55", "remaining_time": "2:30:04"}
|
||||
{"current_steps": 1385, "total_steps": 2506, "loss": 0.2978, "lr": 1.984675307747397e-05, "epoch": 3.86993006993007, "percentage": 55.27, "elapsed_time": "3:04:32", "remaining_time": "2:29:21"}
|
||||
{"current_steps": 1390, "total_steps": 2506, "loss": 0.2876, "lr": 1.970744526451356e-05, "epoch": 3.883916083916084, "percentage": 55.47, "elapsed_time": "3:05:08", "remaining_time": "2:28:38"}
|
||||
{"current_steps": 1395, "total_steps": 2506, "loss": 0.2926, "lr": 1.956815164709707e-05, "epoch": 3.8979020979020977, "percentage": 55.67, "elapsed_time": "3:05:48", "remaining_time": "2:27:58"}
|
||||
{"current_steps": 1400, "total_steps": 2506, "loss": 0.2826, "lr": 1.942887898412608e-05, "epoch": 3.911888111888112, "percentage": 55.87, "elapsed_time": "3:06:28", "remaining_time": "2:27:18"}
|
||||
{"current_steps": 1405, "total_steps": 2506, "loss": 0.2872, "lr": 1.928963403348541e-05, "epoch": 3.9258741258741257, "percentage": 56.07, "elapsed_time": "3:07:06", "remaining_time": "2:26:37"}
|
||||
{"current_steps": 1410, "total_steps": 2506, "loss": 0.2941, "lr": 1.91504235517152e-05, "epoch": 3.93986013986014, "percentage": 56.26, "elapsed_time": "3:07:44", "remaining_time": "2:25:56"}
|
||||
{"current_steps": 1415, "total_steps": 2506, "loss": 0.2879, "lr": 1.9011254293683067e-05, "epoch": 3.953846153846154, "percentage": 56.46, "elapsed_time": "3:08:23", "remaining_time": "2:25:14"}
|
||||
{"current_steps": 1420, "total_steps": 2506, "loss": 0.2929, "lr": 1.8872133012256328e-05, "epoch": 3.9678321678321677, "percentage": 56.66, "elapsed_time": "3:09:00", "remaining_time": "2:24:33"}
|
||||
{"current_steps": 1425, "total_steps": 2506, "loss": 0.2831, "lr": 1.8733066457974373e-05, "epoch": 3.981818181818182, "percentage": 56.86, "elapsed_time": "3:09:42", "remaining_time": "2:23:54"}
|
||||
{"current_steps": 1430, "total_steps": 2506, "loss": 0.2897, "lr": 1.8594061378721057e-05, "epoch": 3.9958041958041957, "percentage": 57.06, "elapsed_time": "3:10:23", "remaining_time": "2:23:15"}
|
||||
{"current_steps": 1435, "total_steps": 2506, "loss": 0.2839, "lr": 1.8455124519397308e-05, "epoch": 4.008391608391609, "percentage": 57.26, "elapsed_time": "3:11:00", "remaining_time": "2:22:33"}
|
||||
{"current_steps": 1440, "total_steps": 2506, "loss": 0.2728, "lr": 1.831626262159386e-05, "epoch": 4.022377622377622, "percentage": 57.46, "elapsed_time": "3:11:41", "remaining_time": "2:21:54"}
|
||||
{"current_steps": 1445, "total_steps": 2506, "loss": 0.2832, "lr": 1.817748242326409e-05, "epoch": 4.036363636363636, "percentage": 57.66, "elapsed_time": "3:12:21", "remaining_time": "2:21:14"}
|
||||
{"current_steps": 1450, "total_steps": 2506, "loss": 0.2803, "lr": 1.8038790658397097e-05, "epoch": 4.05034965034965, "percentage": 57.86, "elapsed_time": "3:12:59", "remaining_time": "2:20:32"}
|
||||
{"current_steps": 1455, "total_steps": 2506, "loss": 0.2771, "lr": 1.7900194056690955e-05, "epoch": 4.0643356643356645, "percentage": 58.06, "elapsed_time": "3:13:39", "remaining_time": "2:19:53"}
|
||||
{"current_steps": 1460, "total_steps": 2506, "loss": 0.2771, "lr": 1.7761699343226167e-05, "epoch": 4.078321678321679, "percentage": 58.26, "elapsed_time": "3:14:15", "remaining_time": "2:19:10"}
|
||||
{"current_steps": 1465, "total_steps": 2506, "loss": 0.2767, "lr": 1.7623313238139335e-05, "epoch": 4.092307692307692, "percentage": 58.46, "elapsed_time": "3:14:52", "remaining_time": "2:18:28"}
|
||||
{"current_steps": 1470, "total_steps": 2506, "loss": 0.2811, "lr": 1.748504245629711e-05, "epoch": 4.106293706293706, "percentage": 58.66, "elapsed_time": "3:15:29", "remaining_time": "2:17:46"}
|
||||
{"current_steps": 1475, "total_steps": 2506, "loss": 0.2858, "lr": 1.7346893706970333e-05, "epoch": 4.12027972027972, "percentage": 58.86, "elapsed_time": "3:16:11", "remaining_time": "2:17:07"}
|
||||
{"current_steps": 1480, "total_steps": 2506, "loss": 0.2794, "lr": 1.7208873693508493e-05, "epoch": 4.1342657342657345, "percentage": 59.06, "elapsed_time": "3:16:54", "remaining_time": "2:16:30"}
|
||||
{"current_steps": 1485, "total_steps": 2506, "loss": 0.2777, "lr": 1.7070989113014483e-05, "epoch": 4.148251748251749, "percentage": 59.26, "elapsed_time": "3:17:32", "remaining_time": "2:15:49"}
|
||||
{"current_steps": 1490, "total_steps": 2506, "loss": 0.2772, "lr": 1.6933246656019613e-05, "epoch": 4.162237762237762, "percentage": 59.46, "elapsed_time": "3:18:12", "remaining_time": "2:15:09"}
|
||||
{"current_steps": 1495, "total_steps": 2506, "loss": 0.2806, "lr": 1.6795653006158977e-05, "epoch": 4.176223776223776, "percentage": 59.66, "elapsed_time": "3:18:53", "remaining_time": "2:14:30"}
|
||||
{"current_steps": 1500, "total_steps": 2506, "loss": 0.2804, "lr": 1.6658214839847168e-05, "epoch": 4.19020979020979, "percentage": 59.86, "elapsed_time": "3:19:32", "remaining_time": "2:13:49"}
|
||||
{"current_steps": 1505, "total_steps": 2506, "loss": 0.28, "lr": 1.6520938825954265e-05, "epoch": 4.2041958041958045, "percentage": 60.06, "elapsed_time": "3:20:47", "remaining_time": "2:13:32"}
|
||||
{"current_steps": 1510, "total_steps": 2506, "loss": 0.2794, "lr": 1.638383162548229e-05, "epoch": 4.218181818181818, "percentage": 60.26, "elapsed_time": "3:21:24", "remaining_time": "2:12:50"}
|
||||
{"current_steps": 1515, "total_steps": 2506, "loss": 0.2806, "lr": 1.6246899891241995e-05, "epoch": 4.232167832167832, "percentage": 60.45, "elapsed_time": "3:22:04", "remaining_time": "2:12:11"}
|
||||
{"current_steps": 1520, "total_steps": 2506, "loss": 0.2765, "lr": 1.6110150267530017e-05, "epoch": 4.246153846153846, "percentage": 60.65, "elapsed_time": "3:22:43", "remaining_time": "2:11:30"}
|
||||
{"current_steps": 1525, "total_steps": 2506, "loss": 0.2774, "lr": 1.597358938980651e-05, "epoch": 4.26013986013986, "percentage": 60.85, "elapsed_time": "3:23:21", "remaining_time": "2:10:48"}
|
||||
{"current_steps": 1530, "total_steps": 2506, "loss": 0.2782, "lr": 1.583722388437317e-05, "epoch": 4.2741258741258745, "percentage": 61.05, "elapsed_time": "3:24:02", "remaining_time": "2:10:09"}
|
||||
{"current_steps": 1535, "total_steps": 2506, "loss": 0.2732, "lr": 1.570106036805169e-05, "epoch": 4.288111888111888, "percentage": 61.25, "elapsed_time": "3:24:42", "remaining_time": "2:09:29"}
|
||||
{"current_steps": 1540, "total_steps": 2506, "loss": 0.2835, "lr": 1.5565105447862716e-05, "epoch": 4.302097902097902, "percentage": 61.45, "elapsed_time": "3:25:21", "remaining_time": "2:08:49"}
|
||||
{"current_steps": 1545, "total_steps": 2506, "loss": 0.2794, "lr": 1.5429365720705247e-05, "epoch": 4.316083916083916, "percentage": 61.65, "elapsed_time": "3:26:03", "remaining_time": "2:08:10"}
|
||||
{"current_steps": 1550, "total_steps": 2506, "loss": 0.2788, "lr": 1.5293847773036526e-05, "epoch": 4.33006993006993, "percentage": 61.85, "elapsed_time": "3:26:41", "remaining_time": "2:07:29"}
|
||||
{"current_steps": 1555, "total_steps": 2506, "loss": 0.2809, "lr": 1.5158558180552467e-05, "epoch": 4.344055944055944, "percentage": 62.05, "elapsed_time": "3:27:23", "remaining_time": "2:06:50"}
|
||||
{"current_steps": 1560, "total_steps": 2506, "loss": 0.2802, "lr": 1.5023503507868586e-05, "epoch": 4.358041958041958, "percentage": 62.25, "elapsed_time": "3:28:07", "remaining_time": "2:06:12"}
|
||||
{"current_steps": 1565, "total_steps": 2506, "loss": 0.2817, "lr": 1.4888690308201442e-05, "epoch": 4.372027972027972, "percentage": 62.45, "elapsed_time": "3:28:42", "remaining_time": "2:05:29"}
|
||||
{"current_steps": 1570, "total_steps": 2506, "loss": 0.2805, "lr": 1.4754125123050668e-05, "epoch": 4.386013986013986, "percentage": 62.65, "elapsed_time": "3:29:26", "remaining_time": "2:04:51"}
|
||||
{"current_steps": 1575, "total_steps": 2506, "loss": 0.277, "lr": 1.4619814481881582e-05, "epoch": 4.4, "percentage": 62.85, "elapsed_time": "3:30:03", "remaining_time": "2:04:10"}
|
||||
{"current_steps": 1580, "total_steps": 2506, "loss": 0.2703, "lr": 1.4485764901808328e-05, "epoch": 4.413986013986014, "percentage": 63.05, "elapsed_time": "3:30:42", "remaining_time": "2:03:29"}
|
||||
{"current_steps": 1585, "total_steps": 2506, "loss": 0.2743, "lr": 1.435198288727766e-05, "epoch": 4.427972027972028, "percentage": 63.25, "elapsed_time": "3:31:22", "remaining_time": "2:02:49"}
|
||||
{"current_steps": 1590, "total_steps": 2506, "loss": 0.2768, "lr": 1.4218474929753358e-05, "epoch": 4.441958041958042, "percentage": 63.45, "elapsed_time": "3:32:06", "remaining_time": "2:02:11"}
|
||||
{"current_steps": 1595, "total_steps": 2506, "loss": 0.2878, "lr": 1.4085247507401188e-05, "epoch": 4.455944055944056, "percentage": 63.65, "elapsed_time": "3:32:42", "remaining_time": "2:01:29"}
|
||||
{"current_steps": 1600, "total_steps": 2506, "loss": 0.2738, "lr": 1.3952307084774599e-05, "epoch": 4.46993006993007, "percentage": 63.85, "elapsed_time": "3:33:19", "remaining_time": "2:00:47"}
|
||||
{"current_steps": 1605, "total_steps": 2506, "loss": 0.2785, "lr": 1.3819660112501054e-05, "epoch": 4.483916083916084, "percentage": 64.05, "elapsed_time": "3:33:58", "remaining_time": "2:00:07"}
|
||||
{"current_steps": 1610, "total_steps": 2506, "loss": 0.2811, "lr": 1.3687313026969003e-05, "epoch": 4.497902097902098, "percentage": 64.25, "elapsed_time": "3:34:35", "remaining_time": "1:59:25"}
|
||||
{"current_steps": 1615, "total_steps": 2506, "loss": 0.2769, "lr": 1.3555272250015575e-05, "epoch": 4.511888111888112, "percentage": 64.45, "elapsed_time": "3:35:14", "remaining_time": "1:58:45"}
|
||||
{"current_steps": 1620, "total_steps": 2506, "loss": 0.2898, "lr": 1.342354418861501e-05, "epoch": 4.525874125874126, "percentage": 64.64, "elapsed_time": "3:35:51", "remaining_time": "1:58:03"}
|
||||
{"current_steps": 1625, "total_steps": 2506, "loss": 0.2807, "lr": 1.329213523456772e-05, "epoch": 4.5398601398601395, "percentage": 64.84, "elapsed_time": "3:36:30", "remaining_time": "1:57:22"}
|
||||
{"current_steps": 1630, "total_steps": 2506, "loss": 0.273, "lr": 1.316105176419018e-05, "epoch": 4.553846153846154, "percentage": 65.04, "elapsed_time": "3:37:12", "remaining_time": "1:56:43"}
|
||||
{"current_steps": 1635, "total_steps": 2506, "loss": 0.277, "lr": 1.3030300138005516e-05, "epoch": 4.567832167832168, "percentage": 65.24, "elapsed_time": "3:37:54", "remaining_time": "1:56:04"}
|
||||
{"current_steps": 1640, "total_steps": 2506, "loss": 0.2783, "lr": 1.2899886700434885e-05, "epoch": 4.581818181818182, "percentage": 65.44, "elapsed_time": "3:38:33", "remaining_time": "1:55:24"}
|
||||
{"current_steps": 1645, "total_steps": 2506, "loss": 0.2793, "lr": 1.2769817779489606e-05, "epoch": 4.595804195804196, "percentage": 65.64, "elapsed_time": "3:39:15", "remaining_time": "1:54:45"}
|
||||
{"current_steps": 1650, "total_steps": 2506, "loss": 0.2834, "lr": 1.2640099686464157e-05, "epoch": 4.6097902097902095, "percentage": 65.84, "elapsed_time": "3:39:53", "remaining_time": "1:54:04"}
|
||||
{"current_steps": 1655, "total_steps": 2506, "loss": 0.2759, "lr": 1.2510738715629866e-05, "epoch": 4.623776223776224, "percentage": 66.04, "elapsed_time": "3:40:29", "remaining_time": "1:53:22"}
|
||||
{"current_steps": 1660, "total_steps": 2506, "loss": 0.2828, "lr": 1.2381741143929547e-05, "epoch": 4.637762237762238, "percentage": 66.24, "elapsed_time": "3:41:09", "remaining_time": "1:52:42"}
|
||||
{"current_steps": 1665, "total_steps": 2506, "loss": 0.2778, "lr": 1.22531132306729e-05, "epoch": 4.651748251748252, "percentage": 66.44, "elapsed_time": "3:41:46", "remaining_time": "1:52:01"}
|
||||
{"current_steps": 1670, "total_steps": 2506, "loss": 0.2704, "lr": 1.212486121723281e-05, "epoch": 4.665734265734265, "percentage": 66.64, "elapsed_time": "3:42:23", "remaining_time": "1:51:19"}
|
||||
{"current_steps": 1675, "total_steps": 2506, "loss": 0.2813, "lr": 1.1996991326742484e-05, "epoch": 4.6797202797202795, "percentage": 66.84, "elapsed_time": "3:43:03", "remaining_time": "1:50:39"}
|
||||
{"current_steps": 1680, "total_steps": 2506, "loss": 0.2775, "lr": 1.1869509763793497e-05, "epoch": 4.693706293706294, "percentage": 67.04, "elapsed_time": "3:43:39", "remaining_time": "1:49:57"}
|
||||
{"current_steps": 1685, "total_steps": 2506, "loss": 0.2708, "lr": 1.174242271413473e-05, "epoch": 4.707692307692308, "percentage": 67.24, "elapsed_time": "3:44:14", "remaining_time": "1:49:15"}
|
||||
{"current_steps": 1690, "total_steps": 2506, "loss": 0.2764, "lr": 1.1615736344372203e-05, "epoch": 4.721678321678322, "percentage": 67.44, "elapsed_time": "3:44:53", "remaining_time": "1:48:35"}
|
||||
{"current_steps": 1695, "total_steps": 2506, "loss": 0.2704, "lr": 1.148945680166989e-05, "epoch": 4.735664335664335, "percentage": 67.64, "elapsed_time": "3:45:31", "remaining_time": "1:47:54"}
|
||||
{"current_steps": 1700, "total_steps": 2506, "loss": 0.2818, "lr": 1.136359021345139e-05, "epoch": 4.7496503496503495, "percentage": 67.84, "elapsed_time": "3:46:10", "remaining_time": "1:47:13"}
|
||||
{"current_steps": 1705, "total_steps": 2506, "loss": 0.2822, "lr": 1.123814268710267e-05, "epoch": 4.763636363636364, "percentage": 68.04, "elapsed_time": "3:46:53", "remaining_time": "1:46:35"}
|
||||
{"current_steps": 1710, "total_steps": 2506, "loss": 0.2754, "lr": 1.1113120309675645e-05, "epoch": 4.777622377622378, "percentage": 68.24, "elapsed_time": "3:47:31", "remaining_time": "1:45:54"}
|
||||
{"current_steps": 1715, "total_steps": 2506, "loss": 0.2755, "lr": 1.098852914759292e-05, "epoch": 4.791608391608392, "percentage": 68.44, "elapsed_time": "3:48:08", "remaining_time": "1:45:13"}
|
||||
{"current_steps": 1720, "total_steps": 2506, "loss": 0.2748, "lr": 1.086437524635331e-05, "epoch": 4.805594405594405, "percentage": 68.64, "elapsed_time": "3:48:43", "remaining_time": "1:44:31"}
|
||||
{"current_steps": 1725, "total_steps": 2506, "loss": 0.2689, "lr": 1.0740664630238592e-05, "epoch": 4.8195804195804195, "percentage": 68.83, "elapsed_time": "3:49:22", "remaining_time": "1:43:51"}
|
||||
{"current_steps": 1730, "total_steps": 2506, "loss": 0.2758, "lr": 1.0617403302021128e-05, "epoch": 4.833566433566434, "percentage": 69.03, "elapsed_time": "3:50:08", "remaining_time": "1:43:13"}
|
||||
{"current_steps": 1735, "total_steps": 2506, "loss": 0.2777, "lr": 1.0494597242672647e-05, "epoch": 4.847552447552448, "percentage": 69.23, "elapsed_time": "3:50:51", "remaining_time": "1:42:35"}
|
||||
{"current_steps": 1740, "total_steps": 2506, "loss": 0.2772, "lr": 1.037225241107399e-05, "epoch": 4.861538461538462, "percentage": 69.43, "elapsed_time": "3:51:34", "remaining_time": "1:41:56"}
|
||||
{"current_steps": 1745, "total_steps": 2506, "loss": 0.2817, "lr": 1.025037474372599e-05, "epoch": 4.875524475524475, "percentage": 69.63, "elapsed_time": "3:52:18", "remaining_time": "1:41:18"}
|
||||
{"current_steps": 1750, "total_steps": 2506, "loss": 0.2763, "lr": 1.0128970154461424e-05, "epoch": 4.8895104895104895, "percentage": 69.83, "elapsed_time": "3:52:57", "remaining_time": "1:40:38"}
|
||||
{"current_steps": 1755, "total_steps": 2506, "loss": 0.2827, "lr": 1.000804453415801e-05, "epoch": 4.903496503496504, "percentage": 70.03, "elapsed_time": "3:53:33", "remaining_time": "1:39:56"}
|
||||
{"current_steps": 1760, "total_steps": 2506, "loss": 0.2756, "lr": 9.887603750452646e-06, "epoch": 4.917482517482518, "percentage": 70.23, "elapsed_time": "3:54:11", "remaining_time": "1:39:15"}
|
||||
{"current_steps": 1765, "total_steps": 2506, "loss": 0.2823, "lr": 9.767653647456614e-06, "epoch": 4.931468531468531, "percentage": 70.43, "elapsed_time": "3:54:48", "remaining_time": "1:38:34"}
|
||||
{"current_steps": 1770, "total_steps": 2506, "loss": 0.2825, "lr": 9.648200045472071e-06, "epoch": 4.945454545454545, "percentage": 70.63, "elapsed_time": "3:55:28", "remaining_time": "1:37:54"}
|
||||
{"current_steps": 1775, "total_steps": 2506, "loss": 0.2774, "lr": 9.5292487407096e-06, "epoch": 4.9594405594405595, "percentage": 70.83, "elapsed_time": "3:56:11", "remaining_time": "1:37:16"}
|
||||
{"current_steps": 1780, "total_steps": 2506, "loss": 0.2691, "lr": 9.410805505006974e-06, "epoch": 4.973426573426574, "percentage": 71.03, "elapsed_time": "3:56:46", "remaining_time": "1:36:34"}
|
||||
{"current_steps": 1785, "total_steps": 2506, "loss": 0.2825, "lr": 9.29287608554907e-06, "epoch": 4.987412587412587, "percentage": 71.23, "elapsed_time": "3:57:22", "remaining_time": "1:35:53"}
|
||||
{"current_steps": 1790, "total_steps": 2506, "loss": 0.2778, "lr": 9.175466204589039e-06, "epoch": 5.0, "percentage": 71.43, "elapsed_time": "3:58:00", "remaining_time": "1:35:12"}
|
||||
{"current_steps": 1795, "total_steps": 2506, "loss": 0.271, "lr": 9.0585815591706e-06, "epoch": 5.013986013986014, "percentage": 71.63, "elapsed_time": "3:58:36", "remaining_time": "1:34:30"}
|
||||
{"current_steps": 1800, "total_steps": 2506, "loss": 0.2696, "lr": 8.942227820851653e-06, "epoch": 5.027972027972028, "percentage": 71.83, "elapsed_time": "3:59:15", "remaining_time": "1:33:50"}
|
||||
{"current_steps": 1805, "total_steps": 2506, "loss": 0.2672, "lr": 8.82641063542904e-06, "epoch": 5.041958041958042, "percentage": 72.03, "elapsed_time": "3:59:52", "remaining_time": "1:33:09"}
|
||||
{"current_steps": 1810, "total_steps": 2506, "loss": 0.2677, "lr": 8.711135622664622e-06, "epoch": 5.055944055944056, "percentage": 72.23, "elapsed_time": "4:00:39", "remaining_time": "1:32:32"}
|
||||
{"current_steps": 1815, "total_steps": 2506, "loss": 0.2678, "lr": 8.596408376012562e-06, "epoch": 5.06993006993007, "percentage": 72.43, "elapsed_time": "4:01:22", "remaining_time": "1:31:53"}
|
||||
{"current_steps": 1820, "total_steps": 2506, "loss": 0.2629, "lr": 8.482234462347955e-06, "epoch": 5.083916083916084, "percentage": 72.63, "elapsed_time": "4:02:00", "remaining_time": "1:31:13"}
|
||||
{"current_steps": 1825, "total_steps": 2506, "loss": 0.2686, "lr": 8.368619421696693e-06, "epoch": 5.0979020979020975, "percentage": 72.83, "elapsed_time": "4:02:41", "remaining_time": "1:30:33"}
|
||||
{"current_steps": 1830, "total_steps": 2506, "loss": 0.2734, "lr": 8.255568766966613e-06, "epoch": 5.111888111888112, "percentage": 73.02, "elapsed_time": "4:03:20", "remaining_time": "1:29:53"}
|
||||
{"current_steps": 1835, "total_steps": 2506, "loss": 0.2705, "lr": 8.143087983680061e-06, "epoch": 5.125874125874126, "percentage": 73.22, "elapsed_time": "4:04:01", "remaining_time": "1:29:13"}
|
||||
{"current_steps": 1840, "total_steps": 2506, "loss": 0.2716, "lr": 8.031182529707664e-06, "epoch": 5.13986013986014, "percentage": 73.42, "elapsed_time": "4:04:37", "remaining_time": "1:28:32"}
|
||||
{"current_steps": 1845, "total_steps": 2506, "loss": 0.2711, "lr": 7.919857835003537e-06, "epoch": 5.153846153846154, "percentage": 73.62, "elapsed_time": "4:05:22", "remaining_time": "1:27:54"}
|
||||
{"current_steps": 1850, "total_steps": 2506, "loss": 0.2728, "lr": 7.80911930134177e-06, "epoch": 5.1678321678321675, "percentage": 73.82, "elapsed_time": "4:05:59", "remaining_time": "1:27:13"}
|
||||
{"current_steps": 1855, "total_steps": 2506, "loss": 0.2737, "lr": 7.698972302054363e-06, "epoch": 5.181818181818182, "percentage": 74.02, "elapsed_time": "4:06:37", "remaining_time": "1:26:33"}
|
||||
{"current_steps": 1860, "total_steps": 2506, "loss": 0.2733, "lr": 7.589422181770445e-06, "epoch": 5.195804195804196, "percentage": 74.22, "elapsed_time": "4:07:18", "remaining_time": "1:25:53"}
|
||||
{"current_steps": 1865, "total_steps": 2506, "loss": 0.2736, "lr": 7.480474256157009e-06, "epoch": 5.20979020979021, "percentage": 74.42, "elapsed_time": "4:07:56", "remaining_time": "1:25:13"}
|
||||
{"current_steps": 1870, "total_steps": 2506, "loss": 0.2747, "lr": 7.3721338116609e-06, "epoch": 5.223776223776224, "percentage": 74.62, "elapsed_time": "4:08:35", "remaining_time": "1:24:32"}
|
||||
{"current_steps": 1875, "total_steps": 2506, "loss": 0.2665, "lr": 7.264406105252371e-06, "epoch": 5.2377622377622375, "percentage": 74.82, "elapsed_time": "4:09:17", "remaining_time": "1:23:53"}
|
||||
{"current_steps": 1880, "total_steps": 2506, "loss": 0.2632, "lr": 7.15729636416995e-06, "epoch": 5.251748251748252, "percentage": 75.02, "elapsed_time": "4:09:52", "remaining_time": "1:23:12"}
|
||||
{"current_steps": 1885, "total_steps": 2506, "loss": 0.2653, "lr": 7.050809785666843e-06, "epoch": 5.265734265734266, "percentage": 75.22, "elapsed_time": "4:10:35", "remaining_time": "1:22:33"}
|
||||
{"current_steps": 1890, "total_steps": 2506, "loss": 0.2694, "lr": 6.944951536758704e-06, "epoch": 5.27972027972028, "percentage": 75.42, "elapsed_time": "4:11:12", "remaining_time": "1:21:52"}
|
||||
{"current_steps": 1895, "total_steps": 2506, "loss": 0.2772, "lr": 6.83972675397298e-06, "epoch": 5.293706293706293, "percentage": 75.62, "elapsed_time": "4:11:53", "remaining_time": "1:21:13"}
|
||||
{"current_steps": 1900, "total_steps": 2506, "loss": 0.2692, "lr": 6.7351405430995945e-06, "epoch": 5.3076923076923075, "percentage": 75.82, "elapsed_time": "4:12:34", "remaining_time": "1:20:33"}
|
||||
{"current_steps": 1905, "total_steps": 2506, "loss": 0.2717, "lr": 6.631197978943273e-06, "epoch": 5.321678321678322, "percentage": 76.02, "elapsed_time": "4:13:14", "remaining_time": "1:19:53"}
|
||||
{"current_steps": 1910, "total_steps": 2506, "loss": 0.2747, "lr": 6.527904105077243e-06, "epoch": 5.335664335664336, "percentage": 76.22, "elapsed_time": "4:13:56", "remaining_time": "1:19:14"}
|
||||
{"current_steps": 1915, "total_steps": 2506, "loss": 0.2733, "lr": 6.425263933598549e-06, "epoch": 5.34965034965035, "percentage": 76.42, "elapsed_time": "4:14:36", "remaining_time": "1:18:34"}
|
||||
{"current_steps": 1920, "total_steps": 2506, "loss": 0.2751, "lr": 6.323282444884826e-06, "epoch": 5.363636363636363, "percentage": 76.62, "elapsed_time": "4:15:16", "remaining_time": "1:17:54"}
|
||||
{"current_steps": 1925, "total_steps": 2506, "loss": 0.2798, "lr": 6.221964587352653e-06, "epoch": 5.3776223776223775, "percentage": 76.82, "elapsed_time": "4:15:58", "remaining_time": "1:17:15"}
|
||||
{"current_steps": 1930, "total_steps": 2506, "loss": 0.2642, "lr": 6.121315277217441e-06, "epoch": 5.391608391608392, "percentage": 77.02, "elapsed_time": "4:16:38", "remaining_time": "1:16:35"}
|
||||
{"current_steps": 1935, "total_steps": 2506, "loss": 0.2729, "lr": 6.0213393982548555e-06, "epoch": 5.405594405594406, "percentage": 77.21, "elapsed_time": "4:17:16", "remaining_time": "1:15:55"}
|
||||
{"current_steps": 1940, "total_steps": 2506, "loss": 0.2712, "lr": 5.922041801563898e-06, "epoch": 5.41958041958042, "percentage": 77.41, "elapsed_time": "4:18:03", "remaining_time": "1:15:17"}
|
||||
{"current_steps": 1945, "total_steps": 2506, "loss": 0.2693, "lr": 5.823427305331461e-06, "epoch": 5.433566433566433, "percentage": 77.61, "elapsed_time": "4:18:39", "remaining_time": "1:14:36"}
|
||||
{"current_steps": 1950, "total_steps": 2506, "loss": 0.2733, "lr": 5.72550069459858e-06, "epoch": 5.4475524475524475, "percentage": 77.81, "elapsed_time": "4:19:24", "remaining_time": "1:13:57"}
|
||||
{"current_steps": 1955, "total_steps": 2506, "loss": 0.2718, "lr": 5.628266721028226e-06, "epoch": 5.461538461538462, "percentage": 78.01, "elapsed_time": "4:20:03", "remaining_time": "1:13:17"}
|
||||
{"current_steps": 1960, "total_steps": 2506, "loss": 0.2701, "lr": 5.5317301026747575e-06, "epoch": 5.475524475524476, "percentage": 78.21, "elapsed_time": "4:20:41", "remaining_time": "1:12:37"}
|
||||
{"current_steps": 1965, "total_steps": 2506, "loss": 0.2714, "lr": 5.435895523754957e-06, "epoch": 5.489510489510489, "percentage": 78.41, "elapsed_time": "4:21:21", "remaining_time": "1:11:57"}
|
||||
{"current_steps": 1970, "total_steps": 2506, "loss": 0.2701, "lr": 5.340767634420794e-06, "epoch": 5.503496503496503, "percentage": 78.61, "elapsed_time": "4:21:58", "remaining_time": "1:11:16"}
|
||||
{"current_steps": 1975, "total_steps": 2506, "loss": 0.2721, "lr": 5.24635105053372e-06, "epoch": 5.5174825174825175, "percentage": 78.81, "elapsed_time": "4:22:36", "remaining_time": "1:10:36"}
|
||||
{"current_steps": 1980, "total_steps": 2506, "loss": 0.2709, "lr": 5.15265035344076e-06, "epoch": 5.531468531468532, "percentage": 79.01, "elapsed_time": "4:23:16", "remaining_time": "1:09:56"}
|
||||
{"current_steps": 1985, "total_steps": 2506, "loss": 0.2728, "lr": 5.059670089752166e-06, "epoch": 5.545454545454545, "percentage": 79.21, "elapsed_time": "4:23:54", "remaining_time": "1:09:16"}
|
||||
{"current_steps": 1990, "total_steps": 2506, "loss": 0.2746, "lr": 4.967414771120837e-06, "epoch": 5.559440559440559, "percentage": 79.41, "elapsed_time": "4:24:33", "remaining_time": "1:08:36"}
|
||||
{"current_steps": 1995, "total_steps": 2506, "loss": 0.2719, "lr": 4.875888874023358e-06, "epoch": 5.573426573426573, "percentage": 79.61, "elapsed_time": "4:25:13", "remaining_time": "1:07:56"}
|
||||
{"current_steps": 2000, "total_steps": 2506, "loss": 0.2802, "lr": 4.78509683954284e-06, "epoch": 5.5874125874125875, "percentage": 79.81, "elapsed_time": "4:25:53", "remaining_time": "1:07:16"}
|
||||
{"current_steps": 2005, "total_steps": 2506, "loss": 0.2645, "lr": 4.695043073153398e-06, "epoch": 5.601398601398602, "percentage": 80.01, "elapsed_time": "4:26:30", "remaining_time": "1:06:35"}
|
||||
{"current_steps": 2010, "total_steps": 2506, "loss": 0.2749, "lr": 4.605731944506377e-06, "epoch": 5.615384615384615, "percentage": 80.21, "elapsed_time": "4:27:09", "remaining_time": "1:05:55"}
|
||||
{"current_steps": 2015, "total_steps": 2506, "loss": 0.266, "lr": 4.5171677872183506e-06, "epoch": 5.629370629370629, "percentage": 80.41, "elapsed_time": "4:27:47", "remaining_time": "1:05:15"}
|
||||
{"current_steps": 2020, "total_steps": 2506, "loss": 0.2699, "lr": 4.429354898660829e-06, "epoch": 5.643356643356643, "percentage": 80.61, "elapsed_time": "4:28:27", "remaining_time": "1:04:35"}
|
||||
{"current_steps": 2025, "total_steps": 2506, "loss": 0.2601, "lr": 4.3422975397517455e-06, "epoch": 5.6573426573426575, "percentage": 80.81, "elapsed_time": "4:29:05", "remaining_time": "1:03:55"}
|
||||
{"current_steps": 2030, "total_steps": 2506, "loss": 0.2729, "lr": 4.255999934748673e-06, "epoch": 5.671328671328672, "percentage": 81.01, "elapsed_time": "4:29:42", "remaining_time": "1:03:14"}
|
||||
{"current_steps": 2035, "total_steps": 2506, "loss": 0.269, "lr": 4.1704662710439156e-06, "epoch": 5.685314685314685, "percentage": 81.21, "elapsed_time": "4:30:20", "remaining_time": "1:02:34"}
|
||||
{"current_steps": 2040, "total_steps": 2506, "loss": 0.2763, "lr": 4.085700698961252e-06, "epoch": 5.699300699300699, "percentage": 81.4, "elapsed_time": "4:31:00", "remaining_time": "1:01:54"}
|
||||
{"current_steps": 2045, "total_steps": 2506, "loss": 0.2662, "lr": 4.00170733155461e-06, "epoch": 5.713286713286713, "percentage": 81.6, "elapsed_time": "4:31:42", "remaining_time": "1:01:15"}
|
||||
{"current_steps": 2050, "total_steps": 2506, "loss": 0.2729, "lr": 3.9184902444084575e-06, "epoch": 5.7272727272727275, "percentage": 81.8, "elapsed_time": "4:32:20", "remaining_time": "1:00:34"}
|
||||
{"current_steps": 2055, "total_steps": 2506, "loss": 0.2702, "lr": 3.836053475440058e-06, "epoch": 5.741258741258742, "percentage": 82.0, "elapsed_time": "4:33:04", "remaining_time": "0:59:55"}
|
||||
{"current_steps": 2060, "total_steps": 2506, "loss": 0.2698, "lr": 3.7544010247035247e-06, "epoch": 5.755244755244755, "percentage": 82.2, "elapsed_time": "4:33:39", "remaining_time": "0:59:14"}
|
||||
{"current_steps": 2065, "total_steps": 2506, "loss": 0.2675, "lr": 3.6735368541957494e-06, "epoch": 5.769230769230769, "percentage": 82.4, "elapsed_time": "4:34:16", "remaining_time": "0:58:34"}
|
||||
{"current_steps": 2070, "total_steps": 2506, "loss": 0.2671, "lr": 3.5934648876641287e-06, "epoch": 5.783216783216783, "percentage": 82.6, "elapsed_time": "4:34:58", "remaining_time": "0:57:54"}
|
||||
{"current_steps": 2075, "total_steps": 2506, "loss": 0.2717, "lr": 3.5141890104162e-06, "epoch": 5.7972027972027975, "percentage": 82.8, "elapsed_time": "4:35:36", "remaining_time": "0:57:14"}
|
||||
{"current_steps": 2080, "total_steps": 2506, "loss": 0.2792, "lr": 3.4357130691311057e-06, "epoch": 5.811188811188811, "percentage": 83.0, "elapsed_time": "4:36:12", "remaining_time": "0:56:34"}
|
||||
{"current_steps": 2085, "total_steps": 2506, "loss": 0.2763, "lr": 3.3580408716729342e-06, "epoch": 5.825174825174825, "percentage": 83.2, "elapsed_time": "4:36:51", "remaining_time": "0:55:54"}
|
||||
{"current_steps": 2090, "total_steps": 2506, "loss": 0.2755, "lr": 3.2811761869059524e-06, "epoch": 5.839160839160839, "percentage": 83.4, "elapsed_time": "4:37:27", "remaining_time": "0:55:13"}
|
||||
{"current_steps": 2095, "total_steps": 2506, "loss": 0.2723, "lr": 3.205122744511746e-06, "epoch": 5.853146853146853, "percentage": 83.6, "elapsed_time": "4:38:07", "remaining_time": "0:54:33"}
|
||||
{"current_steps": 2100, "total_steps": 2506, "loss": 0.2695, "lr": 3.129884234808238e-06, "epoch": 5.867132867132867, "percentage": 83.8, "elapsed_time": "4:38:46", "remaining_time": "0:53:53"}
|
||||
{"current_steps": 2105, "total_steps": 2506, "loss": 0.2644, "lr": 3.0554643085706037e-06, "epoch": 5.881118881118881, "percentage": 84.0, "elapsed_time": "4:39:25", "remaining_time": "0:53:13"}
|
||||
{"current_steps": 2110, "total_steps": 2506, "loss": 0.2693, "lr": 2.981866576854164e-06, "epoch": 5.895104895104895, "percentage": 84.2, "elapsed_time": "4:40:03", "remaining_time": "0:52:33"}
|
||||
{"current_steps": 2115, "total_steps": 2506, "loss": 0.2698, "lr": 2.909094610819134e-06, "epoch": 5.909090909090909, "percentage": 84.4, "elapsed_time": "4:40:42", "remaining_time": "0:51:53"}
|
||||
{"current_steps": 2120, "total_steps": 2506, "loss": 0.2653, "lr": 2.8371519415573635e-06, "epoch": 5.923076923076923, "percentage": 84.6, "elapsed_time": "4:41:24", "remaining_time": "0:51:14"}
|
||||
{"current_steps": 2125, "total_steps": 2506, "loss": 0.2711, "lr": 2.7660420599209726e-06, "epoch": 5.937062937062937, "percentage": 84.8, "elapsed_time": "4:42:05", "remaining_time": "0:50:34"}
|
||||
{"current_steps": 2130, "total_steps": 2506, "loss": 0.2733, "lr": 2.6957684163530017e-06, "epoch": 5.951048951048951, "percentage": 85.0, "elapsed_time": "4:42:45", "remaining_time": "0:49:54"}
|
||||
{"current_steps": 2135, "total_steps": 2506, "loss": 0.2681, "lr": 2.6263344207199446e-06, "epoch": 5.965034965034965, "percentage": 85.2, "elapsed_time": "4:43:24", "remaining_time": "0:49:14"}
|
||||
{"current_steps": 2140, "total_steps": 2506, "loss": 0.2713, "lr": 2.557743442146343e-06, "epoch": 5.979020979020979, "percentage": 85.4, "elapsed_time": "4:44:03", "remaining_time": "0:48:34"}
|
||||
{"current_steps": 2145, "total_steps": 2506, "loss": 0.2654, "lr": 2.489998808851255e-06, "epoch": 5.993006993006993, "percentage": 85.59, "elapsed_time": "4:44:42", "remaining_time": "0:47:54"}
|
||||
{"current_steps": 2150, "total_steps": 2506, "loss": 0.2756, "lr": 2.423103807986802e-06, "epoch": 6.0055944055944055, "percentage": 85.79, "elapsed_time": "4:45:18", "remaining_time": "0:47:14"}
|
||||
{"current_steps": 2155, "total_steps": 2506, "loss": 0.269, "lr": 2.3570616854786364e-06, "epoch": 6.01958041958042, "percentage": 85.99, "elapsed_time": "4:45:56", "remaining_time": "0:46:34"}
|
||||
{"current_steps": 2160, "total_steps": 2506, "loss": 0.2639, "lr": 2.291875645868471e-06, "epoch": 6.033566433566434, "percentage": 86.19, "elapsed_time": "4:46:33", "remaining_time": "0:45:54"}
|
||||
{"current_steps": 2165, "total_steps": 2506, "loss": 0.2663, "lr": 2.227548852158552e-06, "epoch": 6.047552447552447, "percentage": 86.39, "elapsed_time": "4:47:12", "remaining_time": "0:45:14"}
|
||||
{"current_steps": 2170, "total_steps": 2506, "loss": 0.2679, "lr": 2.1640844256582262e-06, "epoch": 6.061538461538461, "percentage": 86.59, "elapsed_time": "4:47:52", "remaining_time": "0:44:34"}
|
||||
{"current_steps": 2175, "total_steps": 2506, "loss": 0.2658, "lr": 2.10148544583243e-06, "epoch": 6.0755244755244755, "percentage": 86.79, "elapsed_time": "4:48:29", "remaining_time": "0:43:54"}
|
||||
{"current_steps": 2180, "total_steps": 2506, "loss": 0.2728, "lr": 2.039754950152313e-06, "epoch": 6.08951048951049, "percentage": 86.99, "elapsed_time": "4:49:03", "remaining_time": "0:43:13"}
|
||||
{"current_steps": 2185, "total_steps": 2506, "loss": 0.2698, "lr": 1.978895933947835e-06, "epoch": 6.103496503496504, "percentage": 87.19, "elapsed_time": "4:49:42", "remaining_time": "0:42:33"}
|
||||
{"current_steps": 2190, "total_steps": 2506, "loss": 0.2598, "lr": 1.918911350262411e-06, "epoch": 6.117482517482517, "percentage": 87.39, "elapsed_time": "4:50:20", "remaining_time": "0:41:53"}
|
||||
{"current_steps": 2195, "total_steps": 2506, "loss": 0.2668, "lr": 1.859804109709651e-06, "epoch": 6.131468531468531, "percentage": 87.59, "elapsed_time": "4:51:00", "remaining_time": "0:41:13"}
|
||||
{"current_steps": 2200, "total_steps": 2506, "loss": 0.265, "lr": 1.8015770803320997e-06, "epoch": 6.1454545454545455, "percentage": 87.79, "elapsed_time": "4:51:35", "remaining_time": "0:40:33"}
|
||||
{"current_steps": 2205, "total_steps": 2506, "loss": 0.2721, "lr": 1.744233087462095e-06, "epoch": 6.15944055944056, "percentage": 87.99, "elapsed_time": "4:52:15", "remaining_time": "0:39:53"}
|
||||
{"current_steps": 2210, "total_steps": 2506, "loss": 0.2651, "lr": 1.6877749135846521e-06, "epoch": 6.173426573426573, "percentage": 88.19, "elapsed_time": "4:52:55", "remaining_time": "0:39:13"}
|
||||
{"current_steps": 2215, "total_steps": 2506, "loss": 0.2686, "lr": 1.6322052982024739e-06, "epoch": 6.187412587412587, "percentage": 88.39, "elapsed_time": "4:53:33", "remaining_time": "0:38:34"}
|
||||
{"current_steps": 2220, "total_steps": 2506, "loss": 0.2679, "lr": 1.577526937703e-06, "epoch": 6.201398601398601, "percentage": 88.59, "elapsed_time": "4:54:14", "remaining_time": "0:37:54"}
|
||||
{"current_steps": 2225, "total_steps": 2506, "loss": 0.2694, "lr": 1.5237424852275905e-06, "epoch": 6.2153846153846155, "percentage": 88.79, "elapsed_time": "4:54:51", "remaining_time": "0:37:14"}
|
||||
{"current_steps": 2230, "total_steps": 2506, "loss": 0.2715, "lr": 1.4708545505427796e-06, "epoch": 6.22937062937063, "percentage": 88.99, "elapsed_time": "4:55:33", "remaining_time": "0:36:34"}
|
||||
{"current_steps": 2235, "total_steps": 2506, "loss": 0.2698, "lr": 1.418865699913643e-06, "epoch": 6.243356643356643, "percentage": 89.19, "elapsed_time": "4:56:09", "remaining_time": "0:35:54"}
|
||||
{"current_steps": 2240, "total_steps": 2506, "loss": 0.2695, "lr": 1.3677784559792672e-06, "epoch": 6.257342657342657, "percentage": 89.39, "elapsed_time": "4:56:49", "remaining_time": "0:35:14"}
|
||||
{"current_steps": 2245, "total_steps": 2506, "loss": 0.2649, "lr": 1.3175952976303675e-06, "epoch": 6.271328671328671, "percentage": 89.58, "elapsed_time": "4:57:34", "remaining_time": "0:34:35"}
|
||||
{"current_steps": 2250, "total_steps": 2506, "loss": 0.2641, "lr": 1.268318659888974e-06, "epoch": 6.2853146853146855, "percentage": 89.78, "elapsed_time": "4:58:15", "remaining_time": "0:33:56"}
|
||||
{"current_steps": 2255, "total_steps": 2506, "loss": 0.2646, "lr": 1.2199509337903103e-06, "epoch": 6.2993006993007, "percentage": 89.98, "elapsed_time": "4:58:54", "remaining_time": "0:33:16"}
|
||||
{"current_steps": 2260, "total_steps": 2506, "loss": 0.2733, "lr": 1.172494466266747e-06, "epoch": 6.313286713286713, "percentage": 90.18, "elapsed_time": "4:59:31", "remaining_time": "0:32:36"}
|
||||
{"current_steps": 2265, "total_steps": 2506, "loss": 0.263, "lr": 1.1259515600339465e-06, "epoch": 6.327272727272727, "percentage": 90.38, "elapsed_time": "5:00:12", "remaining_time": "0:31:56"}
|
||||
{"current_steps": 2270, "total_steps": 2506, "loss": 0.2637, "lr": 1.0803244734790996e-06, "epoch": 6.341258741258741, "percentage": 90.58, "elapsed_time": "5:00:53", "remaining_time": "0:31:16"}
|
||||
{"current_steps": 2275, "total_steps": 2506, "loss": 0.2645, "lr": 1.0356154205513724e-06, "epoch": 6.3552447552447555, "percentage": 90.78, "elapsed_time": "5:01:33", "remaining_time": "0:30:37"}
|
||||
{"current_steps": 2280, "total_steps": 2506, "loss": 0.2684, "lr": 9.918265706544617e-07, "epoch": 6.36923076923077, "percentage": 90.98, "elapsed_time": "5:02:11", "remaining_time": "0:29:57"}
|
||||
{"current_steps": 2285, "total_steps": 2506, "loss": 0.2725, "lr": 9.489600485413297e-07, "epoch": 6.383216783216783, "percentage": 91.18, "elapsed_time": "5:02:51", "remaining_time": "0:29:17"}
|
||||
{"current_steps": 2290, "total_steps": 2506, "loss": 0.2666, "lr": 9.070179342111163e-07, "epoch": 6.397202797202797, "percentage": 91.38, "elapsed_time": "5:03:29", "remaining_time": "0:28:37"}
|
||||
{"current_steps": 2295, "total_steps": 2506, "loss": 0.2745, "lr": 8.660022628082033e-07, "epoch": 6.411188811188811, "percentage": 91.58, "elapsed_time": "5:04:07", "remaining_time": "0:27:57"}
|
||||
{"current_steps": 2300, "total_steps": 2506, "loss": 0.2642, "lr": 8.259150245234671e-07, "epoch": 6.4251748251748255, "percentage": 91.78, "elapsed_time": "5:04:50", "remaining_time": "0:27:18"}
|
||||
{"current_steps": 2305, "total_steps": 2506, "loss": 0.2627, "lr": 7.867581644977029e-07, "epoch": 6.439160839160839, "percentage": 91.98, "elapsed_time": "5:05:28", "remaining_time": "0:26:38"}
|
||||
{"current_steps": 2310, "total_steps": 2506, "loss": 0.2615, "lr": 7.485335827272555e-07, "epoch": 6.453146853146853, "percentage": 92.18, "elapsed_time": "5:06:06", "remaining_time": "0:25:58"}
|
||||
{"current_steps": 2315, "total_steps": 2506, "loss": 0.2648, "lr": 7.11243133971804e-07, "epoch": 6.467132867132867, "percentage": 92.38, "elapsed_time": "5:06:48", "remaining_time": "0:25:18"}
|
||||
{"current_steps": 2320, "total_steps": 2506, "loss": 0.2592, "lr": 6.748886276643874e-07, "epoch": 6.481118881118881, "percentage": 92.58, "elapsed_time": "5:07:29", "remaining_time": "0:24:39"}
|
||||
{"current_steps": 2325, "total_steps": 2506, "loss": 0.2678, "lr": 6.394718278235923e-07, "epoch": 6.495104895104895, "percentage": 92.78, "elapsed_time": "5:08:08", "remaining_time": "0:23:59"}
|
||||
{"current_steps": 2330, "total_steps": 2506, "loss": 0.2651, "lr": 6.049944529679641e-07, "epoch": 6.509090909090909, "percentage": 92.98, "elapsed_time": "5:08:50", "remaining_time": "0:23:19"}
|
||||
{"current_steps": 2335, "total_steps": 2506, "loss": 0.2699, "lr": 5.714581760326133e-07, "epoch": 6.523076923076923, "percentage": 93.18, "elapsed_time": "5:09:30", "remaining_time": "0:22:39"}
|
||||
{"current_steps": 2340, "total_steps": 2506, "loss": 0.2636, "lr": 5.388646242880446e-07, "epoch": 6.537062937062937, "percentage": 93.38, "elapsed_time": "5:10:06", "remaining_time": "0:21:59"}
|
||||
{"current_steps": 2345, "total_steps": 2506, "loss": 0.269, "lr": 5.072153792611967e-07, "epoch": 6.551048951048951, "percentage": 93.58, "elapsed_time": "5:10:45", "remaining_time": "0:21:20"}
|
||||
{"current_steps": 2350, "total_steps": 2506, "loss": 0.2662, "lr": 4.765119766587023e-07, "epoch": 6.565034965034965, "percentage": 93.77, "elapsed_time": "5:11:27", "remaining_time": "0:20:40"}
|
||||
{"current_steps": 2355, "total_steps": 2506, "loss": 0.2715, "lr": 4.4675590629237543e-07, "epoch": 6.579020979020979, "percentage": 93.97, "elapsed_time": "5:12:08", "remaining_time": "0:20:00"}
|
||||
{"current_steps": 2360, "total_steps": 2506, "loss": 0.26, "lr": 4.1794861200691317e-07, "epoch": 6.593006993006993, "percentage": 94.17, "elapsed_time": "5:12:45", "remaining_time": "0:19:20"}
|
||||
{"current_steps": 2365, "total_steps": 2506, "loss": 0.2641, "lr": 3.9009149160984305e-07, "epoch": 6.606993006993007, "percentage": 94.37, "elapsed_time": "5:13:24", "remaining_time": "0:18:41"}
|
||||
{"current_steps": 2370, "total_steps": 2506, "loss": 0.2683, "lr": 3.6318589680369276e-07, "epoch": 6.620979020979021, "percentage": 94.57, "elapsed_time": "5:14:05", "remaining_time": "0:18:01"}
|
||||
{"current_steps": 2375, "total_steps": 2506, "loss": 0.2656, "lr": 3.3723313312040927e-07, "epoch": 6.634965034965035, "percentage": 94.77, "elapsed_time": "5:14:43", "remaining_time": "0:17:21"}
|
||||
{"current_steps": 2380, "total_steps": 2506, "loss": 0.2656, "lr": 3.1223445985800294e-07, "epoch": 6.648951048951049, "percentage": 94.97, "elapsed_time": "5:15:24", "remaining_time": "0:16:41"}
|
||||
{"current_steps": 2385, "total_steps": 2506, "loss": 0.2678, "lr": 2.88191090019454e-07, "epoch": 6.662937062937063, "percentage": 95.17, "elapsed_time": "5:16:01", "remaining_time": "0:16:01"}
|
||||
{"current_steps": 2390, "total_steps": 2506, "loss": 0.2602, "lr": 2.651041902538332e-07, "epoch": 6.676923076923077, "percentage": 95.37, "elapsed_time": "5:16:39", "remaining_time": "0:15:22"}
|
||||
{"current_steps": 2395, "total_steps": 2506, "loss": 0.275, "lr": 2.429748807997201e-07, "epoch": 6.690909090909091, "percentage": 95.57, "elapsed_time": "5:17:20", "remaining_time": "0:14:42"}
|
||||
{"current_steps": 2400, "total_steps": 2506, "loss": 0.2653, "lr": 2.2180423543082253e-07, "epoch": 6.704895104895105, "percentage": 95.77, "elapsed_time": "5:18:03", "remaining_time": "0:14:02"}
|
||||
{"current_steps": 2405, "total_steps": 2506, "loss": 0.2661, "lr": 2.0159328140389346e-07, "epoch": 6.718881118881119, "percentage": 95.97, "elapsed_time": "5:18:41", "remaining_time": "0:13:23"}
|
||||
{"current_steps": 2410, "total_steps": 2506, "loss": 0.2658, "lr": 1.8234299940886434e-07, "epoch": 6.732867132867133, "percentage": 96.17, "elapsed_time": "5:19:24", "remaining_time": "0:12:43"}
|
||||
{"current_steps": 2415, "total_steps": 2506, "loss": 0.2629, "lr": 1.640543235212877e-07, "epoch": 6.746853146853147, "percentage": 96.37, "elapsed_time": "5:20:06", "remaining_time": "0:12:03"}
|
||||
{"current_steps": 2420, "total_steps": 2506, "loss": 0.269, "lr": 1.467281411569821e-07, "epoch": 6.7608391608391605, "percentage": 96.57, "elapsed_time": "5:20:45", "remaining_time": "0:11:23"}
|
||||
{"current_steps": 2425, "total_steps": 2506, "loss": 0.2622, "lr": 1.303652930289956e-07, "epoch": 6.774825174825175, "percentage": 96.77, "elapsed_time": "5:21:21", "remaining_time": "0:10:44"}
|
||||
{"current_steps": 2430, "total_steps": 2506, "loss": 0.2701, "lr": 1.1496657310680282e-07, "epoch": 6.788811188811189, "percentage": 96.97, "elapsed_time": "5:21:58", "remaining_time": "0:10:04"}
|
||||
{"current_steps": 2435, "total_steps": 2506, "loss": 0.2721, "lr": 1.0053272857777797e-07, "epoch": 6.802797202797203, "percentage": 97.17, "elapsed_time": "5:22:38", "remaining_time": "0:09:24"}
|
||||
{"current_steps": 2440, "total_steps": 2506, "loss": 0.2604, "lr": 8.706445981093937e-08, "epoch": 6.816783216783216, "percentage": 97.37, "elapsed_time": "5:23:17", "remaining_time": "0:08:44"}
|
||||
{"current_steps": 2445, "total_steps": 2506, "loss": 0.2713, "lr": 7.45624203229789e-08, "epoch": 6.8307692307692305, "percentage": 97.57, "elapsed_time": "5:23:55", "remaining_time": "0:08:04"}
|
||||
{"current_steps": 2450, "total_steps": 2506, "loss": 0.2603, "lr": 6.302721674652957e-08, "epoch": 6.844755244755245, "percentage": 97.77, "elapsed_time": "5:24:32", "remaining_time": "0:07:25"}
|
||||
{"current_steps": 2455, "total_steps": 2506, "loss": 0.2624, "lr": 5.2459408800744626e-08, "epoch": 6.858741258741259, "percentage": 97.96, "elapsed_time": "5:25:10", "remaining_time": "0:06:45"}
|
||||
{"current_steps": 2460, "total_steps": 2506, "loss": 0.2674, "lr": 4.285950926413929e-08, "epoch": 6.872727272727273, "percentage": 98.16, "elapsed_time": "5:25:51", "remaining_time": "0:06:05"}
|
||||
{"current_steps": 2465, "total_steps": 2506, "loss": 0.2689, "lr": 3.4227983949699506e-08, "epoch": 6.886713286713286, "percentage": 98.36, "elapsed_time": "5:26:32", "remaining_time": "0:05:25"}
|
||||
{"current_steps": 2470, "total_steps": 2506, "loss": 0.2713, "lr": 2.656525168228674e-08, "epoch": 6.9006993006993005, "percentage": 98.56, "elapsed_time": "5:27:12", "remaining_time": "0:04:46"}
|
||||
{"current_steps": 2475, "total_steps": 2506, "loss": 0.2687, "lr": 1.9871684278314207e-08, "epoch": 6.914685314685315, "percentage": 98.76, "elapsed_time": "5:27:49", "remaining_time": "0:04:06"}
|
||||
{"current_steps": 2480, "total_steps": 2506, "loss": 0.2678, "lr": 1.4147606527707969e-08, "epoch": 6.928671328671329, "percentage": 98.96, "elapsed_time": "5:28:26", "remaining_time": "0:03:26"}
|
||||
{"current_steps": 2485, "total_steps": 2506, "loss": 0.2635, "lr": 9.393296178137334e-09, "epoch": 6.942657342657343, "percentage": 99.16, "elapsed_time": "5:29:05", "remaining_time": "0:02:46"}
|
||||
{"current_steps": 2490, "total_steps": 2506, "loss": 0.2609, "lr": 5.6089839215522916e-09, "epoch": 6.956643356643356, "percentage": 99.36, "elapsed_time": "5:29:42", "remaining_time": "0:02:07"}
|
||||
{"current_steps": 2495, "total_steps": 2506, "loss": 0.2668, "lr": 2.794853382976914e-09, "epoch": 6.9706293706293705, "percentage": 99.56, "elapsed_time": "5:30:23", "remaining_time": "0:01:27"}
|
||||
{"current_steps": 2500, "total_steps": 2506, "loss": 0.2702, "lr": 9.510411116075978e-10, "epoch": 6.984615384615385, "percentage": 99.76, "elapsed_time": "5:31:04", "remaining_time": "0:00:47"}
|
||||
{"current_steps": 2505, "total_steps": 2506, "loss": 0.2672, "lr": 7.763657418280446e-11, "epoch": 6.998601398601399, "percentage": 99.96, "elapsed_time": "5:31:45", "remaining_time": "0:00:07"}
|
||||
{"current_steps": 2506, "total_steps": 2506, "epoch": 7.0, "percentage": 100.0, "elapsed_time": "5:32:28", "remaining_time": "0:00:00"}
|
||||
5558
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:957d6fd02a71b886e00e10639f0214b8d0d41debccb0a1a04f7f1566d488d4b9
|
||||
size 8785
|
||||
BIN
training_loss.png
Normal file
Binary file not shown.
After: Size: 37 KiB
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long