Initialize project; model provided by the ModelHub XC community

Model: laion/nemotron-terminal-scientific_computing__Qwen3-8B
Source: Original Platform
Author: ModelHub XC
Date: 2026-04-25 01:11:05 +08:00
Commit: 93f8dd95a9
23 changed files with 154904 additions and 0 deletions

.gitattributes (vendored, new file, 36 lines)

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md (new file, 61 lines)

@@ -0,0 +1,61 @@
---
library_name: transformers
license: other
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: nemotron-scientific-computing__Qwen3-8B
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# nemotron-scientific-computing__Qwen3-8B
This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the /e/data1/datasets/playground/ot/hf_hub/datasets--laion--nemotron-terminal-scientific_computing/snapshots/610c7db0b8510b87e3c99b3bd49660bc56821866_thinking_preprocessed dataset.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 3
- total_train_batch_size: 96
- total_eval_batch_size: 256
- optimizer: AdamW (torch fused) with betas=(0.9, 0.98) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
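The effective batch size reported above follows directly from the per-device batch size, the gradient-accumulation steps, and the device count; a quick sanity check:

```python
# Sanity check of the effective (total) train batch size reported above.
train_batch_size = 1            # per-device micro-batch
gradient_accumulation_steps = 3
num_devices = 32

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)   # matches the reported total_train_batch_size of 96
```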
### Training results
### Framework versions
- Transformers 4.57.6
- Pytorch 2.9.1+cu130
- Datasets 4.7.0
- Tokenizers 0.22.2

added_tokens.json (new file, 28 lines)

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}
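All of these added tokens occupy the reserved ID range at the top of the 151,936-entry vocabulary declared in config.json; a minimal check over an illustrative subset of the map above:

```python
import json

# Illustrative subset of added_tokens.json above; every ID sits in the
# reserved block at the top of the 151,936-token vocabulary.
added_tokens = json.loads("""{
  "<|endoftext|>": 151643,
  "<|im_start|>": 151644,
  "<|im_end|>": 151645,
  "<tool_call>": 151657,
  "</tool_call>": 151658,
  "<think>": 151667,
  "</think>": 151668
}""")

assert all(151643 <= tid < 151936 for tid in added_tokens.values())
print(sorted(added_tokens, key=added_tokens.get))
```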

all_results.json (new file, 16 lines)

@@ -0,0 +1,16 @@
{
"achieved_tflops_per_gpu": 51242.27907434514,
"achieved_tflops_per_gpu_theoretical": 1501156.7964550059,
"epoch": 5.0,
"loss_nan_ranks": 0,
"loss_rank_avg": 0.0942406952381134,
"mfu_percent": 3621.3624787523067,
"mfu_percent_theoretical": 106088.81953745625,
"total_flos": 3.756182037619278e+18,
"train_loss": 0.0,
"train_runtime": 2.2907,
"train_samples_per_second": 42523.793,
"train_steps_per_second": 443.093,
"valid_targets_mean": 8724.4,
"valid_targets_min": 2268
}
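Assuming `total_flos` and `train_runtime` cover the same measurement window (the very short runtime and >100% MFU figures suggest the counters reflect only a resumed or final logging window), the reported `achieved_tflops_per_gpu` is reproducible from the other fields:

```python
# Reproduce achieved_tflops_per_gpu from the other fields in all_results.json.
total_flos = 3.756182037619278e+18
train_runtime = 2.2907          # seconds, same window as total_flos (assumed)
num_devices = 32

tflops_per_gpu = total_flos / train_runtime / num_devices / 1e12
print(round(tflops_per_gpu, 2))  # ~51242.28, matching the reported value
```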

chat_template.jinja (new file, 89 lines)

@@ -0,0 +1,89 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
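For the simple path (no tools, no thinking content, plain system/user turns), the template above reduces to standard ChatML framing. A minimal pure-Python sketch of that reduction, not the Jinja engine itself:

```python
def render(messages, add_generation_prompt=True):
    """Mirror the no-tools branch of the Jinja template above:
    each turn is wrapped in <|im_start|>role ... <|im_end|> framing."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```
In practice one would call `tokenizer.apply_chat_template(...)`, which executes the full template including the tool-call and `<think>` branches.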

config.json (new file, 68 lines)

@@ -0,0 +1,68 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"dtype": "bfloat16",
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 12288,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 40960,
"max_window_layers": 36,
"model_type": "qwen3",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"pad_token_id": 151643,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": false,
"use_sliding_window": false,
"vocab_size": 151936
}
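A back-of-the-envelope parameter count from the dimensions above (untied `lm_head`, per-head q/k RMSNorms) lands exactly on the `total_size` recorded in the safetensors index below (16,381,470,720 bytes at 2 bytes per bf16 parameter):

```python
# Parameter count for Qwen3-8B from the config above.
vocab, hidden, inter = 151936, 4096, 12288
layers, heads, kv_heads, head_dim = 36, 32, 8, 128

embed = vocab * hidden                          # embed_tokens (lm_head is untied, same shape)
attn = hidden * heads * head_dim * 2            # q_proj + o_proj
attn += hidden * kv_heads * head_dim * 2        # k_proj + v_proj (GQA)
attn += head_dim * 2                            # q_norm + k_norm
mlp = hidden * inter * 3                        # gate_proj, up_proj, down_proj
norms = hidden * 2                              # input + post-attention RMSNorm
per_layer = attn + mlp + norms

total = 2 * embed + layers * per_layer + hidden  # + final norm
print(total)                                     # 8,190,735,360 parameters
print(total * 2)                                 # 16,381,470,720 bytes in bf16
```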

generation_config.json (new file, 12 lines)

@@ -0,0 +1,12 @@
{
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.6,
"top_k": 20,
"top_p": 0.95,
"transformers_version": "4.57.6"
}
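The decoding defaults above (temperature 0.6, top_k 20, top_p 0.95) can be illustrated with a toy filter over a small logit vector. This is a sketch of the standard temperature/top-k/nucleus semantics, not transformers' internal implementation; the logit values are made up:

```python
import math

def filter_logits(logits, top_k=20, top_p=0.95, temperature=0.6):
    # 1) temperature scaling, 2) keep the top-k logits,
    # 3) nucleus truncation: smallest prefix whose probability mass >= top_p.
    scaled = [l / temperature for l in logits]
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    exps = [math.exp(scaled[i] - scaled[order[0]]) for i in order]
    z = sum(exps)
    probs = [e / z for e in exps]
    kept, cum = [], 0.0
    for idx, p in zip(order, probs):
        kept.append((idx, p))
        cum += p
        if cum >= top_p:
            break
    return kept  # surviving token ids with their (unnormalized-prefix) probabilities

cands = filter_logits([2.0, 1.0, 0.5, -1.0, -3.0], top_k=3, top_p=0.95)
print([i for i, _ in cands])
```
Tightening `top_p` shrinks the candidate set: with `top_p=0.9` on the same logits only two tokens survive.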

merges.txt (new file, 151388 lines)

File diff suppressed because it is too large.


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:86739826f9f023060e887e948d0aad266bd8b7b5e64c24886c695f020b3b89e0
size 4902257696


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c2fc744f19e554d726ea9bfa573a2a5c23bab26ae213872d4832d6a2dfe7566
size 4915960368


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:06e942038c3f9f9a18dd79705de8ce5aed5f79105ae0af1ac651720c049c489b
size 4983068496


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d3ce8034b87321da3904ed60e2b4e6f82fbf9339745ad6b794b282718e7981cd
size 1580230264


@@ -0,0 +1,407 @@
{
"metadata": {
"total_parameters": 308224,
"total_size": 16381470720
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
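The `weight_map` above tells a loader which of the four shards stores each tensor. A minimal sketch of consuming such an index, grouping requested tensor names by shard so each shard file only needs to be opened once (the function name `shards_for` and the temporary index path are illustrative, not part of this repository):

```python
import json
from collections import defaultdict

def shards_for(index_path, tensor_names):
    """Group requested tensor names by the shard file that stores them,
    using the "weight_map" of a safetensors index like the one above."""
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    by_shard = defaultdict(list)
    for name in tensor_names:
        # KeyError here means the checkpoint does not contain the tensor.
        by_shard[weight_map[name]].append(name)
    return dict(by_shard)
```

Note that a layer's tensors can straddle a shard boundary (e.g. layer 35's `q_proj`/`k_proj`/`v_proj` live in shard 3 while the rest of the layer is in shard 4), so grouping by shard rather than by layer avoids redundant file opens.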

run_summary.json Normal file

@@ -0,0 +1,12 @@
{
"agent_name": "610c7db0b8510b87e3c99b3bd49660bc56821866_thinking_preprocessed",
"training_start": null,
"training_end": null,
"created_by": "DCAgent",
"base_model_name": "Qwen/Qwen3-8B",
"dataset_name": "/e/data1/datasets/playground/ot/hf_hub/datasets--laion--nemotron-terminal-scientific_computing/snapshots/610c7db0b8510b87e3c99b3bd49660bc56821866_thinking_preprocessed",
"training_type": "SFT",
"training_parameters": "https://huggingface.co/laion/nemotron-terminal-scientific_computing__Qwen3-8B/blob/main/config.json",
"wandb_link": null,
"traces_location_s3": null
}

special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
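The three lines above are not the tokenizer itself but a Git LFS pointer: the real `tokenizer.json` (11,422,654 bytes) is fetched separately and must match the recorded SHA-256. A minimal sketch of parsing such a pointer and verifying a downloaded blob against it (the helper names `parse_lfs_pointer` and `matches_pointer` are illustrative):

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file (version/oid/size lines) into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)  # e.g. "sha256:abc..." -> ("sha256", "abc...")
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}

def matches_pointer(blob, pointer):
    """Check that a downloaded blob has the size and digest the pointer records."""
    digest = hashlib.new(pointer["algo"], blob).hexdigest()
    return len(blob) == pointer["size"] and digest == pointer["oid"]
```

Checking both size and digest catches truncated downloads cheaply before hashing mismatches are even reported.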

tokenizer_config.json Normal file

@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 32768,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
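The `added_tokens_decoder` above maps string token IDs (151643–151668) to token definitions, while `eos_token` and `pad_token` refer to tokens by content. A small sanity-check sketch that resolves those names back to integer IDs; only a three-entry excerpt of the decoder is inlined here as an assumption, not the full table:

```python
# Minimal excerpt of the added_tokens_decoder above, with the token content
# pulled out of the per-token dicts (IDs are string keys in the JSON).
added_tokens = {
    "151643": "<|endoftext|>",
    "151644": "<|im_start|>",
    "151645": "<|im_end|>",
}

def resolve_special_tokens(decoder, eos="<|im_end|>", pad="<|endoftext|>"):
    """Return the integer IDs of the eos and pad tokens, raising KeyError
    if either name is missing from the decoder."""
    by_content = {content: int(token_id) for token_id, content in decoder.items()}
    return by_content[eos], by_content[pad]
```

This mirrors the config's own choices: `eos_token` is `<|im_end|>` and `pad_token` is `<|endoftext|>`, so padding reuses the base model's end-of-text token rather than adding a new one.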

train_results.json Normal file

@@ -0,0 +1,12 @@
{
"achieved_tflops_per_gpu": 51242.27907434514,
"achieved_tflops_per_gpu_theoretical": 1501156.7964550059,
"epoch": 5.0,
"mfu_percent": 3621.3624787523067,
"mfu_percent_theoretical": 106088.81953745625,
"total_flos": 3.756182037619278e+18,
"train_loss": 0.0,
"train_runtime": 2.2907,
"train_samples_per_second": 42523.793,
"train_steps_per_second": 443.093
}

trainer_log.jsonl Normal file

@@ -0,0 +1,209 @@
{"current_steps": 5, "total_steps": 1015, "loss": 0.8575, "lr": 1.5686274509803923e-06, "epoch": 0.024630541871921183, "percentage": 0.49, "elapsed_time": "0:02:08", "remaining_time": "7:13:37"}
{"current_steps": 10, "total_steps": 1015, "loss": 0.8189, "lr": 3.529411764705883e-06, "epoch": 0.04926108374384237, "percentage": 0.99, "elapsed_time": "0:04:02", "remaining_time": "6:46:12"}
{"current_steps": 15, "total_steps": 1015, "loss": 0.7166, "lr": 5.4901960784313735e-06, "epoch": 0.07389162561576355, "percentage": 1.48, "elapsed_time": "0:06:06", "remaining_time": "6:47:04"}
{"current_steps": 20, "total_steps": 1015, "loss": 0.6554, "lr": 7.450980392156863e-06, "epoch": 0.09852216748768473, "percentage": 1.97, "elapsed_time": "0:07:59", "remaining_time": "6:37:43"}
{"current_steps": 25, "total_steps": 1015, "loss": 0.6269, "lr": 9.411764705882354e-06, "epoch": 0.12315270935960591, "percentage": 2.46, "elapsed_time": "0:09:54", "remaining_time": "6:32:40"}
{"current_steps": 30, "total_steps": 1015, "loss": 0.5914, "lr": 1.1372549019607844e-05, "epoch": 0.1477832512315271, "percentage": 2.96, "elapsed_time": "0:11:55", "remaining_time": "6:31:42"}
{"current_steps": 35, "total_steps": 1015, "loss": 0.5478, "lr": 1.3333333333333333e-05, "epoch": 0.1724137931034483, "percentage": 3.45, "elapsed_time": "0:14:00", "remaining_time": "6:32:05"}
{"current_steps": 40, "total_steps": 1015, "loss": 0.5168, "lr": 1.5294117647058822e-05, "epoch": 0.19704433497536947, "percentage": 3.94, "elapsed_time": "0:16:02", "remaining_time": "6:31:08"}
{"current_steps": 45, "total_steps": 1015, "loss": 0.5032, "lr": 1.7254901960784314e-05, "epoch": 0.22167487684729065, "percentage": 4.43, "elapsed_time": "0:18:03", "remaining_time": "6:29:19"}
{"current_steps": 50, "total_steps": 1015, "loss": 0.482, "lr": 1.9215686274509807e-05, "epoch": 0.24630541871921183, "percentage": 4.93, "elapsed_time": "0:20:04", "remaining_time": "6:27:25"}
{"current_steps": 55, "total_steps": 1015, "loss": 0.4697, "lr": 2.1176470588235296e-05, "epoch": 0.270935960591133, "percentage": 5.42, "elapsed_time": "0:22:09", "remaining_time": "6:26:39"}
{"current_steps": 60, "total_steps": 1015, "loss": 0.4524, "lr": 2.3137254901960788e-05, "epoch": 0.2955665024630542, "percentage": 5.91, "elapsed_time": "0:24:11", "remaining_time": "6:25:06"}
{"current_steps": 65, "total_steps": 1015, "loss": 0.4471, "lr": 2.5098039215686277e-05, "epoch": 0.32019704433497537, "percentage": 6.4, "elapsed_time": "0:26:15", "remaining_time": "6:23:48"}
{"current_steps": 70, "total_steps": 1015, "loss": 0.4394, "lr": 2.705882352941177e-05, "epoch": 0.3448275862068966, "percentage": 6.9, "elapsed_time": "0:28:08", "remaining_time": "6:19:56"}
{"current_steps": 75, "total_steps": 1015, "loss": 0.4316, "lr": 2.9019607843137258e-05, "epoch": 0.3694581280788177, "percentage": 7.39, "elapsed_time": "0:30:07", "remaining_time": "6:17:29"}
{"current_steps": 80, "total_steps": 1015, "loss": 0.4175, "lr": 3.098039215686275e-05, "epoch": 0.39408866995073893, "percentage": 7.88, "elapsed_time": "0:32:08", "remaining_time": "6:15:37"}
{"current_steps": 85, "total_steps": 1015, "loss": 0.4151, "lr": 3.294117647058824e-05, "epoch": 0.4187192118226601, "percentage": 8.37, "elapsed_time": "0:34:09", "remaining_time": "6:13:41"}
{"current_steps": 90, "total_steps": 1015, "loss": 0.4137, "lr": 3.490196078431373e-05, "epoch": 0.4433497536945813, "percentage": 8.87, "elapsed_time": "0:36:13", "remaining_time": "6:12:14"}
{"current_steps": 95, "total_steps": 1015, "loss": 0.4065, "lr": 3.686274509803922e-05, "epoch": 0.46798029556650245, "percentage": 9.36, "elapsed_time": "0:38:25", "remaining_time": "6:12:06"}
{"current_steps": 100, "total_steps": 1015, "loss": 0.4092, "lr": 3.882352941176471e-05, "epoch": 0.49261083743842365, "percentage": 9.85, "elapsed_time": "0:40:22", "remaining_time": "6:09:23"}
{"current_steps": 105, "total_steps": 1015, "loss": 0.4025, "lr": 3.999952639479403e-05, "epoch": 0.5172413793103449, "percentage": 10.34, "elapsed_time": "0:42:27", "remaining_time": "6:07:54"}
{"current_steps": 110, "total_steps": 1015, "loss": 0.4066, "lr": 3.999419859382013e-05, "epoch": 0.541871921182266, "percentage": 10.84, "elapsed_time": "0:44:24", "remaining_time": "6:05:24"}
{"current_steps": 115, "total_steps": 1015, "loss": 0.3992, "lr": 3.99829525676357e-05, "epoch": 0.5665024630541872, "percentage": 11.33, "elapsed_time": "0:46:20", "remaining_time": "6:02:37"}
{"current_steps": 120, "total_steps": 1015, "loss": 0.4006, "lr": 3.996579164503212e-05, "epoch": 0.5911330049261084, "percentage": 11.82, "elapsed_time": "0:48:17", "remaining_time": "6:00:09"}
{"current_steps": 125, "total_steps": 1015, "loss": 0.4021, "lr": 3.9942720905593045e-05, "epoch": 0.6157635467980296, "percentage": 12.32, "elapsed_time": "0:50:13", "remaining_time": "5:57:34"}
{"current_steps": 130, "total_steps": 1015, "loss": 0.3953, "lr": 3.991374717819092e-05, "epoch": 0.6403940886699507, "percentage": 12.81, "elapsed_time": "0:58:45", "remaining_time": "6:39:59"}
{"current_steps": 135, "total_steps": 1015, "loss": 0.3917, "lr": 3.987887903896564e-05, "epoch": 0.6650246305418719, "percentage": 13.3, "elapsed_time": "1:00:41", "remaining_time": "6:35:37"}
{"current_steps": 140, "total_steps": 1015, "loss": 0.392, "lr": 3.9838126808786006e-05, "epoch": 0.6896551724137931, "percentage": 13.79, "elapsed_time": "1:02:38", "remaining_time": "6:31:32"}
{"current_steps": 145, "total_steps": 1015, "loss": 0.3869, "lr": 3.9791502550194803e-05, "epoch": 0.7142857142857143, "percentage": 14.29, "elapsed_time": "1:04:37", "remaining_time": "6:27:47"}
{"current_steps": 150, "total_steps": 1015, "loss": 0.388, "lr": 3.973902006383831e-05, "epoch": 0.7389162561576355, "percentage": 14.78, "elapsed_time": "1:06:39", "remaining_time": "6:24:26"}
{"current_steps": 155, "total_steps": 1015, "loss": 0.3824, "lr": 3.968069488438139e-05, "epoch": 0.7635467980295566, "percentage": 15.27, "elapsed_time": "1:08:40", "remaining_time": "6:21:04"}
{"current_steps": 160, "total_steps": 1015, "loss": 0.3812, "lr": 3.9616544275909195e-05, "epoch": 0.7881773399014779, "percentage": 15.76, "elapsed_time": "1:10:43", "remaining_time": "6:17:58"}
{"current_steps": 165, "total_steps": 1015, "loss": 0.3816, "lr": 3.954658722681712e-05, "epoch": 0.812807881773399, "percentage": 16.26, "elapsed_time": "1:12:39", "remaining_time": "6:14:17"}
{"current_steps": 170, "total_steps": 1015, "loss": 0.3849, "lr": 3.9470844444190246e-05, "epoch": 0.8374384236453202, "percentage": 16.75, "elapsed_time": "1:14:39", "remaining_time": "6:11:06"}
{"current_steps": 175, "total_steps": 1015, "loss": 0.3823, "lr": 3.938933834767414e-05, "epoch": 0.8620689655172413, "percentage": 17.24, "elapsed_time": "1:16:42", "remaining_time": "6:08:13"}
{"current_steps": 180, "total_steps": 1015, "loss": 0.3743, "lr": 3.930209306283867e-05, "epoch": 0.8866995073891626, "percentage": 17.73, "elapsed_time": "1:18:40", "remaining_time": "6:04:56"}
{"current_steps": 185, "total_steps": 1015, "loss": 0.3729, "lr": 3.9209134414036925e-05, "epoch": 0.9113300492610837, "percentage": 18.23, "elapsed_time": "1:20:44", "remaining_time": "6:02:15"}
{"current_steps": 190, "total_steps": 1015, "loss": 0.3801, "lr": 3.9110489916761276e-05, "epoch": 0.9359605911330049, "percentage": 18.72, "elapsed_time": "1:22:39", "remaining_time": "5:58:52"}
{"current_steps": 195, "total_steps": 1015, "loss": 0.3741, "lr": 3.9006188769498865e-05, "epoch": 0.9605911330049262, "percentage": 19.21, "elapsed_time": "1:24:33", "remaining_time": "5:55:34"}
{"current_steps": 200, "total_steps": 1015, "loss": 0.3741, "lr": 3.8896261845088955e-05, "epoch": 0.9852216748768473, "percentage": 19.7, "elapsed_time": "1:26:39", "remaining_time": "5:53:07"}
{"current_steps": 205, "total_steps": 1015, "loss": 0.3612, "lr": 3.8780741681584636e-05, "epoch": 1.0098522167487685, "percentage": 20.2, "elapsed_time": "1:28:45", "remaining_time": "5:50:43"}
{"current_steps": 210, "total_steps": 1015, "loss": 0.3593, "lr": 3.865966247262166e-05, "epoch": 1.0344827586206897, "percentage": 20.69, "elapsed_time": "1:30:47", "remaining_time": "5:48:03"}
{"current_steps": 215, "total_steps": 1015, "loss": 0.3561, "lr": 3.8533060057297235e-05, "epoch": 1.0591133004926108, "percentage": 21.18, "elapsed_time": "1:32:44", "remaining_time": "5:45:05"}
{"current_steps": 220, "total_steps": 1015, "loss": 0.3581, "lr": 3.840097190956175e-05, "epoch": 1.083743842364532, "percentage": 21.67, "elapsed_time": "1:34:48", "remaining_time": "5:42:36"}
{"current_steps": 225, "total_steps": 1015, "loss": 0.3503, "lr": 3.826343712712658e-05, "epoch": 1.1083743842364533, "percentage": 22.17, "elapsed_time": "1:36:54", "remaining_time": "5:40:16"}
{"current_steps": 230, "total_steps": 1015, "loss": 0.3595, "lr": 3.81204964198913e-05, "epoch": 1.1330049261083743, "percentage": 22.66, "elapsed_time": "1:39:09", "remaining_time": "5:38:25"}
{"current_steps": 235, "total_steps": 1015, "loss": 0.3497, "lr": 3.797219209789365e-05, "epoch": 1.1576354679802956, "percentage": 23.15, "elapsed_time": "1:41:16", "remaining_time": "5:36:07"}
{"current_steps": 240, "total_steps": 1015, "loss": 0.3585, "lr": 3.7818568058785906e-05, "epoch": 1.1822660098522166, "percentage": 23.65, "elapsed_time": "1:43:19", "remaining_time": "5:33:39"}
{"current_steps": 245, "total_steps": 1015, "loss": 0.3591, "lr": 3.7659669774841274e-05, "epoch": 1.206896551724138, "percentage": 24.14, "elapsed_time": "1:45:18", "remaining_time": "5:30:58"}
{"current_steps": 250, "total_steps": 1015, "loss": 0.351, "lr": 3.749554427949426e-05, "epoch": 1.2315270935960592, "percentage": 24.63, "elapsed_time": "1:47:19", "remaining_time": "5:28:25"}
{"current_steps": 255, "total_steps": 1015, "loss": 0.3501, "lr": 3.7326240153418895e-05, "epoch": 1.2561576354679804, "percentage": 25.12, "elapsed_time": "1:49:29", "remaining_time": "5:26:20"}
{"current_steps": 260, "total_steps": 1015, "loss": 0.3582, "lr": 3.7151807510148975e-05, "epoch": 1.2807881773399015, "percentage": 25.62, "elapsed_time": "1:51:29", "remaining_time": "5:23:45"}
{"current_steps": 265, "total_steps": 1015, "loss": 0.3577, "lr": 3.697229798124464e-05, "epoch": 1.3054187192118227, "percentage": 26.11, "elapsed_time": "1:53:24", "remaining_time": "5:20:58"}
{"current_steps": 270, "total_steps": 1015, "loss": 0.3545, "lr": 3.678776470100954e-05, "epoch": 1.3300492610837438, "percentage": 26.6, "elapsed_time": "1:55:17", "remaining_time": "5:18:07"}
{"current_steps": 275, "total_steps": 1015, "loss": 0.3534, "lr": 3.659826229076326e-05, "epoch": 1.354679802955665, "percentage": 27.09, "elapsed_time": "1:57:17", "remaining_time": "5:15:36"}
{"current_steps": 280, "total_steps": 1015, "loss": 0.3469, "lr": 3.640384684267357e-05, "epoch": 1.3793103448275863, "percentage": 27.59, "elapsed_time": "1:59:14", "remaining_time": "5:12:59"}
{"current_steps": 285, "total_steps": 1015, "loss": 0.3532, "lr": 3.6204575903153285e-05, "epoch": 1.4039408866995073, "percentage": 28.08, "elapsed_time": "2:01:11", "remaining_time": "5:10:26"}
{"current_steps": 290, "total_steps": 1015, "loss": 0.3474, "lr": 3.600050845582669e-05, "epoch": 1.4285714285714286, "percentage": 28.57, "elapsed_time": "2:03:09", "remaining_time": "5:07:52"}
{"current_steps": 295, "total_steps": 1015, "loss": 0.3484, "lr": 3.57917049040706e-05, "epoch": 1.4532019704433496, "percentage": 29.06, "elapsed_time": "2:05:07", "remaining_time": "5:05:23"}
{"current_steps": 300, "total_steps": 1015, "loss": 0.3518, "lr": 3.557822705313507e-05, "epoch": 1.477832512315271, "percentage": 29.56, "elapsed_time": "2:06:58", "remaining_time": "5:02:37"}
{"current_steps": 305, "total_steps": 1015, "loss": 0.3469, "lr": 3.5360138091849276e-05, "epoch": 1.5024630541871922, "percentage": 30.05, "elapsed_time": "2:09:04", "remaining_time": "5:00:28"}
{"current_steps": 310, "total_steps": 1015, "loss": 0.3508, "lr": 3.513750257391778e-05, "epoch": 1.5270935960591134, "percentage": 30.54, "elapsed_time": "2:11:07", "remaining_time": "4:58:12"}
{"current_steps": 315, "total_steps": 1015, "loss": 0.3477, "lr": 3.4910386398812784e-05, "epoch": 1.5517241379310345, "percentage": 31.03, "elapsed_time": "2:13:10", "remaining_time": "4:55:57"}
{"current_steps": 320, "total_steps": 1015, "loss": 0.3485, "lr": 3.467885679226817e-05, "epoch": 1.5763546798029555, "percentage": 31.53, "elapsed_time": "2:15:07", "remaining_time": "4:53:27"}
{"current_steps": 325, "total_steps": 1015, "loss": 0.3532, "lr": 3.444298228638077e-05, "epoch": 1.6009852216748768, "percentage": 32.02, "elapsed_time": "2:17:04", "remaining_time": "4:51:01"}
{"current_steps": 330, "total_steps": 1015, "loss": 0.345, "lr": 3.420283269932514e-05, "epoch": 1.625615763546798, "percentage": 32.51, "elapsed_time": "2:19:03", "remaining_time": "4:48:38"}
{"current_steps": 335, "total_steps": 1015, "loss": 0.3458, "lr": 3.3958479114687515e-05, "epoch": 1.6502463054187193, "percentage": 33.0, "elapsed_time": "2:20:55", "remaining_time": "4:46:04"}
{"current_steps": 340, "total_steps": 1015, "loss": 0.3473, "lr": 3.3709993860425346e-05, "epoch": 1.6748768472906403, "percentage": 33.5, "elapsed_time": "2:22:55", "remaining_time": "4:43:44"}
{"current_steps": 345, "total_steps": 1015, "loss": 0.3516, "lr": 3.345745048745838e-05, "epoch": 1.6995073891625616, "percentage": 33.99, "elapsed_time": "2:24:56", "remaining_time": "4:41:29"}
{"current_steps": 350, "total_steps": 1015, "loss": 0.3546, "lr": 3.320092374789782e-05, "epoch": 1.7241379310344827, "percentage": 34.48, "elapsed_time": "2:26:58", "remaining_time": "4:39:14"}
{"current_steps": 355, "total_steps": 1015, "loss": 0.3472, "lr": 3.2940489572919917e-05, "epoch": 1.748768472906404, "percentage": 34.98, "elapsed_time": "2:28:44", "remaining_time": "4:36:32"}
{"current_steps": 360, "total_steps": 1015, "loss": 0.3448, "lr": 3.267622505029053e-05, "epoch": 1.7733990147783252, "percentage": 35.47, "elapsed_time": "2:30:34", "remaining_time": "4:33:56"}
{"current_steps": 365, "total_steps": 1015, "loss": 0.3492, "lr": 3.24082084015474e-05, "epoch": 1.7980295566502464, "percentage": 35.96, "elapsed_time": "2:32:29", "remaining_time": "4:31:34"}
{"current_steps": 370, "total_steps": 1015, "loss": 0.3508, "lr": 3.213651895884683e-05, "epoch": 1.8226600985221675, "percentage": 36.45, "elapsed_time": "2:34:31", "remaining_time": "4:29:22"}
{"current_steps": 375, "total_steps": 1015, "loss": 0.3463, "lr": 3.1861237141481506e-05, "epoch": 1.8472906403940885, "percentage": 36.95, "elapsed_time": "2:36:29", "remaining_time": "4:27:04"}
{"current_steps": 380, "total_steps": 1015, "loss": 0.3535, "lr": 3.158244443207671e-05, "epoch": 1.8719211822660098, "percentage": 37.44, "elapsed_time": "2:38:31", "remaining_time": "4:24:54"}
{"current_steps": 385, "total_steps": 1015, "loss": 0.3417, "lr": 3.130022335247163e-05, "epoch": 1.896551724137931, "percentage": 37.93, "elapsed_time": "2:40:28", "remaining_time": "4:22:35"}
{"current_steps": 390, "total_steps": 1015, "loss": 0.345, "lr": 3.101465743929318e-05, "epoch": 1.9211822660098523, "percentage": 38.42, "elapsed_time": "2:42:31", "remaining_time": "4:20:27"}
{"current_steps": 395, "total_steps": 1015, "loss": 0.3455, "lr": 3.072583121922939e-05, "epoch": 1.9458128078817734, "percentage": 38.92, "elapsed_time": "2:44:28", "remaining_time": "4:18:09"}
{"current_steps": 400, "total_steps": 1015, "loss": 0.3453, "lr": 3.0433830184009694e-05, "epoch": 1.9704433497536946, "percentage": 39.41, "elapsed_time": "2:46:27", "remaining_time": "4:15:56"}
{"current_steps": 405, "total_steps": 1015, "loss": 0.3489, "lr": 3.0138740765099724e-05, "epoch": 1.9950738916256157, "percentage": 39.9, "elapsed_time": "2:48:23", "remaining_time": "4:13:36"}
{"current_steps": 410, "total_steps": 1015, "loss": 0.3243, "lr": 2.984065030811776e-05, "epoch": 2.019704433497537, "percentage": 40.39, "elapsed_time": "2:50:19", "remaining_time": "4:11:19"}
{"current_steps": 415, "total_steps": 1015, "loss": 0.3208, "lr": 2.9539647046980716e-05, "epoch": 2.044334975369458, "percentage": 40.89, "elapsed_time": "2:52:19", "remaining_time": "4:09:08"}
{"current_steps": 420, "total_steps": 1015, "loss": 0.3251, "lr": 2.923582007778716e-05, "epoch": 2.0689655172413794, "percentage": 41.38, "elapsed_time": "2:54:20", "remaining_time": "4:06:58"}
{"current_steps": 425, "total_steps": 1015, "loss": 0.3271, "lr": 2.8929259332445096e-05, "epoch": 2.0935960591133007, "percentage": 41.87, "elapsed_time": "2:56:19", "remaining_time": "4:04:46"}
{"current_steps": 430, "total_steps": 1015, "loss": 0.3258, "lr": 2.8620055552052403e-05, "epoch": 2.1182266009852215, "percentage": 42.36, "elapsed_time": "2:58:18", "remaining_time": "4:02:34"}
{"current_steps": 435, "total_steps": 1015, "loss": 0.3224, "lr": 2.8308300260037734e-05, "epoch": 2.142857142857143, "percentage": 42.86, "elapsed_time": "3:00:24", "remaining_time": "4:00:32"}
{"current_steps": 440, "total_steps": 1015, "loss": 0.325, "lr": 2.7994085735069814e-05, "epoch": 2.167487684729064, "percentage": 43.35, "elapsed_time": "3:02:26", "remaining_time": "3:58:25"}
{"current_steps": 445, "total_steps": 1015, "loss": 0.3197, "lr": 2.767750498374327e-05, "epoch": 2.1921182266009853, "percentage": 43.84, "elapsed_time": "3:04:18", "remaining_time": "3:56:05"}
{"current_steps": 450, "total_steps": 1015, "loss": 0.3188, "lr": 2.735865171304889e-05, "epoch": 2.2167487684729066, "percentage": 44.33, "elapsed_time": "3:06:22", "remaining_time": "3:54:00"}
{"current_steps": 455, "total_steps": 1015, "loss": 0.3243, "lr": 2.703762030263666e-05, "epoch": 2.2413793103448274, "percentage": 44.83, "elapsed_time": "3:08:27", "remaining_time": "3:51:56"}
{"current_steps": 460, "total_steps": 1015, "loss": 0.3212, "lr": 2.6714505776879666e-05, "epoch": 2.2660098522167487, "percentage": 45.32, "elapsed_time": "3:10:32", "remaining_time": "3:49:53"}
{"current_steps": 465, "total_steps": 1015, "loss": 0.3182, "lr": 2.6389403776747116e-05, "epoch": 2.29064039408867, "percentage": 45.81, "elapsed_time": "3:12:28", "remaining_time": "3:47:38"}
{"current_steps": 470, "total_steps": 1015, "loss": 0.3244, "lr": 2.606241053149492e-05, "epoch": 2.315270935960591, "percentage": 46.31, "elapsed_time": "3:14:19", "remaining_time": "3:45:20"}
{"current_steps": 475, "total_steps": 1015, "loss": 0.327, "lr": 2.5733622830182095e-05, "epoch": 2.3399014778325125, "percentage": 46.8, "elapsed_time": "3:16:21", "remaining_time": "3:43:13"}
{"current_steps": 480, "total_steps": 1015, "loss": 0.3234, "lr": 2.5403137993021483e-05, "epoch": 2.3645320197044333, "percentage": 47.29, "elapsed_time": "3:18:22", "remaining_time": "3:41:05"}
{"current_steps": 485, "total_steps": 1015, "loss": 0.3223, "lr": 2.5071053842573264e-05, "epoch": 2.3891625615763545, "percentage": 47.78, "elapsed_time": "3:20:35", "remaining_time": "3:39:12"}
{"current_steps": 490, "total_steps": 1015, "loss": 0.3208, "lr": 2.473746867478973e-05, "epoch": 2.413793103448276, "percentage": 48.28, "elapsed_time": "3:22:29", "remaining_time": "3:36:57"}
{"current_steps": 495, "total_steps": 1015, "loss": 0.325, "lr": 2.4402481229919982e-05, "epoch": 2.438423645320197, "percentage": 48.77, "elapsed_time": "3:24:30", "remaining_time": "3:34:50"}
{"current_steps": 500, "total_steps": 1015, "loss": 0.321, "lr": 2.406619066328311e-05, "epoch": 2.4630541871921183, "percentage": 49.26, "elapsed_time": "3:26:32", "remaining_time": "3:32:44"}
{"current_steps": 505, "total_steps": 1015, "loss": 0.3213, "lr": 2.3728696515918496e-05, "epoch": 2.4876847290640396, "percentage": 49.75, "elapsed_time": "3:28:24", "remaining_time": "3:30:28"}
{"current_steps": 510, "total_steps": 1015, "loss": 0.322, "lr": 2.3390098685121938e-05, "epoch": 2.512315270935961, "percentage": 50.25, "elapsed_time": "3:30:25", "remaining_time": "3:28:21"}
{"current_steps": 515, "total_steps": 1015, "loss": 0.3222, "lr": 2.3050497394876363e-05, "epoch": 2.5369458128078817, "percentage": 50.74, "elapsed_time": "3:32:21", "remaining_time": "3:26:10"}
{"current_steps": 520, "total_steps": 1015, "loss": 0.3341, "lr": 2.2709993166185803e-05, "epoch": 2.561576354679803, "percentage": 51.23, "elapsed_time": "3:34:18", "remaining_time": "3:24:00"}
{"current_steps": 525, "total_steps": 1015, "loss": 0.3242, "lr": 2.2368686787321475e-05, "epoch": 2.586206896551724, "percentage": 51.72, "elapsed_time": "3:36:17", "remaining_time": "3:21:52"}
{"current_steps": 530, "total_steps": 1015, "loss": 0.3196, "lr": 2.2026679283988727e-05, "epoch": 2.6108374384236455, "percentage": 52.22, "elapsed_time": "3:38:13", "remaining_time": "3:19:41"}
{"current_steps": 535, "total_steps": 1015, "loss": 0.3221, "lr": 2.168407188942373e-05, "epoch": 2.6354679802955667, "percentage": 52.71, "elapsed_time": "3:40:09", "remaining_time": "3:17:31"}
{"current_steps": 540, "total_steps": 1015, "loss": 0.322, "lr": 2.1340966014428744e-05, "epoch": 2.6600985221674875, "percentage": 53.2, "elapsed_time": "3:42:11", "remaining_time": "3:15:26"}
{"current_steps": 545, "total_steps": 1015, "loss": 0.3207, "lr": 2.0997463217354803e-05, "epoch": 2.684729064039409, "percentage": 53.69, "elapsed_time": "3:44:17", "remaining_time": "3:13:25"}
{"current_steps": 550, "total_steps": 1015, "loss": 0.3208, "lr": 2.065366517404071e-05, "epoch": 2.70935960591133, "percentage": 54.19, "elapsed_time": "3:46:18", "remaining_time": "3:11:20"}
{"current_steps": 555, "total_steps": 1015, "loss": 0.317, "lr": 2.030967364771733e-05, "epoch": 2.7339901477832513, "percentage": 54.68, "elapsed_time": "3:48:14", "remaining_time": "3:09:10"}
{"current_steps": 560, "total_steps": 1015, "loss": 0.3193, "lr": 1.996559045888593e-05, "epoch": 2.7586206896551726, "percentage": 55.17, "elapsed_time": "3:50:07", "remaining_time": "3:06:58"}
{"current_steps": 565, "total_steps": 1015, "loss": 0.323, "lr": 1.9621517455179627e-05, "epoch": 2.7832512315270934, "percentage": 55.67, "elapsed_time": "3:52:01", "remaining_time": "3:04:47"}
{"current_steps": 570, "total_steps": 1015, "loss": 0.3192, "lr": 1.9277556481216737e-05, "epoch": 2.8078817733990147, "percentage": 56.16, "elapsed_time": "3:53:57", "remaining_time": "3:02:38"}
{"current_steps": 575, "total_steps": 1015, "loss": 0.3226, "lr": 1.893380934845514e-05, "epoch": 2.832512315270936, "percentage": 56.65, "elapsed_time": "3:56:00", "remaining_time": "3:00:35"}
{"current_steps": 580, "total_steps": 1015, "loss": 0.3192, "lr": 1.8590377805056306e-05, "epoch": 2.857142857142857, "percentage": 57.14, "elapsed_time": "3:57:51", "remaining_time": "2:58:23"}
{"current_steps": 585, "total_steps": 1015, "loss": 0.3153, "lr": 1.8247363505768177e-05, "epoch": 2.8817733990147785, "percentage": 57.64, "elapsed_time": "3:59:49", "remaining_time": "2:56:17"}
{"current_steps": 590, "total_steps": 1015, "loss": 0.3229, "lr": 1.7904867981835617e-05, "epoch": 2.9064039408866993, "percentage": 58.13, "elapsed_time": "4:01:43", "remaining_time": "2:54:07"}
{"current_steps": 595, "total_steps": 1015, "loss": 0.3165, "lr": 1.7562992610947517e-05, "epoch": 2.9310344827586206, "percentage": 58.62, "elapsed_time": "4:03:41", "remaining_time": "2:52:01"}
{"current_steps": 600, "total_steps": 1015, "loss": 0.3234, "lr": 1.7221838587229215e-05, "epoch": 2.955665024630542, "percentage": 59.11, "elapsed_time": "4:05:33", "remaining_time": "2:49:50"}
{"current_steps": 605, "total_steps": 1015, "loss": 0.3164, "lr": 1.6881506891289386e-05, "epoch": 2.980295566502463, "percentage": 59.61, "elapsed_time": "4:07:38", "remaining_time": "2:47:49"}
{"current_steps": 610, "total_steps": 1015, "loss": 0.3148, "lr": 1.654209826033004e-05, "epoch": 3.0049261083743843, "percentage": 60.1, "elapsed_time": "4:09:41", "remaining_time": "2:45:46"}
{"current_steps": 615, "total_steps": 1015, "loss": 0.3046, "lr": 1.6203713158328626e-05, "epoch": 3.0295566502463056, "percentage": 60.59, "elapsed_time": "4:11:34", "remaining_time": "2:43:37"}
{"current_steps": 620, "total_steps": 1015, "loss": 0.2992, "lr": 1.586645174630094e-05, "epoch": 3.0541871921182264, "percentage": 61.08, "elapsed_time": "4:13:44", "remaining_time": "2:41:39"}
{"current_steps": 625, "total_steps": 1015, "loss": 0.3028, "lr": 1.5530413852653816e-05, "epoch": 3.0788177339901477, "percentage": 61.58, "elapsed_time": "4:15:46", "remaining_time": "2:39:35"}
{"current_steps": 630, "total_steps": 1015, "loss": 0.2978, "lr": 1.5195698943636135e-05, "epoch": 3.103448275862069, "percentage": 62.07, "elapsed_time": "4:17:48", "remaining_time": "2:37:32"}
{"current_steps": 635, "total_steps": 1015, "loss": 0.2999, "lr": 1.4862406093897175e-05, "epoch": 3.12807881773399, "percentage": 62.56, "elapsed_time": "4:19:51", "remaining_time": "2:35:30"}
{"current_steps": 640, "total_steps": 1015, "loss": 0.3012, "lr": 1.4530633957160733e-05, "epoch": 3.1527093596059115, "percentage": 63.05, "elapsed_time": "4:21:47", "remaining_time": "2:33:23"}
{"current_steps": 645, "total_steps": 1015, "loss": 0.3004, "lr": 1.4200480737023943e-05, "epoch": 3.1773399014778327, "percentage": 63.55, "elapsed_time": "4:23:50", "remaining_time": "2:31:21"}
{"current_steps": 650, "total_steps": 1015, "loss": 0.3019, "lr": 1.3872044157889297e-05, "epoch": 3.2019704433497536, "percentage": 64.04, "elapsed_time": "4:25:53", "remaining_time": "2:29:18"}
{"current_steps": 655, "total_steps": 1015, "loss": 0.3094, "lr": 1.3545421436038477e-05, "epoch": 3.226600985221675, "percentage": 64.53, "elapsed_time": "4:27:43", "remaining_time": "2:27:08"}
{"current_steps": 660, "total_steps": 1015, "loss": 0.3058, "lr": 1.3220709250856656e-05, "epoch": 3.251231527093596, "percentage": 65.02, "elapsed_time": "4:29:48", "remaining_time": "2:25:07"}
{"current_steps": 665, "total_steps": 1015, "loss": 0.2999, "lr": 1.2898003716215626e-05, "epoch": 3.2758620689655173, "percentage": 65.52, "elapsed_time": "4:31:45", "remaining_time": "2:23:02"}
{"current_steps": 670, "total_steps": 1015, "loss": 0.3011, "lr": 1.2577400352024426e-05, "epoch": 3.3004926108374386, "percentage": 66.01, "elapsed_time": "4:33:41", "remaining_time": "2:20:56"}
{"current_steps": 675, "total_steps": 1015, "loss": 0.2993, "lr": 1.2258994055955658e-05, "epoch": 3.3251231527093594, "percentage": 66.5, "elapsed_time": "4:35:36", "remaining_time": "2:18:49"}
{"current_steps": 680, "total_steps": 1015, "loss": 0.3, "lr": 1.1942879075356135e-05, "epoch": 3.3497536945812807, "percentage": 67.0, "elapsed_time": "4:37:35", "remaining_time": "2:16:45"}
{"current_steps": 685, "total_steps": 1015, "loss": 0.2964, "lr": 1.1629148979349836e-05, "epoch": 3.374384236453202, "percentage": 67.49, "elapsed_time": "4:39:32", "remaining_time": "2:14:40"}
{"current_steps": 690, "total_steps": 1015, "loss": 0.3052, "lr": 1.1317896631141814e-05, "epoch": 3.399014778325123, "percentage": 67.98, "elapsed_time": "4:41:27", "remaining_time": "2:12:34"}
{"current_steps": 695, "total_steps": 1015, "loss": 0.3036, "lr": 1.1009214160530875e-05, "epoch": 3.4236453201970445, "percentage": 68.47, "elapsed_time": "4:43:24", "remaining_time": "2:10:29"}
{"current_steps": 700, "total_steps": 1015, "loss": 0.3022, "lr": 1.0703192936639481e-05, "epoch": 3.4482758620689653, "percentage": 68.97, "elapsed_time": "4:45:19", "remaining_time": "2:08:23"}
{"current_steps": 705, "total_steps": 1015, "loss": 0.3059, "lr": 1.0399923540868712e-05, "epoch": 3.4729064039408866, "percentage": 69.46, "elapsed_time": "4:47:23", "remaining_time": "2:06:22"}
{"current_steps": 710, "total_steps": 1015, "loss": 0.3009, "lr": 1.0099495740086454e-05, "epoch": 3.497536945812808, "percentage": 69.95, "elapsed_time": "4:49:14", "remaining_time": "2:04:15"}
{"current_steps": 715, "total_steps": 1015, "loss": 0.3038, "lr": 9.801998460056643e-06, "epoch": 3.522167487684729, "percentage": 70.44, "elapsed_time": "4:51:13", "remaining_time": "2:02:11"}
{"current_steps": 720, "total_steps": 1015, "loss": 0.2991, "lr": 9.507519759117546e-06, "epoch": 3.5467980295566504, "percentage": 70.94, "elapsed_time": "4:53:13", "remaining_time": "2:00:08"}
{"current_steps": 725, "total_steps": 1015, "loss": 0.3053, "lr": 9.216146802116676e-06, "epoch": 3.571428571428571, "percentage": 71.43, "elapsed_time": "4:55:08", "remaining_time": "1:58:03"}
{"current_steps": 730, "total_steps": 1015, "loss": 0.3064, "lr": 8.92796583461031e-06, "epoch": 3.596059113300493, "percentage": 71.92, "elapsed_time": "4:57:10", "remaining_time": "1:56:01"}
{"current_steps": 735, "total_steps": 1015, "loss": 0.2994, "lr": 8.643062157335e-06, "epoch": 3.6206896551724137, "percentage": 72.41, "elapsed_time": "4:59:06", "remaining_time": "1:53:56"}
{"current_steps": 740, "total_steps": 1015, "loss": 0.2982, "lr": 8.361520100958856e-06, "epoch": 3.645320197044335, "percentage": 72.91, "elapsed_time": "5:01:11", "remaining_time": "1:51:55"}
{"current_steps": 745, "total_steps": 1015, "loss": 0.3, "lr": 8.083423001119855e-06, "epoch": 3.6699507389162562, "percentage": 73.4, "elapsed_time": "5:03:10", "remaining_time": "1:49:52"}
{"current_steps": 750, "total_steps": 1015, "loss": 0.2991, "lr": 7.80885317375877e-06, "epoch": 3.6945812807881775, "percentage": 73.89, "elapsed_time": "5:05:09", "remaining_time": "1:47:49"}
{"current_steps": 755, "total_steps": 1015, "loss": 0.2975, "lr": 7.537891890753879e-06, "epoch": 3.7192118226600988, "percentage": 74.38, "elapsed_time": "5:07:04", "remaining_time": "1:45:44"}
{"current_steps": 760, "total_steps": 1015, "loss": 0.3027, "lr": 7.27061935586471e-06, "epoch": 3.7438423645320196, "percentage": 74.88, "elapsed_time": "5:09:08", "remaining_time": "1:43:43"}
{"current_steps": 765, "total_steps": 1015, "loss": 0.3049, "lr": 7.007114680991995e-06, "epoch": 3.768472906403941, "percentage": 75.37, "elapsed_time": "5:11:09", "remaining_time": "1:41:41"}
{"current_steps": 770, "total_steps": 1015, "loss": 0.3018, "lr": 6.747455862760723e-06, "epoch": 3.793103448275862, "percentage": 75.86, "elapsed_time": "5:12:53", "remaining_time": "1:39:33"}
{"current_steps": 775, "total_steps": 1015, "loss": 0.3021, "lr": 6.491719759433414e-06, "epoch": 3.8177339901477834, "percentage": 76.35, "elapsed_time": "5:14:53", "remaining_time": "1:37:31"}
{"current_steps": 780, "total_steps": 1015, "loss": 0.2989, "lr": 6.239982068160251e-06, "epoch": 3.8423645320197046, "percentage": 76.85, "elapsed_time": "5:16:51", "remaining_time": "1:35:27"}
{"current_steps": 785, "total_steps": 1015, "loss": 0.3018, "lr": 5.9923173025729895e-06, "epoch": 3.8669950738916254, "percentage": 77.34, "elapsed_time": "5:18:48", "remaining_time": "1:33:24"}
{"current_steps": 790, "total_steps": 1015, "loss": 0.2968, "lr": 5.748798770729071e-06, "epoch": 3.8916256157635467, "percentage": 77.83, "elapsed_time": "5:20:44", "remaining_time": "1:31:20"}
{"current_steps": 795, "total_steps": 1015, "loss": 0.3041, "lr": 5.509498553412727e-06, "epoch": 3.916256157635468, "percentage": 78.33, "elapsed_time": "5:22:34", "remaining_time": "1:29:15"}
{"current_steps": 800, "total_steps": 1015, "loss": 0.2993, "lr": 5.274487482799206e-06, "epoch": 3.9408866995073892, "percentage": 78.82, "elapsed_time": "5:24:39", "remaining_time": "1:27:15"}
{"current_steps": 805, "total_steps": 1015, "loss": 0.307, "lr": 5.04383512148871e-06, "epoch": 3.9655172413793105, "percentage": 79.31, "elapsed_time": "5:26:36", "remaining_time": "1:25:12"}
{"current_steps": 810, "total_steps": 1015, "loss": 0.3056, "lr": 4.817609741916009e-06, "epoch": 3.9901477832512313, "percentage": 79.8, "elapsed_time": "5:28:29", "remaining_time": "1:23:08"}
{"current_steps": 815, "total_steps": 1015, "loss": 0.2883, "lr": 4.595878306142059e-06, "epoch": 4.014778325123153, "percentage": 80.3, "elapsed_time": "5:30:26", "remaining_time": "1:21:05"}
{"current_steps": 820, "total_steps": 1015, "loss": 0.2915, "lr": 4.37870644603336e-06, "epoch": 4.039408866995074, "percentage": 80.79, "elapsed_time": "5:32:32", "remaining_time": "1:19:04"}
{"current_steps": 825, "total_steps": 1015, "loss": 0.2911, "lr": 4.1661584438351645e-06, "epoch": 4.064039408866995, "percentage": 81.28, "elapsed_time": "5:34:32", "remaining_time": "1:17:02"}
{"current_steps": 830, "total_steps": 1015, "loss": 0.2928, "lr": 3.958297213144084e-06, "epoch": 4.088669950738916, "percentage": 81.77, "elapsed_time": "5:36:26", "remaining_time": "1:14:59"}
{"current_steps": 835, "total_steps": 1015, "loss": 0.2892, "lr": 3.7551842802858772e-06, "epoch": 4.113300492610837, "percentage": 82.27, "elapsed_time": "5:38:17", "remaining_time": "1:12:55"}
{"current_steps": 840, "total_steps": 1015, "loss": 0.2913, "lr": 3.5568797661038004e-06, "epoch": 4.137931034482759, "percentage": 82.76, "elapsed_time": "5:40:16", "remaining_time": "1:10:53"}
{"current_steps": 845, "total_steps": 1015, "loss": 0.2926, "lr": 3.3634423681630392e-06, "epoch": 4.16256157635468, "percentage": 83.25, "elapsed_time": "5:42:16", "remaining_time": "1:08:51"}
{"current_steps": 850, "total_steps": 1015, "loss": 0.2942, "lr": 3.174929343376374e-06, "epoch": 4.187192118226601, "percentage": 83.74, "elapsed_time": "5:44:16", "remaining_time": "1:06:49"}
{"current_steps": 855, "total_steps": 1015, "loss": 0.2835, "lr": 2.991396491056331e-06, "epoch": 4.211822660098522, "percentage": 84.24, "elapsed_time": "5:46:25", "remaining_time": "1:04:49"}
{"current_steps": 860, "total_steps": 1015, "loss": 0.2938, "lr": 2.812898136398705e-06, "epoch": 4.236453201970443, "percentage": 84.73, "elapsed_time": "5:48:26", "remaining_time": "1:02:47"}
{"current_steps": 865, "total_steps": 1015, "loss": 0.2879, "lr": 2.6394871144024926e-06, "epoch": 4.261083743842365, "percentage": 85.22, "elapsed_time": "5:50:35", "remaining_time": "1:00:47"}
{"current_steps": 870, "total_steps": 1015, "loss": 0.2894, "lr": 2.471214754230866e-06, "epoch": 4.285714285714286, "percentage": 85.71, "elapsed_time": "5:52:30", "remaining_time": "0:58:45"}
{"current_steps": 875, "total_steps": 1015, "loss": 0.2893, "lr": 2.3081308640178945e-06, "epoch": 4.310344827586207, "percentage": 86.21, "elapsed_time": "5:54:25", "remaining_time": "0:56:42"}
{"current_steps": 880, "total_steps": 1015, "loss": 0.2916, "lr": 2.1502837161254873e-06, "epoch": 4.334975369458128, "percentage": 86.7, "elapsed_time": "5:56:28", "remaining_time": "0:54:41"}
{"current_steps": 885, "total_steps": 1015, "loss": 0.2853, "lr": 1.9977200328548953e-06, "epoch": 4.359605911330049, "percentage": 87.19, "elapsed_time": "5:58:21", "remaining_time": "0:52:38"}
{"current_steps": 890, "total_steps": 1015, "loss": 0.2913, "lr": 1.8504849726170637e-06, "epoch": 4.384236453201971, "percentage": 87.68, "elapsed_time": "6:00:26", "remaining_time": "0:50:37"}
{"current_steps": 895, "total_steps": 1015, "loss": 0.2897, "lr": 1.7086221165658544e-06, "epoch": 4.4088669950738915, "percentage": 88.18, "elapsed_time": "6:02:25", "remaining_time": "0:48:35"}
{"current_steps": 900, "total_steps": 1015, "loss": 0.2918, "lr": 1.5721734556981761e-06, "epoch": 4.433497536945813, "percentage": 88.67, "elapsed_time": "6:04:21", "remaining_time": "0:46:33"}
{"current_steps": 905, "total_steps": 1015, "loss": 0.2921, "lr": 1.4411793784247263e-06, "epoch": 4.458128078817734, "percentage": 89.16, "elapsed_time": "6:06:28", "remaining_time": "0:44:32"}
{"current_steps": 910, "total_steps": 1015, "loss": 0.294, "lr": 1.3156786586151916e-06, "epoch": 4.482758620689655, "percentage": 89.66, "elapsed_time": "6:08:24", "remaining_time": "0:42:30"}
{"current_steps": 915, "total_steps": 1015, "loss": 0.2912, "lr": 1.195708444121253e-06, "epoch": 4.5073891625615765, "percentage": 90.15, "elapsed_time": "6:10:27", "remaining_time": "0:40:29"}
{"current_steps": 920, "total_steps": 1015, "loss": 0.2891, "lr": 1.0813042457809497e-06, "epoch": 4.532019704433497, "percentage": 90.64, "elapsed_time": "6:12:26", "remaining_time": "0:38:27"}
{"current_steps": 925, "total_steps": 1015, "loss": 0.2919, "lr": 9.724999269075598e-07, "epoch": 4.556650246305419, "percentage": 91.13, "elapsed_time": "6:14:31", "remaining_time": "0:36:26"}
{"current_steps": 930, "total_steps": 1015, "loss": 0.2918, "lr": 8.693276932661732e-07, "epoch": 4.58128078817734, "percentage": 91.63, "elapsed_time": "6:16:24", "remaining_time": "0:34:24"}
{"current_steps": 935, "total_steps": 1015, "loss": 0.2899, "lr": 7.718180835408584e-07, "epoch": 4.605911330049262, "percentage": 92.12, "elapsed_time": "6:18:19", "remaining_time": "0:32:22"}
{"current_steps": 940, "total_steps": 1015, "loss": 0.2903, "lr": 6.799999602953189e-07, "epoch": 4.630541871921182, "percentage": 92.61, "elapsed_time": "6:20:23", "remaining_time": "0:30:21"}
{"current_steps": 945, "total_steps": 1015, "loss": 0.2904, "lr": 5.939005014296428e-07, "epoch": 4.655172413793103, "percentage": 93.1, "elapsed_time": "6:22:29", "remaining_time": "0:28:19"}
{"current_steps": 950, "total_steps": 1015, "loss": 0.2939, "lr": 5.135451921357337e-07, "epoch": 4.679802955665025, "percentage": 93.6, "elapsed_time": "6:24:30", "remaining_time": "0:26:18"}
{"current_steps": 955, "total_steps": 1015, "loss": 0.2896, "lr": 4.3895781735375566e-07, "epoch": 4.704433497536946, "percentage": 94.09, "elapsed_time": "6:26:34", "remaining_time": "0:24:17"}
{"current_steps": 960, "total_steps": 1015, "loss": 0.2935, "lr": 3.70160454731876e-07, "epoch": 4.7290640394088665, "percentage": 94.58, "elapsed_time": "6:28:36", "remaining_time": "0:22:15"}
{"current_steps": 965, "total_steps": 1015, "loss": 0.2902, "lr": 3.0717346809132407e-07, "epoch": 4.753694581280788, "percentage": 95.07, "elapsed_time": "6:30:43", "remaining_time": "0:20:14"}
{"current_steps": 970, "total_steps": 1015, "loss": 0.2858, "lr": 2.5001550139877707e-07, "epoch": 4.778325123152709, "percentage": 95.57, "elapsed_time": "6:32:46", "remaining_time": "0:18:13"}
{"current_steps": 975, "total_steps": 1015, "loss": 0.2892, "lr": 1.987034732477877e-07, "epoch": 4.802955665024631, "percentage": 96.06, "elapsed_time": "6:34:46", "remaining_time": "0:16:11"}
{"current_steps": 980, "total_steps": 1015, "loss": 0.2851, "lr": 1.5325257185093923e-07, "epoch": 4.827586206896552, "percentage": 96.55, "elapsed_time": "6:36:43", "remaining_time": "0:14:10"}
{"current_steps": 985, "total_steps": 1015, "loss": 0.2884, "lr": 1.1367625054416575e-07, "epoch": 4.852216748768473, "percentage": 97.04, "elapsed_time": "6:38:33", "remaining_time": "0:12:08"}
{"current_steps": 990, "total_steps": 1015, "loss": 0.2921, "lr": 7.998622380461563e-08, "epoch": 4.876847290640394, "percentage": 97.54, "elapsed_time": "6:40:25", "remaining_time": "0:10:06"}
{"current_steps": 995, "total_steps": 1015, "loss": 0.2887, "lr": 5.219246378319387e-08, "epoch": 4.901477832512315, "percentage": 98.03, "elapsed_time": "6:42:22", "remaining_time": "0:08:05"}
{"current_steps": 1000, "total_steps": 1015, "loss": 0.3004, "lr": 3.030319735283449e-08, "epoch": 4.926108374384237, "percentage": 98.52, "elapsed_time": "6:44:09", "remaining_time": "0:06:03"}
{"current_steps": 1005, "total_steps": 1015, "loss": 0.2872, "lr": 1.4324903673370583e-08, "epoch": 4.9507389162561575, "percentage": 99.01, "elapsed_time": "6:46:02", "remaining_time": "0:04:02"}
{"current_steps": 1010, "total_steps": 1015, "loss": 0.2894, "lr": 4.262312273721758e-09, "epoch": 4.975369458128079, "percentage": 99.51, "elapsed_time": "6:47:57", "remaining_time": "0:02:01"}
{"current_steps": 1015, "total_steps": 1015, "loss": 0.2931, "lr": 1.184016519673037e-10, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "6:49:55", "remaining_time": "0:00:00"}
{"current_steps": 1015, "total_steps": 1015, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "6:50:03", "remaining_time": "0:00:00"}
{"current_steps": 1015, "total_steps": 1015, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:00:00", "remaining_time": "0:00:00"}
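Each line of the log above is a self-contained JSON object, so the file can be consumed record-by-record with the standard library. The sketch below is illustrative only (not part of this repository); the two sample lines are copied verbatim from the log, and in practice you would read them from the `trainer_log.jsonl` file instead.

```python
import json

# Two records copied from the log above; replace with open("trainer_log.jsonl") in practice.
log_lines = [
    '{"current_steps": 1000, "total_steps": 1015, "loss": 0.3004, "lr": 3.030319735283449e-08, "epoch": 4.926108374384237, "percentage": 98.52, "elapsed_time": "6:44:09", "remaining_time": "0:06:03"}',
    '{"current_steps": 1015, "total_steps": 1015, "loss": 0.2931, "lr": 1.184016519673037e-10, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "6:49:55", "remaining_time": "0:00:00"}',
]

records = [json.loads(line) for line in log_lines]

# The final summary entries omit "loss", so filter on key presence.
losses = [(r["current_steps"], r["loss"]) for r in records if "loss" in r]
print(losses)  # -> [(1000, 0.3004), (1015, 0.2931)]
```

The same `(step, loss)` pairs can then be fed to any plotting library to reproduce a curve like `training_loss.png`.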

Other files in this commit:

trainer_state.json (new file, 2276 lines; diff suppressed because it is too large)

training_args.bin (new file, Git LFS pointer):
version https://git-lfs.github.com/spec/v1
oid sha256:468e0baad232fc61d8cd3a37812208bed22ae3ffc9ee6cfc49591214fe989642
size 8721

training_loss.png (new binary file, 36 KiB; not shown)

vocab.json (new file, 1 line; diff suppressed because one or more lines are too long)