Initialize project; model provided by the ModelHub XC community
Model: waleko/Qwen3-8B-SFT-envbench_qwen-all Source: Original Platform
37
.gitattributes
vendored
Normal file
@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
result_model/tokenizer.json filter=lfs diff=lfs merge=lfs -text
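The patterns above determine which paths Git LFS stores as pointer files rather than as regular blobs. The helper below is a hypothetical sketch that approximates gitattributes matching with Python's `fnmatch`. A pattern without a `/` is tested against the file's basename, so it applies at any directory depth. A pattern that contains a `/` is tested against the repository-relative path. Git's own rules differ in edge cases such as `**`, so `git check-attr filter -- <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# LFS patterns copied from the .gitattributes file above.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*", "*.tar.*",
    "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst",
    "*tfevents*", "tokenizer.json", "result_model/tokenizer.json",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate gitattributes matching (hypothetical helper, not Git's exact algorithm)."""
    for pattern in LFS_PATTERNS:
        # Slash-free patterns match the basename at any depth;
        # patterns containing a slash are matched against the full relative path.
        target = path if "/" in pattern else basename(path)
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for name in ["model-00001-of-00004.safetensors", "config.json",
                 "merges.txt", "tokenizer.json", "runs/events.out.tfevents.1"]:
        print(f"{name}: {'LFS' if is_lfs_tracked(name) else 'regular'}")
```

Under these rules the four `*.safetensors` shards are LFS-tracked, while `merges.txt` and the JSON configs are not. This matches how those files appear in the commit. Because `tokenizer.json` has no slash, it already matches `result_model/tokenizer.json`, so the last pattern line is redundant.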
67
README.md
Normal file
@@ -0,0 +1,67 @@
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- full
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: Qwen3-8B-SFT-envbench_qwen-all
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Qwen3-8B-SFT-envbench_qwen-all

This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the envbench_qwen-all dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1477
- Accuracy: 0.9511
- Num Input Tokens Seen: 36600520

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0

### Training results

### Framework versions

- Transformers 4.52.4
- Pytorch 2.6.0a0+df5bbc09d1.nv24.12
- Datasets 3.6.0
- Tokenizers 0.21.1
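The two "total" batch sizes in the model card follow from the other hyperparameters. Under data-parallel training, the effective train batch is per-device batch × devices × gradient-accumulation steps. The eval batch is per-device batch × devices, because there is no accumulation at eval time.

The sketch below checks that arithmetic. It also computes the learning rate at a given step, assuming the `cosine` scheduler has the shape of `transformers.get_cosine_schedule_with_warmup`: linear warmup, then half-cosine decay to zero. It further assumes warmup lasts ceil(0.1 × total steps). The card does not state the total step count, so `TOTAL_STEPS = 333` is only an illustrative value. It is close to `train_steps_per_second × train_runtime` from `all_results.json`.

```python
import math

# Values from the "Training hyperparameters" list in the model card.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 1  # per device
EVAL_BATCH_SIZE = 1   # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1


def effective_batch_sizes():
    """Return (total_train_batch_size, total_eval_batch_size)."""
    total_train = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
    total_eval = EVAL_BATCH_SIZE * NUM_DEVICES  # no gradient accumulation at eval time
    return total_train, total_eval


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup to LEARNING_RATE, then half-cosine decay to 0 (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch_sizes())  # (16, 4), matching the card
    TOTAL_STEPS = 333  # illustrative only; the card does not report the step count
    for step in (0, 17, 34, 183, 333):
        print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

With 333 steps, warmup covers the first 34 steps, and the rate peaks at 5e-05 before decaying to zero at the final step.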
28
added_tokens.json
Normal file
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
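`added_tokens.json` maps Qwen3's special tokens to IDs. The short check below re-reads only the mapping above, plus `vocab_size`, `bos_token_id` and `eos_token_id` from `config.json` and `generation_config.json` in this commit. It confirms three things:
- The 26 IDs form one contiguous block, 151643 to 151668.
- The block fits inside the embedding matrix.
- The configured BOS and EOS IDs correspond to specific token strings.

```python
# Mapping copied from added_tokens.json above.
ADDED_TOKENS = {
    "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
    "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651, "<|quad_start|>": 151650,
    "<|repo_name|>": 151663, "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}

VOCAB_SIZE = 151936               # config.json
BOS_TOKEN_ID = 151643             # config.json and generation_config.json
EOS_TOKEN_IDS = [151645, 151643]  # generation_config.json


def check_added_tokens():
    id_to_token = {i: t for t, i in ADDED_TOKENS.items()}
    ids = sorted(id_to_token)
    return {
        "count": len(ids),
        "range": (ids[0], ids[-1]),
        # Every ID between min and max is used exactly once.
        "contiguous": ids == list(range(ids[0], ids[-1] + 1)),
        # The highest ID must index a row of the embedding matrix.
        "fits_in_vocab": ids[-1] < VOCAB_SIZE,
        "bos": id_to_token[BOS_TOKEN_ID],
        "eos": [id_to_token[i] for i in EOS_TOKEN_IDS],
    }


if __name__ == "__main__":
    print(check_added_tokens())
```

Generation stops on either `<|im_end|>` or `<|endoftext|>`. Between the highest special-token ID (151668) and `vocab_size` (151936) there are 267 embedding rows that no listed token uses.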
14
all_results.json
Normal file
@@ -0,0 +1,14 @@
{
    "epoch": 5.0,
    "eval_accuracy": 0.9510975495604264,
    "eval_loss": 0.1477060168981552,
    "eval_runtime": 4.8444,
    "eval_samples_per_second": 11.766,
    "eval_steps_per_second": 3.096,
    "num_input_tokens_seen": 36600520,
    "total_flos": 1.6620454705385964e+18,
    "train_loss": 0.11312562568641421,
    "train_runtime": 4017.5693,
    "train_samples_per_second": 1.325,
    "train_steps_per_second": 0.083
}
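The throughput fields in `all_results.json` are rates over the reported runtimes. Assuming the Trainer computes each rate as a total divided by runtime, multiplying back gives rough counts. The sketch below does that arithmetic. The derived values are estimates from rounded rates, not numbers reported by the run:
- about 333 optimizer steps,
- about 5,323 training samples across 5 epochs,
- about 57 evaluation samples in 15 steps.

They are also consistent with the batch sizes in the model card: 333 steps × 16 ≈ 5,328 samples, and ceil(57 / 4) = 15 eval steps.

```python
import math

# Values copied from all_results.json above.
RESULTS = {
    "epoch": 5.0,
    "eval_runtime": 4.8444,
    "eval_samples_per_second": 11.766,
    "eval_steps_per_second": 3.096,
    "train_runtime": 4017.5693,
    "train_samples_per_second": 1.325,
    "train_steps_per_second": 0.083,
}
TOTAL_TRAIN_BATCH_SIZE = 16  # from the model card
TOTAL_EVAL_BATCH_SIZE = 4    # from the model card


def derived_counts(r):
    """Back out approximate counts from rate x runtime (rates are rounded, so these are estimates)."""
    train_samples = r["train_samples_per_second"] * r["train_runtime"]
    train_steps = r["train_steps_per_second"] * r["train_runtime"]
    eval_samples = r["eval_samples_per_second"] * r["eval_runtime"]
    eval_steps = r["eval_steps_per_second"] * r["eval_runtime"]
    return {
        "train_samples_total": round(train_samples),
        "train_samples_per_epoch": round(train_samples / r["epoch"]),
        "train_steps": round(train_steps),
        "samples_implied_by_steps": round(train_steps) * TOTAL_TRAIN_BATCH_SIZE,
        "eval_samples": round(eval_samples),
        "eval_steps": round(eval_steps),
        # Steps needed to cover the eval set with the total eval batch size.
        "eval_steps_needed": math.ceil(round(eval_samples) / TOTAL_EVAL_BATCH_SIZE),
    }


if __name__ == "__main__":
    for key, value in derived_counts(RESULTS).items():
        print(f"{key}: {value}")
```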
89
chat_template.jinja
Normal file
@@ -0,0 +1,89 @@
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
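For plain system/user/assistant conversations, the template reduces to `<|im_start|>role\n...<|im_end|>` blocks with two Qwen3-specific rules. First, `<think>...</think>` reasoning is dropped from assistant turns that come before the last real user query. Second, `enable_thinking=False` pre-fills an empty think block in the generation prompt.

The function below is a hypothetical pure-Python re-implementation of only that no-tools path, written to make the control flow easy to follow. Tool definitions, `tool_calls`, and `tool` messages are deliberately left out. In practice the template is applied through `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` in `transformers`, which passes extra keyword arguments such as `enable_thinking` through to the template.

```python
def render_chat(messages, add_generation_prompt=True, enable_thinking=True):
    """Hypothetical pure-Python mirror of the chat template's no-tools path."""
    out = []
    if messages and messages[0]["role"] == "system":
        out.append("<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n")

    # Scan backwards for the last real user query, as the template's namespace loop does;
    # user turns that only wrap a <tool_response> are skipped.
    last_query_index = len(messages) - 1
    for i in range(len(messages) - 1, -1, -1):
        c = messages[i].get("content")
        if messages[i]["role"] == "user" and isinstance(c, str) and not (
            c.startswith("<tool_response>") and c.endswith("</tool_response>")
        ):
            last_query_index = i
            break

    for i, msg in enumerate(messages):
        role = msg["role"]
        content = msg.get("content")
        if not isinstance(content, str):
            content = ""
        if role == "user" or (role == "system" and i > 0):
            out.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
        elif role == "assistant":
            reasoning = msg.get("reasoning_content")
            if not isinstance(reasoning, str):
                reasoning = ""
                if "</think>" in content:
                    # Split inline reasoning out of the content, as the template does.
                    parts = content.split("</think>")
                    reasoning = parts[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
                    content = parts[-1].lstrip("\n")
            if i > last_query_index and (i == len(messages) - 1 or reasoning):
                # Turns after the last user query keep an explicit think block.
                out.append("<|im_start|>assistant\n<think>\n" + reasoning.strip("\n")
                           + "\n</think>\n\n" + content.lstrip("\n"))
            else:
                # Otherwise only the visible answer is rendered.
                out.append("<|im_start|>assistant\n" + content)
            out.append("<|im_end|>\n")
        elif role == "tool":
            raise NotImplementedError("tool messages are outside this sketch")

    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # The template disables thinking by pre-filling an empty think block.
            out.append("<think>\n\n</think>\n\n")
    return "".join(out)


if __name__ == "__main__":
    msgs = [{"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "Hi"}]
    print(render_chat(msgs, enable_thinking=False))
```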
30
config.json
Normal file
@@ -0,0 +1,30 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}
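Every tensor is stored in bfloat16 (2 bytes per value), so the architecture fields above fix the checkpoint size. The sketch below counts parameters from the config. It assumes the per-layer tensors visible in `model.safetensors.index.json` later in this commit: q/k/v/o projections, per-head `q_norm`/`k_norm`, a gated MLP, and two RMSNorms per layer. It also assumes a final RMSNorm and a separate (untied) `lm_head`.

The count comes to 8,190,735,360 parameters. At 2 bytes each, that equals the `total_size` of 16381470720 bytes recorded in the index.

```python
# Shape-related fields copied from config.json above.
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 12288,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
    "tie_word_embeddings": False,
}
BYTES_PER_PARAM = 2  # "torch_dtype": "bfloat16"


def count_parameters(c):
    """Count parameters of a Qwen3-style decoder from its config.

    Assumes no bias terms ("attention_bias": false) and an extra final RMSNorm.
    """
    h, d = c["hidden_size"], c["head_dim"]
    q_dim = c["num_attention_heads"] * d   # 32 * 128 = 4096
    kv_dim = c["num_key_value_heads"] * d  # 8 * 128 = 1024 (grouped-query attention)
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h  # q_proj, k_proj + v_proj, o_proj
    qk_norm = 2 * d                                # q_norm and k_norm act over head_dim
    mlp = 3 * h * c["intermediate_size"]           # gate_proj, up_proj, down_proj
    norms = 2 * h                                  # input_layernorm, post_attention_layernorm
    per_layer = attn + qk_norm + mlp + norms
    embed = c["vocab_size"] * h
    lm_head = 0 if c["tie_word_embeddings"] else c["vocab_size"] * h
    final_norm = h
    return c["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


if __name__ == "__main__":
    n = count_parameters(CONFIG)
    print(f"{n:,} parameters, {n * BYTES_PER_PARAM:,} bytes in bf16")
```

The exact match with `total_size` is a useful sanity check: it suggests the checkpoint holds no extra or missing tensors relative to this layout.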
9
eval_results.json
Normal file
@@ -0,0 +1,9 @@
{
    "epoch": 5.0,
    "eval_accuracy": 0.9510975495604264,
    "eval_loss": 0.1477060168981552,
    "eval_runtime": 4.8444,
    "eval_samples_per_second": 11.766,
    "eval_steps_per_second": 3.096,
    "num_input_tokens_seen": 36600520
}
13
generation_config.json
Normal file
@@ -0,0 +1,13 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.52.4"
}
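`generation_config.json` turns on sampling with temperature 0.6, top-k 20 and top-p 0.95. As a rough illustration of how these three settings combine, the sketch below filters a next-token distribution. It assumes they are applied in the order temperature, then top-k, then top-p:
1. Divide the logits by the temperature.
2. Keep the 20 highest-scoring tokens.
3. Keep the smallest prefix of those tokens whose probability mass reaches 0.95, then renormalize.

The function name and the toy logits are hypothetical. Library implementations differ in details such as tie-breaking and minimum tokens kept.

```python
import math


def filter_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring token indices, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    m = max(scaled[i] for i in ranked)
    weights = {i: math.exp(scaled[i] - m) for i in ranked}
    z = sum(weights.values())
    probs = {i: w / z for i, w in weights.items()}
    # Top-p (nucleus): walk down the ranking until the kept mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


if __name__ == "__main__":
    # Toy 6-token vocabulary; real logits come from the model's lm_head.
    toy_logits = [3.0, 2.5, 1.0, 0.0, -1.0, -4.0]
    dist = filter_distribution(toy_logits, top_k=4)
    print({i: round(p, 3) for i, p in dist.items()})
```

A temperature below 1 sharpens the distribution before truncation, so top-p at 0.95 usually keeps fewer tokens than it would at temperature 1.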
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model-00001-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b8012220e3fb38fff67a4f8818c0a51131ec60470665522c8786d7a8eb1a010
size 4902257696
3
model-00002-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f74bc059400695f3917abe530211119bceb6de32e2c55a962a10655a798e0377
size 4915960368
3
model-00003-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a8e315ffca3383a933652fb67d8b280fa325d2ea9ef09d09540f7236934ea340
size 4983068496
3
model-00004-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:714a88991191919c9995840c3c1319c2ba567a921b186b9755fde4cc1bf019a0
size 1580230264
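Each `*.safetensors` entry above is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 `oid` of the real file, and its `size` in bytes. The sketch below parses the four pointers and sums the sizes.

The shards total 16,381,516,824 bytes, which is 46,104 bytes more than the `total_size` (16381470720) in `model.safetensors.index.json`. That gap is expected if `total_size` counts only tensor data. Each safetensors file also starts with an 8-byte length prefix followed by a JSON header. The per-file header sizes are not given here, so the sketch does not split the 46,104 bytes across shards.

```python
# Pointer texts copied from the four LFS pointer files above.
POINTERS = {
    "model-00001-of-00004.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:6b8012220e3fb38fff67a4f8818c0a51131ec60470665522c8786d7a8eb1a010\n"
        "size 4902257696\n"
    ),
    "model-00002-of-00004.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:f74bc059400695f3917abe530211119bceb6de32e2c55a962a10655a798e0377\n"
        "size 4915960368\n"
    ),
    "model-00003-of-00004.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:a8e315ffca3383a933652fb67d8b280fa325d2ea9ef09d09540f7236934ea340\n"
        "size 4983068496\n"
    ),
    "model-00004-of-00004.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:714a88991191919c9995840c3c1319c2ba567a921b186b9755fde4cc1bf019a0\n"
        "size 1580230264\n"
    ),
}
TOTAL_TENSOR_BYTES = 16381470720  # "total_size" in model.safetensors.index.json


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a {key: value} dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def shard_summary(pointers):
    """Return (per-shard sizes, total bytes, bytes beyond the index's tensor total)."""
    sizes = {}
    for name, text in pointers.items():
        fields = parse_lfs_pointer(text)
        algo, _, digest = fields["oid"].partition(":")
        assert algo == "sha256" and len(digest) == 64, name  # sanity-check the oid format
        sizes[name] = int(fields["size"])
    total = sum(sizes.values())
    return sizes, total, total - TOTAL_TENSOR_BYTES


if __name__ == "__main__":
    sizes, total, overhead = shard_summary(POINTERS)
    print(f"{total:,} bytes on disk, {overhead:,} bytes beyond tensor data")
```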
406
model.safetensors.index.json
Normal file
@@ -0,0 +1,406 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_size": 16381470720
|
||||
},
|
||||
"weight_map": {
|
||||
"lm_head.weight": "model-00004-of-00004.safetensors",
|
||||
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.norm.weight": "model-00004-of-00004.safetensors"
|
||||
}
|
||||
}
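The `weight_map` above assigns each tensor to one of four shard files. A single decoder layer can straddle a shard boundary. In this map, layer 35's `k_proj`, `q_proj`, and `v_proj` are in shard 3 and its other tensors are in shard 4. Layer 9 is similarly split between shards 1 and 2. The sketch below groups one layer's tensors by shard. `shards_for_layer` is a hypothetical helper, and the dictionary is a hand-copied excerpt of the map, not the full file.

```python
from collections import defaultdict

# Hand-copied excerpt of the weight_map above: layer 35 plus the final norm.
WEIGHT_MAP_EXCERPT = {
    "model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
}


def shards_for_layer(weight_map, layer):
    """Group the tensors of one decoder layer by the shard file that stores them."""
    # The trailing dot keeps layer 3 from also matching "model.layers.35.".
    prefix = f"model.layers.{layer}."
    grouped = defaultdict(list)
    for name, shard in weight_map.items():
        if name.startswith(prefix):
            grouped[shard].append(name[len(prefix):])
    return {shard: sorted(names) for shard, names in grouped.items()}


if __name__ == "__main__":
    for shard, names in sorted(shards_for_layer(WEIGHT_MAP_EXCERPT, 35).items()):
        print(shard, names)
```

With the full index loaded via `json.load`, the same lookup tells a loader such as `safetensors.safe_open` which files it must open to materialize a given layer.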
66
result_model/README.md
Normal file
@@ -0,0 +1,66 @@
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: Qwen3-8B-SFT-envbench_qwen-all
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Qwen3-8B-SFT-envbench_qwen-all

This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1477
- Accuracy: 0.9511
- Num Input Tokens Seen: 35549872

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: AdamW (`adamw_torch`) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0

### Training results



### Framework versions

- Transformers 4.52.4
- Pytorch 2.6.0a0+df5bbc09d1.nv24.12
- Datasets 3.6.0
- Tokenizers 0.21.1
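The model card's `total_train_batch_size` of 16 is the per-device batch size (1) times the number of devices (4) times the gradient-accumulation steps (4). The sketch below reproduces that arithmetic and evaluates a linear-warmup cosine schedule for `learning_rate=5e-05` and `lr_scheduler_warmup_ratio=0.1`.

It makes three assumptions:
- The warmup length is the ratio times the total step count, rounded up.
- The rate then decays along a half cosine to zero, which mirrors the usual behavior of `transformers`' `get_cosine_schedule_with_warmup`.
- `TOTAL_STEPS = 333` is roughly what the throughput figures in `all_results.json` imply. It is not a value stated in the card.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum):
    # Samples that contribute to one optimizer step across all devices.
    return per_device * num_devices * grad_accum


def cosine_lr(step, base_lr, total_steps, warmup_ratio):
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch_size(1, 4, 4))  # matches total_train_batch_size
    TOTAL_STEPS = 333  # assumed, not reported in the model card
    for step in (0, 17, 34, 183, 333):
        print(step, cosine_lr(step, 5e-05, TOTAL_STEPS, 0.1))
```

Evaluation does not accumulate gradients, so the card's `total_eval_batch_size` of 4 is simply `eval_batch_size` times `num_devices`.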
28
result_model/added_tokens.json
Normal file
@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}
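The 26 added tokens occupy the contiguous ID range 151643 to 151668. The `bos_token_id` (151643) and `eos_token_id` (151645) in `config.json` correspond to `<|endoftext|>` and `<|im_end|>` in this table. The config's `vocab_size` is 151936, so the embedding and `lm_head` matrices have 151936 - 151669 = 267 rows beyond the highest ID listed here. The sketch below checks these facts from a copy of the table. `summarize_added_tokens` is a hypothetical helper.

```python
# Copied from added_tokens.json above.
ADDED_TOKENS = {
    "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
    "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651, "<|quad_start|>": 151650,
    "<|repo_name|>": 151663, "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}
VOCAB_SIZE = 151936  # "vocab_size" in config.json


def summarize_added_tokens(tokens, vocab_size):
    """Return (lowest id, highest id, ids contiguous?, embedding rows past the highest id)."""
    ids = sorted(tokens.values())
    contiguous = ids == list(range(ids[0], ids[-1] + 1))
    return ids[0], ids[-1], contiguous, vocab_size - (ids[-1] + 1)


if __name__ == "__main__":
    print(summarize_added_tokens(ADDED_TOKENS, VOCAB_SIZE))
    id_to_token = {v: k for k, v in ADDED_TOKENS.items()}
    print(id_to_token[151643], id_to_token[151645])  # bos and eos from config.json
```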
9
result_model/all_results.json
Normal file
@@ -0,0 +1,9 @@
{
"epoch": 5.0,
"num_input_tokens_seen": 36600520,
"total_flos": 1.6620454705385964e+18,
"train_loss": 0.11312562568641421,
"train_runtime": 4017.5693,
"train_samples_per_second": 1.325,
"train_steps_per_second": 0.083
}
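The throughput figures in `all_results.json` are rounded, but they agree with the model card's hyperparameters:
- Runtime times samples per second approximates the number of training samples processed over all five epochs.
- Dividing that by the effective batch size of 16 should roughly match runtime times steps per second.

The arithmetic below is a back-of-the-envelope consistency check, not a figure reported by the trainer.

```python
RESULTS = {
    "epoch": 5.0,
    "train_runtime": 4017.5693,
    "train_samples_per_second": 1.325,
    "train_steps_per_second": 0.083,
}
TOTAL_TRAIN_BATCH_SIZE = 16  # total_train_batch_size from the model card


def derived_counts(results, batch_size):
    """Back out approximate totals from the (rounded) throughput figures."""
    runtime = results["train_runtime"]
    samples = runtime * results["train_samples_per_second"]
    return {
        "samples": samples,
        "samples_per_epoch": samples / results["epoch"],
        "steps_from_samples": samples / batch_size,
        "steps_from_rate": runtime * results["train_steps_per_second"],
    }


if __name__ == "__main__":
    for key, value in derived_counts(RESULTS, TOTAL_TRAIN_BATCH_SIZE).items():
        print(f"{key}: {value:.1f}")
```

Both routes give about 333 optimizer steps. That corresponds to roughly 5323 samples in total, or about 1065 per epoch.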
89
result_model/chat_template.jinja
Normal file
@@ -0,0 +1,89 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
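The template above wraps every turn as `<|im_start|>role\n...<|im_end|>\n`. It strips any `<think>...</think>` prefix from assistant turns that come before the last user query. When `enable_thinking` is false, it appends an empty think block after the generation prompt.

The sketch below re-implements only that no-tools path, to make the output format concrete. It is a simplified illustration, not a replacement for the template. `render_chat` is a hypothetical name, and tool calls, tool responses, `reasoning_content`, and assistant turns after the last user message are deliberately rejected.

```python
def render_chat(messages, add_generation_prompt=True, enable_thinking=True):
    """Render plain system/user/assistant messages like the template's no-tools path."""
    user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
    last_user = user_positions[-1] if user_positions else -1
    parts = []
    for i, msg in enumerate(messages):
        role, content = msg["role"], msg["content"]
        if role in ("system", "user"):
            parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
        elif role == "assistant" and i < last_user:
            # Earlier assistant turns keep only the text after </think>.
            if "</think>" in content:
                content = content.split("</think>")[-1].lstrip("\n")
            parts.append(f"<|im_start|>assistant\n{content}<|im_end|>\n")
        else:
            # Tool turns and assistant turns after the last user message need
            # the reasoning and tool-call branches, which this sketch omits.
            raise NotImplementedError(f"message {i} ({role}) is outside this sketch")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # Mirrors the template's empty think block when thinking is disabled.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chat([{"role": "user", "content": "Hi"}], enable_thinking=False))
```

With the real tokenizer, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)` should render the full template. Extra keyword arguments such as `enable_thinking` are passed through to the template as variables.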
30
result_model/config.json
Normal file
@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 12288,
"max_position_embeddings": 40960,
"max_window_layers": 36,
"model_type": "qwen3",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.52.4",
"use_cache": false,
"use_sliding_window": false,
"vocab_size": 151936
}
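The shapes in `config.json` determine the parameter count exactly. The sketch below counts parameters under four assumptions, all consistent with the tensor names in the weight map and `"attention_bias": false`:
- The q, k, v, and o projections have no bias.
- The q and k norms are per-head RMSNorm weights of length `head_dim`.
- Each layer has a gated MLP with gate, up, and down projections.
- `lm_head` is untied (`"tie_word_embeddings": false`).

The result, 8,190,735,360 parameters, times 2 bytes per `bfloat16` value equals the `total_size` of 16381470720 recorded in `result_model/model.safetensors.index.json`. The second function estimates the KV-cache footprint per token implied by grouped-query attention with 8 KV heads. `count_parameters` and `kv_cache_bytes_per_token` are hypothetical helpers.

```python
CONFIG = {  # subset of config.json
    "hidden_size": 4096,
    "intermediate_size": 12288,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "num_hidden_layers": 36,
    "vocab_size": 151936,
    "tie_word_embeddings": False,
}


def count_parameters(cfg):
    h, inter, d = cfg["hidden_size"], cfg["intermediate_size"], cfg["head_dim"]
    q_dim = cfg["num_attention_heads"] * d
    kv_dim = cfg["num_key_value_heads"] * d
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections, no bias
    qk_norm = 2 * d                                # q_norm and k_norm weights
    mlp = 3 * h * inter                            # gate, up, down projections
    layer_norms = 2 * h                            # input and post-attention RMSNorm
    per_layer = attn + qk_norm + mlp + layer_norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed
    final_norm = h                                 # model.norm.weight
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


def kv_cache_bytes_per_token(cfg, bytes_per_value=2):
    # One key and one value vector per KV head, per layer, in bfloat16.
    return 2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"] * cfg["head_dim"] * bytes_per_value


if __name__ == "__main__":
    n = count_parameters(CONFIG)
    print(n, n * 2)  # parameters, bytes in bfloat16
    print(kv_cache_bytes_per_token(CONFIG))
```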
13
result_model/generation_config.json
Normal file
@@ -0,0 +1,13 @@
{
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.6,
"top_k": 20,
"top_p": 0.95,
"transformers_version": "4.52.4"
}
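`generation_config.json` enables sampling with `temperature` 0.6, `top_k` 20, and `top_p` 0.95. The pure-Python sketch below shows one way these settings combine on a single next-token distribution:
1. Divide the logits by the temperature.
2. Keep the 20 highest-scoring tokens and take a softmax over them.
3. Keep the shortest prefix of those tokens, in descending probability, whose cumulative probability reaches 0.95.
4. Renormalize.

The temperature, top-k, top-p order is assumed to match how `transformers` chains its logits warpers. The sketch omits batching, `min_tokens_to_keep`, and tie-breaking details, so it is an illustration rather than the library's implementation. `filtered_distribution` is a hypothetical name.

```python
import math


def filtered_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Apply temperature, then top-k, then top-p to one logit vector.

    Returns {token_index: probability} over the surviving tokens.
    """
    scaled = [x / temperature for x in logits]
    # Top-k: indices of the k largest scaled logits, highest first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the top-k survivors (subtract the max for stability).
    m = scaled[order[0]]
    weights = {i: math.exp(scaled[i] - m) for i in order}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}
    # Top-p: shortest prefix of the sorted survivors whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    print(filtered_distribution([2.0, 1.0, 0.0, -10.0], temperature=1.0, top_k=3, top_p=0.9))
```

A temperature below 1 sharpens the distribution, so top-p often keeps far fewer than `top_k` tokens. With logits 0 to 29 and the defaults above, only the two highest-scoring tokens survive.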
151388
result_model/merges.txt
Normal file
File diff suppressed because it is too large
3
result_model/model-00001-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b8012220e3fb38fff67a4f8818c0a51131ec60470665522c8786d7a8eb1a010
size 4902257696
3
result_model/model-00002-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f74bc059400695f3917abe530211119bceb6de32e2c55a962a10655a798e0377
size 4915960368
3
result_model/model-00003-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a8e315ffca3383a933652fb67d8b280fa325d2ea9ef09d09540f7236934ea340
size 4983068496
3
result_model/model-00004-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:714a88991191919c9995840c3c1319c2ba567a921b186b9755fde4cc1bf019a0
size 1580230264
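Each shard is committed as a Git LFS pointer: a spec `version` line, a `sha256` object ID, and the byte `size` of the real file. The four sizes sum to 16381516824 bytes, which is 46104 bytes more than the `total_size` (16381470720) in `result_model/model.safetensors.index.json`.

A difference of this kind is expected if `total_size` counts tensor data only. Each `.safetensors` file also starts with an 8-byte header length followed by a JSON header describing its tensors. The sketch below parses the pointers and computes that overhead. `parse_lfs_pointer` and `header_overhead` are hypothetical helpers.

```python
# LFS pointer files copied from the diff above.
POINTERS = {
    "model-00001-of-00004.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:6b8012220e3fb38fff67a4f8818c0a51131ec60470665522c8786d7a8eb1a010
size 4902257696
""",
    "model-00002-of-00004.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:f74bc059400695f3917abe530211119bceb6de32e2c55a962a10655a798e0377
size 4915960368
""",
    "model-00003-of-00004.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:a8e315ffca3383a933652fb67d8b280fa325d2ea9ef09d09540f7236934ea340
size 4983068496
""",
    "model-00004-of-00004.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:714a88991191919c9995840c3c1319c2ba567a921b186b9755fde4cc1bf019a0
size 1580230264
""",
}
TOTAL_SIZE = 16381470720  # "total_size" from model.safetensors.index.json


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def header_overhead(pointers, total_size):
    """Return (sum of shard file sizes, bytes beyond the index's total_size)."""
    shard_bytes = sum(int(parse_lfs_pointer(t)["size"]) for t in pointers.values())
    return shard_bytes, shard_bytes - total_size


if __name__ == "__main__":
    print(header_overhead(POINTERS, TOTAL_SIZE))
```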
406
result_model/model.safetensors.index.json
Normal file
406
result_model/model.safetensors.index.json
Normal file
@@ -0,0 +1,406 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_size": 16381470720
|
||||
},
|
||||
"weight_map": {
|
||||
"lm_head.weight": "model-00004-of-00004.safetensors",
|
||||
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
31
result_model/special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
BIN
result_model/tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
240
result_model/tokenizer_config.json
Normal file
@@ -0,0 +1,240 @@
|
||||
{
|
||||
"add_bos_token": false,
|
||||
"add_prefix_space": false,
|
||||
"added_tokens_decoder": {
|
||||
"151643": {
|
||||
"content": "<|endoftext|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151644": {
|
||||
"content": "<|im_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151645": {
|
||||
"content": "<|im_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151646": {
|
||||
"content": "<|object_ref_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151647": {
|
||||
"content": "<|object_ref_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151648": {
|
||||
"content": "<|box_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151649": {
|
||||
"content": "<|box_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
9
result_model/train_results.json
Normal file
@@ -0,0 +1,9 @@
{
"epoch": 5.0,
"num_input_tokens_seen": 36600520,
"total_flos": 1.6620454705385964e+18,
"train_loss": 0.11312562568641421,
"train_runtime": 4017.5693,
"train_samples_per_second": 1.325,
"train_steps_per_second": 0.083
}
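The `lr` field in `result_model/trainer_log.jsonl` climbs in equal increments to 5e-05 at step 35 and then decays smoothly. The short Python sketch below checks that pattern against a handful of logged values. It assumes, based only on those values and not on any training config in this commit, linear warmup over 34 optimizer steps followed by half-cosine decay to zero across the remaining 301 of the 335 steps, and that each logged `lr` is the schedule evaluated at `current_steps - 1`. `lr_at`, `LOGGED`, and the constant names are hypothetical helpers for illustration.

```python
import math

# Values read from this commit's logs; the schedule shape itself is an
# assumption inferred from the logged "lr" values, not from a training config.
PEAK_LR = 5e-5       # largest logged lr (reached at current_steps = 35)
TOTAL_STEPS = 335    # "total_steps" in trainer_log.jsonl
WARMUP_STEPS = 34    # assumed: inferred from the ramp of 5e-5 / 34 per step


def lr_at(logged_step: int) -> float:
    """Learning rate the log reports at `logged_step` (1-based).

    Assumes the logged value is the schedule evaluated at logged_step - 1:
    linear warmup, then half-cosine decay from PEAK_LR down to zero.
    """
    s = logged_step - 1
    if s < WARMUP_STEPS:
        return PEAK_LR * s / WARMUP_STEPS
    progress = (s - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# (current_steps, lr) pairs copied verbatim from trainer_log.jsonl.
LOGGED = {
    1: 0.0,
    2: 1.4705882352941177e-06,
    35: 5e-05,
    36: 4.999863832700438e-05,
    50: 4.9694246071778604e-05,
    67: 4.861854780153004e-05,
    100: 4.446417780011618e-05,
    122: 4.0382396767566536e-05,
}

for step, logged in LOGGED.items():
    print(f"step {step:3d}: logged {logged:.9e}  predicted {lr_at(step):.9e}")
```

If the assumptions hold, the two columns agree to many significant digits; a visible mismatch would point to a different warmup length or decay shape. A 34-step warmup is also what a warmup ratio of 0.1 rounded up would give (ceil(0.1 × 335) = 34), but the ratio itself is not recorded in these files.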
349
result_model/trainer_log.jsonl
Normal file
@@ -0,0 +1,349 @@
{"current_steps": 1, "total_steps": 335, "loss": 0.287, "lr": 0.0, "epoch": 0.0149812734082397, "percentage": 0.3, "elapsed_time": "0:00:23", "remaining_time": "2:08:45", "throughput": 4592.7, "total_tokens": 106224}
{"current_steps": 2, "total_steps": 335, "loss": 0.1593, "lr": 1.4705882352941177e-06, "epoch": 0.0299625468164794, "percentage": 0.6, "elapsed_time": "0:00:33", "remaining_time": "1:32:45", "throughput": 6653.32, "total_tokens": 222384}
{"current_steps": 3, "total_steps": 335, "loss": 0.1572, "lr": 2.9411764705882355e-06, "epoch": 0.0449438202247191, "percentage": 0.9, "elapsed_time": "0:00:43", "remaining_time": "1:20:39", "throughput": 7608.07, "total_tokens": 332728}
{"current_steps": 4, "total_steps": 335, "loss": 0.2334, "lr": 4.411764705882353e-06, "epoch": 0.0599250936329588, "percentage": 1.19, "elapsed_time": "0:00:54", "remaining_time": "1:14:38", "throughput": 8126.02, "total_tokens": 439824}
{"current_steps": 5, "total_steps": 335, "loss": 0.0885, "lr": 5.882352941176471e-06, "epoch": 0.0749063670411985, "percentage": 1.49, "elapsed_time": "0:01:04", "remaining_time": "1:11:09", "throughput": 8546.79, "total_tokens": 552832}
{"current_steps": 6, "total_steps": 335, "loss": 0.1054, "lr": 7.3529411764705884e-06, "epoch": 0.0898876404494382, "percentage": 1.79, "elapsed_time": "0:01:16", "remaining_time": "1:09:59", "throughput": 8709.68, "total_tokens": 667008}
{"current_steps": 7, "total_steps": 335, "loss": 0.2192, "lr": 8.823529411764707e-06, "epoch": 0.10486891385767791, "percentage": 2.09, "elapsed_time": "0:01:28", "remaining_time": "1:08:46", "throughput": 8854.25, "total_tokens": 779728}
{"current_steps": 8, "total_steps": 335, "loss": 0.1713, "lr": 1.0294117647058824e-05, "epoch": 0.1198501872659176, "percentage": 2.39, "elapsed_time": "0:01:39", "remaining_time": "1:07:57", "throughput": 8955.12, "total_tokens": 893296}
{"current_steps": 9, "total_steps": 335, "loss": 0.1316, "lr": 1.1764705882352942e-05, "epoch": 0.1348314606741573, "percentage": 2.69, "elapsed_time": "0:01:51", "remaining_time": "1:07:23", "throughput": 9051.15, "total_tokens": 1010424}
{"current_steps": 10, "total_steps": 335, "loss": 0.0925, "lr": 1.323529411764706e-05, "epoch": 0.149812734082397, "percentage": 2.99, "elapsed_time": "0:02:03", "remaining_time": "1:06:56", "throughput": 9082.97, "total_tokens": 1122608}
{"current_steps": 11, "total_steps": 335, "loss": 0.1682, "lr": 1.4705882352941177e-05, "epoch": 0.1647940074906367, "percentage": 3.28, "elapsed_time": "0:02:14", "remaining_time": "1:06:15", "throughput": 9131.41, "total_tokens": 1232432}
{"current_steps": 12, "total_steps": 335, "loss": 0.1501, "lr": 1.6176470588235296e-05, "epoch": 0.1797752808988764, "percentage": 3.58, "elapsed_time": "0:02:26", "remaining_time": "1:05:40", "throughput": 9163.0, "total_tokens": 1341448}
{"current_steps": 13, "total_steps": 335, "loss": 0.1541, "lr": 1.7647058823529414e-05, "epoch": 0.1947565543071161, "percentage": 3.88, "elapsed_time": "0:02:37", "remaining_time": "1:05:00", "throughput": 9135.17, "total_tokens": 1438440}
{"current_steps": 14, "total_steps": 335, "loss": 0.1972, "lr": 1.9117647058823528e-05, "epoch": 0.20973782771535582, "percentage": 4.18, "elapsed_time": "0:02:49", "remaining_time": "1:04:35", "throughput": 9154.92, "total_tokens": 1547408}
{"current_steps": 15, "total_steps": 335, "loss": 0.1355, "lr": 2.058823529411765e-05, "epoch": 0.2247191011235955, "percentage": 4.48, "elapsed_time": "0:03:00", "remaining_time": "1:04:09", "throughput": 9200.34, "total_tokens": 1660336}
{"current_steps": 16, "total_steps": 335, "loss": 0.1175, "lr": 2.2058823529411766e-05, "epoch": 0.2397003745318352, "percentage": 4.78, "elapsed_time": "0:03:11", "remaining_time": "1:03:47", "throughput": 9245.29, "total_tokens": 1774704}
{"current_steps": 17, "total_steps": 335, "loss": 0.2153, "lr": 2.3529411764705884e-05, "epoch": 0.2546816479400749, "percentage": 5.07, "elapsed_time": "0:03:23", "remaining_time": "1:03:24", "throughput": 9277.16, "total_tokens": 1887056}
{"current_steps": 18, "total_steps": 335, "loss": 0.1604, "lr": 2.5e-05, "epoch": 0.2696629213483146, "percentage": 5.37, "elapsed_time": "0:03:34", "remaining_time": "1:03:02", "throughput": 9290.02, "total_tokens": 1995456}
{"current_steps": 19, "total_steps": 335, "loss": 0.1922, "lr": 2.647058823529412e-05, "epoch": 0.2846441947565543, "percentage": 5.67, "elapsed_time": "0:03:46", "remaining_time": "1:02:43", "throughput": 9333.75, "total_tokens": 2111824}
{"current_steps": 20, "total_steps": 335, "loss": 0.2839, "lr": 2.7941176470588236e-05, "epoch": 0.299625468164794, "percentage": 5.97, "elapsed_time": "0:03:57", "remaining_time": "1:02:22", "throughput": 9333.97, "total_tokens": 2217760}
{"current_steps": 21, "total_steps": 335, "loss": 0.1694, "lr": 2.9411764705882354e-05, "epoch": 0.3146067415730337, "percentage": 6.27, "elapsed_time": "0:04:08", "remaining_time": "1:02:02", "throughput": 9365.2, "total_tokens": 2331736}
{"current_steps": 22, "total_steps": 335, "loss": 0.0844, "lr": 3.0882352941176475e-05, "epoch": 0.3295880149812734, "percentage": 6.57, "elapsed_time": "0:04:20", "remaining_time": "1:01:42", "throughput": 9362.17, "total_tokens": 2436696}
{"current_steps": 23, "total_steps": 335, "loss": 0.1601, "lr": 3.235294117647059e-05, "epoch": 0.3445692883895131, "percentage": 6.87, "elapsed_time": "0:04:31", "remaining_time": "1:01:23", "throughput": 9390.54, "total_tokens": 2549584}
{"current_steps": 24, "total_steps": 335, "loss": 0.1835, "lr": 3.382352941176471e-05, "epoch": 0.3595505617977528, "percentage": 7.16, "elapsed_time": "0:04:42", "remaining_time": "1:01:05", "throughput": 9387.07, "total_tokens": 2655016}
{"current_steps": 25, "total_steps": 335, "loss": 0.2083, "lr": 3.529411764705883e-05, "epoch": 0.37453183520599254, "percentage": 7.46, "elapsed_time": "0:04:54", "remaining_time": "1:00:48", "throughput": 9385.24, "total_tokens": 2761784}
{"current_steps": 25, "total_steps": 335, "eval_loss": 0.22450505197048187, "epoch": 0.37453183520599254, "percentage": 7.46, "elapsed_time": "0:04:59", "remaining_time": "1:01:50", "throughput": 9230.07, "total_tokens": 2761784}
{"current_steps": 26, "total_steps": 335, "loss": 0.1667, "lr": 3.6764705882352945e-05, "epoch": 0.3895131086142322, "percentage": 7.76, "elapsed_time": "0:05:10", "remaining_time": "1:01:32", "throughput": 9257.73, "total_tokens": 2876408}
{"current_steps": 27, "total_steps": 335, "loss": 0.0896, "lr": 3.8235294117647055e-05, "epoch": 0.4044943820224719, "percentage": 8.06, "elapsed_time": "0:05:21", "remaining_time": "1:01:10", "throughput": 9285.19, "total_tokens": 2987992}
{"current_steps": 28, "total_steps": 335, "loss": 0.2299, "lr": 3.970588235294117e-05, "epoch": 0.41947565543071164, "percentage": 8.36, "elapsed_time": "0:05:33", "remaining_time": "1:00:52", "throughput": 9285.3, "total_tokens": 3093200}
{"current_steps": 29, "total_steps": 335, "loss": 0.269, "lr": 4.11764705882353e-05, "epoch": 0.4344569288389513, "percentage": 8.66, "elapsed_time": "0:05:44", "remaining_time": "1:00:31", "throughput": 9277.28, "total_tokens": 3192624}
{"current_steps": 30, "total_steps": 335, "loss": 0.159, "lr": 4.2647058823529415e-05, "epoch": 0.449438202247191, "percentage": 8.96, "elapsed_time": "0:05:55", "remaining_time": "1:00:15", "throughput": 9297.36, "total_tokens": 3306120}
{"current_steps": 31, "total_steps": 335, "loss": 0.229, "lr": 4.411764705882353e-05, "epoch": 0.46441947565543074, "percentage": 9.25, "elapsed_time": "0:06:07", "remaining_time": "0:59:59", "throughput": 9316.47, "total_tokens": 3419688}
{"current_steps": 32, "total_steps": 335, "loss": 0.1994, "lr": 4.558823529411765e-05, "epoch": 0.4794007490636704, "percentage": 9.55, "elapsed_time": "0:06:18", "remaining_time": "0:59:43", "throughput": 9307.12, "total_tokens": 3521896}
{"current_steps": 33, "total_steps": 335, "loss": 0.109, "lr": 4.705882352941177e-05, "epoch": 0.4943820224719101, "percentage": 9.85, "elapsed_time": "0:06:29", "remaining_time": "0:59:28", "throughput": 9348.15, "total_tokens": 3644912}
{"current_steps": 34, "total_steps": 335, "loss": 0.1888, "lr": 4.8529411764705885e-05, "epoch": 0.5093632958801498, "percentage": 10.15, "elapsed_time": "0:06:41", "remaining_time": "0:59:12", "throughput": 9348.9, "total_tokens": 3751480}
{"current_steps": 35, "total_steps": 335, "loss": 0.1841, "lr": 5e-05, "epoch": 0.5243445692883895, "percentage": 10.45, "elapsed_time": "0:06:52", "remaining_time": "0:58:56", "throughput": 9337.77, "total_tokens": 3853072}
{"current_steps": 36, "total_steps": 335, "loss": 0.2391, "lr": 4.999863832700438e-05, "epoch": 0.5393258426966292, "percentage": 10.75, "elapsed_time": "0:07:03", "remaining_time": "0:58:38", "throughput": 9336.64, "total_tokens": 3954992}
{"current_steps": 37, "total_steps": 335, "loss": 0.2589, "lr": 4.999455345634978e-05, "epoch": 0.5543071161048689, "percentage": 11.04, "elapsed_time": "0:07:14", "remaining_time": "0:58:23", "throughput": 9335.05, "total_tokens": 4060312}
{"current_steps": 38, "total_steps": 335, "loss": 0.1603, "lr": 4.9987745833016855e-05, "epoch": 0.5692883895131086, "percentage": 11.34, "elapsed_time": "0:07:26", "remaining_time": "0:58:09", "throughput": 9317.09, "total_tokens": 4159664}
{"current_steps": 39, "total_steps": 335, "loss": 0.1837, "lr": 4.9978216198586135e-05, "epoch": 0.5842696629213483, "percentage": 11.64, "elapsed_time": "0:07:37", "remaining_time": "0:57:55", "throughput": 9333.8, "total_tokens": 4273696}
{"current_steps": 40, "total_steps": 335, "loss": 0.2044, "lr": 4.996596559115731e-05, "epoch": 0.599250936329588, "percentage": 11.94, "elapsed_time": "0:07:49", "remaining_time": "0:57:39", "throughput": 9340.79, "total_tokens": 4381080}
{"current_steps": 41, "total_steps": 335, "loss": 0.1326, "lr": 4.995099534523607e-05, "epoch": 0.6142322097378277, "percentage": 12.24, "elapsed_time": "0:08:00", "remaining_time": "0:57:24", "throughput": 9366.78, "total_tokens": 4499912}
{"current_steps": 42, "total_steps": 335, "loss": 0.1795, "lr": 4.9933307091588796e-05, "epoch": 0.6292134831460674, "percentage": 12.54, "elapsed_time": "0:08:11", "remaining_time": "0:57:10", "throughput": 9367.17, "total_tokens": 4606816}
{"current_steps": 43, "total_steps": 335, "loss": 0.188, "lr": 4.991290275706486e-05, "epoch": 0.6441947565543071, "percentage": 12.84, "elapsed_time": "0:08:23", "remaining_time": "0:56:57", "throughput": 9379.7, "total_tokens": 4720528}
{"current_steps": 44, "total_steps": 335, "loss": 0.1692, "lr": 4.988978456438678e-05, "epoch": 0.6591760299625468, "percentage": 13.13, "elapsed_time": "0:08:34", "remaining_time": "0:56:44", "throughput": 9392.58, "total_tokens": 4834552}
{"current_steps": 45, "total_steps": 335, "loss": 0.1526, "lr": 4.986395503190805e-05, "epoch": 0.6741573033707865, "percentage": 13.43, "elapsed_time": "0:08:45", "remaining_time": "0:56:28", "throughput": 9395.72, "total_tokens": 4940840}
{"current_steps": 46, "total_steps": 335, "loss": 0.2274, "lr": 4.983541697333881e-05, "epoch": 0.6891385767790262, "percentage": 13.73, "elapsed_time": "0:08:57", "remaining_time": "0:56:14", "throughput": 9393.02, "total_tokens": 5044880}
{"current_steps": 47, "total_steps": 335, "loss": 0.1199, "lr": 4.980417349743936e-05, "epoch": 0.704119850187266, "percentage": 14.03, "elapsed_time": "0:09:07", "remaining_time": "0:55:57", "throughput": 9424.16, "total_tokens": 5164256}
{"current_steps": 48, "total_steps": 335, "loss": 0.2262, "lr": 4.9770228007681494e-05, "epoch": 0.7191011235955056, "percentage": 14.33, "elapsed_time": "0:09:17", "remaining_time": "0:55:35", "throughput": 9432.85, "total_tokens": 5262840}
{"current_steps": 49, "total_steps": 335, "loss": 0.1684, "lr": 4.973358420187776e-05, "epoch": 0.7340823970037453, "percentage": 14.63, "elapsed_time": "0:09:28", "remaining_time": "0:55:16", "throughput": 9458.31, "total_tokens": 5374992}
{"current_steps": 50, "total_steps": 335, "loss": 0.1599, "lr": 4.9694246071778604e-05, "epoch": 0.7490636704119851, "percentage": 14.93, "elapsed_time": "0:09:38", "remaining_time": "0:54:58", "throughput": 9479.7, "total_tokens": 5486368}
{"current_steps": 50, "total_steps": 335, "eval_loss": 0.22489887475967407, "epoch": 0.7490636704119851, "percentage": 14.93, "elapsed_time": "0:09:43", "remaining_time": "0:55:26", "throughput": 9399.66, "total_tokens": 5486368}
{"current_steps": 51, "total_steps": 335, "loss": 0.2025, "lr": 4.9652217902637596e-05, "epoch": 0.7640449438202247, "percentage": 15.22, "elapsed_time": "0:09:53", "remaining_time": "0:55:04", "throughput": 9407.91, "total_tokens": 5582648}
{"current_steps": 52, "total_steps": 335, "loss": 0.1592, "lr": 4.9607504272744575e-05, "epoch": 0.7790262172284644, "percentage": 15.52, "elapsed_time": "0:10:03", "remaining_time": "0:54:45", "throughput": 9431.45, "total_tokens": 5692920}
{"current_steps": 53, "total_steps": 335, "loss": 0.2657, "lr": 4.956011005292692e-05, "epoch": 0.7940074906367042, "percentage": 15.82, "elapsed_time": "0:10:13", "remaining_time": "0:54:26", "throughput": 9441.15, "total_tokens": 5795728}
{"current_steps": 54, "total_steps": 335, "loss": 0.1878, "lr": 4.951004040601898e-05, "epoch": 0.8089887640449438, "percentage": 16.12, "elapsed_time": "0:10:24", "remaining_time": "0:54:08", "throughput": 9471.15, "total_tokens": 5911816}
{"current_steps": 55, "total_steps": 335, "loss": 0.2157, "lr": 4.945730078629964e-05, "epoch": 0.8239700374531835, "percentage": 16.42, "elapsed_time": "0:10:34", "remaining_time": "0:53:49", "throughput": 9482.67, "total_tokens": 6015648}
{"current_steps": 56, "total_steps": 335, "loss": 0.1789, "lr": 4.9401896938898185e-05, "epoch": 0.8389513108614233, "percentage": 16.72, "elapsed_time": "0:10:44", "remaining_time": "0:53:32", "throughput": 9510.27, "total_tokens": 6132248}
{"current_steps": 57, "total_steps": 335, "loss": 0.2019, "lr": 4.934383489916843e-05, "epoch": 0.8539325842696629, "percentage": 17.01, "elapsed_time": "0:10:55", "remaining_time": "0:53:15", "throughput": 9537.52, "total_tokens": 6249344}
{"current_steps": 58, "total_steps": 335, "loss": 0.132, "lr": 4.928312099203131e-05, "epoch": 0.8689138576779026, "percentage": 17.31, "elapsed_time": "0:11:05", "remaining_time": "0:52:59", "throughput": 9564.24, "total_tokens": 6366872}
{"current_steps": 59, "total_steps": 335, "loss": 0.2022, "lr": 4.921976183128585e-05, "epoch": 0.8838951310861424, "percentage": 17.61, "elapsed_time": "0:11:16", "remaining_time": "0:52:42", "throughput": 9578.22, "total_tokens": 6475464}
{"current_steps": 60, "total_steps": 335, "loss": 0.1605, "lr": 4.9153764318888706e-05, "epoch": 0.898876404494382, "percentage": 17.91, "elapsed_time": "0:11:27", "remaining_time": "0:52:31", "throughput": 9578.72, "total_tokens": 6587040}
{"current_steps": 61, "total_steps": 335, "loss": 0.2062, "lr": 4.908513564420231e-05, "epoch": 0.9138576779026217, "percentage": 18.21, "elapsed_time": "0:11:39", "remaining_time": "0:52:20", "throughput": 9586.02, "total_tokens": 6702552}
{"current_steps": 62, "total_steps": 335, "loss": 0.1485, "lr": 4.90138832832117e-05, "epoch": 0.9288389513108615, "percentage": 18.51, "elapsed_time": "0:11:50", "remaining_time": "0:52:09", "throughput": 9580.3, "total_tokens": 6809352}
{"current_steps": 63, "total_steps": 335, "loss": 0.1896, "lr": 4.894001499771015e-05, "epoch": 0.9438202247191011, "percentage": 18.81, "elapsed_time": "0:12:02", "remaining_time": "0:51:58", "throughput": 9566.27, "total_tokens": 6909928}
{"current_steps": 64, "total_steps": 335, "loss": 0.1141, "lr": 4.886353883445363e-05, "epoch": 0.9588014981273408, "percentage": 19.1, "elapsed_time": "0:12:13", "remaining_time": "0:51:48", "throughput": 9576.69, "total_tokens": 7029288}
{"current_steps": 65, "total_steps": 335, "loss": 0.2227, "lr": 4.878446312428424e-05, "epoch": 0.9737827715355806, "percentage": 19.4, "elapsed_time": "0:12:25", "remaining_time": "0:51:36", "throughput": 9572.45, "total_tokens": 7136544}
{"current_steps": 66, "total_steps": 335, "loss": 0.1648, "lr": 4.8702796481222714e-05, "epoch": 0.9887640449438202, "percentage": 19.7, "elapsed_time": "0:12:37", "remaining_time": "0:51:25", "throughput": 9569.3, "total_tokens": 7244184}
{"current_steps": 67, "total_steps": 335, "loss": 0.2552, "lr": 4.861854780153004e-05, "epoch": 1.0, "percentage": 20.0, "elapsed_time": "0:12:41", "remaining_time": "0:50:47", "throughput": 9607.64, "total_tokens": 7319544}
{"current_steps": 68, "total_steps": 335, "loss": 0.1038, "lr": 4.853172626273841e-05, "epoch": 1.0149812734082397, "percentage": 20.3, "elapsed_time": "0:12:53", "remaining_time": "0:50:37", "throughput": 9615.77, "total_tokens": 7437632}
{"current_steps": 69, "total_steps": 335, "loss": 0.1202, "lr": 4.8442341322651385e-05, "epoch": 1.0299625468164795, "percentage": 20.6, "elapsed_time": "0:13:04", "remaining_time": "0:50:26", "throughput": 9614.67, "total_tokens": 7547280}
{"current_steps": 70, "total_steps": 335, "loss": 0.1851, "lr": 4.83504027183137e-05, "epoch": 1.0449438202247192, "percentage": 20.9, "elapsed_time": "0:13:16", "remaining_time": "0:50:16", "throughput": 9610.6, "total_tokens": 7658904}
{"current_steps": 71, "total_steps": 335, "loss": 0.1193, "lr": 4.825592046495054e-05, "epoch": 1.0599250936329587, "percentage": 21.19, "elapsed_time": "0:13:28", "remaining_time": "0:50:06", "throughput": 9601.61, "total_tokens": 7762712}
{"current_steps": 72, "total_steps": 335, "loss": 0.1442, "lr": 4.8158904854876555e-05, "epoch": 1.0749063670411985, "percentage": 21.49, "elapsed_time": "0:13:40", "remaining_time": "0:49:56", "throughput": 9598.73, "total_tokens": 7875080}
{"current_steps": 73, "total_steps": 335, "loss": 0.1783, "lr": 4.805936645637463e-05, "epoch": 1.0898876404494382, "percentage": 21.79, "elapsed_time": "0:13:52", "remaining_time": "0:49:47", "throughput": 9599.16, "total_tokens": 7989424}
{"current_steps": 74, "total_steps": 335, "loss": 0.096, "lr": 4.795731611254473e-05, "epoch": 1.104868913857678, "percentage": 22.09, "elapsed_time": "0:14:04", "remaining_time": "0:49:36", "throughput": 9601.65, "total_tokens": 8104200}
{"current_steps": 75, "total_steps": 335, "loss": 0.1223, "lr": 4.785276494012263e-05, "epoch": 1.1198501872659177, "percentage": 22.39, "elapsed_time": "0:14:15", "remaining_time": "0:49:27", "throughput": 9598.92, "total_tokens": 8216400}
{"current_steps": 75, "total_steps": 335, "eval_loss": 0.20777302980422974, "epoch": 1.1198501872659177, "percentage": 22.39, "elapsed_time": "0:14:20", "remaining_time": "0:49:44", "throughput": 9543.77, "total_tokens": 8216400}
{"current_steps": 76, "total_steps": 335, "loss": 0.1293, "lr": 4.7745724328269e-05, "epoch": 1.1348314606741572, "percentage": 22.69, "elapsed_time": "0:14:32", "remaining_time": "0:49:34", "throughput": 9543.26, "total_tokens": 8330424}
{"current_steps": 77, "total_steps": 335, "loss": 0.1562, "lr": 4.763620593732867e-05, "epoch": 1.149812734082397, "percentage": 22.99, "elapsed_time": "0:14:44", "remaining_time": "0:49:24", "throughput": 9537.93, "total_tokens": 8438312}
{"current_steps": 78, "total_steps": 335, "loss": 0.1081, "lr": 4.752422169756048e-05, "epoch": 1.1647940074906367, "percentage": 23.28, "elapsed_time": "0:14:56", "remaining_time": "0:49:14", "throughput": 9524.14, "total_tokens": 8538856}
{"current_steps": 79, "total_steps": 335, "loss": 0.0907, "lr": 4.740978380783765e-05, "epoch": 1.1797752808988764, "percentage": 23.58, "elapsed_time": "0:15:08", "remaining_time": "0:49:03", "throughput": 9520.37, "total_tokens": 8648688}
{"current_steps": 80, "total_steps": 335, "loss": 0.1497, "lr": 4.7292904734318924e-05, "epoch": 1.1947565543071161, "percentage": 23.88, "elapsed_time": "0:15:20", "remaining_time": "0:48:53", "throughput": 9515.44, "total_tokens": 8757528}
{"current_steps": 81, "total_steps": 335, "loss": 0.1343, "lr": 4.7173597209090534e-05, "epoch": 1.2097378277153559, "percentage": 24.18, "elapsed_time": "0:15:32", "remaining_time": "0:48:42", "throughput": 9517.49, "total_tokens": 8871600}
{"current_steps": 82, "total_steps": 335, "loss": 0.1842, "lr": 4.70518742287793e-05, "epoch": 1.2247191011235956, "percentage": 24.48, "elapsed_time": "0:15:43", "remaining_time": "0:48:31", "throughput": 9512.25, "total_tokens": 8975328}
{"current_steps": 83, "total_steps": 335, "loss": 0.1342, "lr": 4.6927749053136866e-05, "epoch": 1.2397003745318351, "percentage": 24.78, "elapsed_time": "0:15:55", "remaining_time": "0:48:19", "throughput": 9518.43, "total_tokens": 9090992}
{"current_steps": 84, "total_steps": 335, "loss": 0.1938, "lr": 4.6801235203595195e-05, "epoch": 1.2546816479400749, "percentage": 25.07, "elapsed_time": "0:16:06", "remaining_time": "0:48:07", "throughput": 9520.44, "total_tokens": 9201320}
{"current_steps": 85, "total_steps": 335, "loss": 0.1673, "lr": 4.667234646179368e-05, "epoch": 1.2696629213483146, "percentage": 25.37, "elapsed_time": "0:16:17", "remaining_time": "0:47:55", "throughput": 9517.24, "total_tokens": 9304160}
{"current_steps": 86, "total_steps": 335, "loss": 0.2025, "lr": 4.654109686807787e-05, "epoch": 1.2846441947565543, "percentage": 25.67, "elapsed_time": "0:16:29", "remaining_time": "0:47:44", "throughput": 9512.15, "total_tokens": 9409224}
{"current_steps": 87, "total_steps": 335, "loss": 0.1421, "lr": 4.640750071996995e-05, "epoch": 1.299625468164794, "percentage": 25.97, "elapsed_time": "0:16:40", "remaining_time": "0:47:32", "throughput": 9507.57, "total_tokens": 9514232}
{"current_steps": 88, "total_steps": 335, "loss": 0.1485, "lr": 4.6271572570611296e-05, "epoch": 1.3146067415730336, "percentage": 26.27, "elapsed_time": "0:16:52", "remaining_time": "0:47:21", "throughput": 9507.1, "total_tokens": 9623752}
{"current_steps": 89, "total_steps": 335, "loss": 0.1504, "lr": 4.613332722717714e-05, "epoch": 1.3295880149812733, "percentage": 26.57, "elapsed_time": "0:17:03", "remaining_time": "0:47:10", "throughput": 9507.85, "total_tokens": 9734808}
{"current_steps": 90, "total_steps": 335, "loss": 0.1232, "lr": 4.5992779749263546e-05, "epoch": 1.344569288389513, "percentage": 26.87, "elapsed_time": "0:17:15", "remaining_time": "0:46:58", "throughput": 9510.01, "total_tokens": 9847464}
{"current_steps": 91, "total_steps": 335, "loss": 0.1916, "lr": 4.584994544724695e-05, "epoch": 1.3595505617977528, "percentage": 27.16, "elapsed_time": "0:17:26", "remaining_time": "0:46:47", "throughput": 9494.85, "total_tokens": 9940464}
{"current_steps": 92, "total_steps": 335, "loss": 0.1665, "lr": 4.5704839880616296e-05, "epoch": 1.3745318352059925, "percentage": 27.46, "elapsed_time": "0:17:38", "remaining_time": "0:46:35", "throughput": 9498.56, "total_tokens": 10054728}
{"current_steps": 93, "total_steps": 335, "loss": 0.102, "lr": 4.5557478856278114e-05, "epoch": 1.3895131086142323, "percentage": 27.76, "elapsed_time": "0:17:50", "remaining_time": "0:46:24", "throughput": 9504.9, "total_tokens": 10172456}
{"current_steps": 94, "total_steps": 335, "loss": 0.1167, "lr": 4.5407878426834596e-05, "epoch": 1.404494382022472, "percentage": 28.06, "elapsed_time": "0:18:01", "remaining_time": "0:46:13", "throughput": 9501.49, "total_tokens": 10279024}
{"current_steps": 95, "total_steps": 335, "loss": 0.1945, "lr": 4.5256054888834934e-05, "epoch": 1.4194756554307117, "percentage": 28.36, "elapsed_time": "0:18:13", "remaining_time": "0:46:02", "throughput": 9505.4, "total_tokens": 10394120}
{"current_steps": 96, "total_steps": 335, "loss": 0.1576, "lr": 4.5102024781000077e-05, "epoch": 1.4344569288389513, "percentage": 28.66, "elapsed_time": "0:18:25", "remaining_time": "0:45:51", "throughput": 9505.4, "total_tokens": 10503768}
{"current_steps": 97, "total_steps": 335, "loss": 0.1266, "lr": 4.4945804882421086e-05, "epoch": 1.449438202247191, "percentage": 28.96, "elapsed_time": "0:18:36", "remaining_time": "0:45:39", "throughput": 9507.51, "total_tokens": 10616136}
{"current_steps": 98, "total_steps": 335, "loss": 0.0974, "lr": 4.478741221073136e-05, "epoch": 1.4644194756554307, "percentage": 29.25, "elapsed_time": "0:18:48", "remaining_time": "0:45:28", "throughput": 9507.33, "total_tokens": 10725704}
{"current_steps": 99, "total_steps": 335, "loss": 0.0942, "lr": 4.4626864020252774e-05, "epoch": 1.4794007490636705, "percentage": 29.55, "elapsed_time": "0:18:59", "remaining_time": "0:45:16", "throughput": 9510.15, "total_tokens": 10838848}
{"current_steps": 100, "total_steps": 335, "loss": 0.16, "lr": 4.446417780011618e-05, "epoch": 1.49438202247191, "percentage": 29.85, "elapsed_time": "0:19:11", "remaining_time": "0:45:05", "throughput": 9513.83, "total_tokens": 10953704}
{"current_steps": 100, "total_steps": 335, "eval_loss": 0.20240993797779083, "epoch": 1.49438202247191, "percentage": 29.85, "elapsed_time": "0:19:16", "remaining_time": "0:45:17", "throughput": 9473.13, "total_tokens": 10953704}
{"current_steps": 101, "total_steps": 335, "loss": 0.1192, "lr": 4.42993712723562e-05, "epoch": 1.5093632958801497, "percentage": 30.15, "elapsed_time": "0:19:27", "remaining_time": "0:45:05", "throughput": 9481.33, "total_tokens": 11073888}
{"current_steps": 102, "total_steps": 335, "loss": 0.1767, "lr": 4.413246238998069e-05, "epoch": 1.5243445692883895, "percentage": 30.45, "elapsed_time": "0:19:39", "remaining_time": "0:44:54", "throughput": 9476.87, "total_tokens": 11178896}
{"current_steps": 103, "total_steps": 335, "loss": 0.1383, "lr": 4.3963469335015085e-05, "epoch": 1.5393258426966292, "percentage": 30.75, "elapsed_time": "0:19:51", "remaining_time": "0:44:42", "throughput": 9477.94, "total_tokens": 11289112}
{"current_steps": 104, "total_steps": 335, "loss": 0.1421, "lr": 4.379241051652174e-05, "epoch": 1.554307116104869, "percentage": 31.04, "elapsed_time": "0:20:02", "remaining_time": "0:44:31", "throughput": 9481.53, "total_tokens": 11401952}
{"current_steps": 105, "total_steps": 335, "loss": 0.1201, "lr": 4.361930456859455e-05, "epoch": 1.5692883895131087, "percentage": 31.34, "elapsed_time": "0:20:13", "remaining_time": "0:44:19", "throughput": 9482.92, "total_tokens": 11511848}
{"current_steps": 106, "total_steps": 335, "loss": 0.0623, "lr": 4.34441703483291e-05, "epoch": 1.5842696629213484, "percentage": 31.64, "elapsed_time": "0:20:25", "remaining_time": "0:44:07", "throughput": 9486.84, "total_tokens": 11625728}
{"current_steps": 107, "total_steps": 335, "loss": 0.193, "lr": 4.326702693376844e-05, "epoch": 1.5992509363295881, "percentage": 31.94, "elapsed_time": "0:20:37", "remaining_time": "0:43:55", "throughput": 9491.7, "total_tokens": 11741544}
{"current_steps": 108, "total_steps": 335, "loss": 0.0936, "lr": 4.308789362182492e-05, "epoch": 1.6142322097378277, "percentage": 32.24, "elapsed_time": "0:20:48", "remaining_time": "0:43:44", "throughput": 9492.36, "total_tokens": 11851240}
{"current_steps": 109, "total_steps": 335, "loss": 0.1468, "lr": 4.2906789926177975e-05, "epoch": 1.6292134831460674, "percentage": 32.54, "elapsed_time": "0:21:00", "remaining_time": "0:43:33", "throughput": 9492.37, "total_tokens": 11963664}
{"current_steps": 110, "total_steps": 335, "loss": 0.1707, "lr": 4.272373557514858e-05, "epoch": 1.6441947565543071, "percentage": 32.84, "elapsed_time": "0:21:11", "remaining_time": "0:43:20", "throughput": 9491.21, "total_tokens": 12067544}
{"current_steps": 111, "total_steps": 335, "loss": 0.1829, "lr": 4.2538750509550054e-05, "epoch": 1.6591760299625467, "percentage": 33.13, "elapsed_time": "0:21:22", "remaining_time": "0:43:08", "throughput": 9482.13, "total_tokens": 12164792}
{"current_steps": 112, "total_steps": 335, "loss": 0.1401, "lr": 4.235185488051585e-05, "epoch": 1.6741573033707864, "percentage": 33.43, "elapsed_time": "0:21:34", "remaining_time": "0:42:58", "throughput": 9484.52, "total_tokens": 12281440}
{"current_steps": 113, "total_steps": 335, "loss": 0.1412, "lr": 4.216306904730447e-05, "epoch": 1.6891385767790261, "percentage": 33.73, "elapsed_time": "0:21:46", "remaining_time": "0:42:47", "throughput": 9481.49, "total_tokens": 12389800}
{"current_steps": 114, "total_steps": 335, "loss": 0.1908, "lr": 4.1972413575081595e-05, "epoch": 1.7041198501872659, "percentage": 34.03, "elapsed_time": "0:21:58", "remaining_time": "0:42:35", "throughput": 9480.71, "total_tokens": 12498360}
{"current_steps": 115, "total_steps": 335, "loss": 0.1783, "lr": 4.177990923267986e-05, "epoch": 1.7191011235955056, "percentage": 34.33, "elapsed_time": "0:22:09", "remaining_time": "0:42:24", "throughput": 9475.38, "total_tokens": 12601072}
{"current_steps": 116, "total_steps": 335, "loss": 0.1246, "lr": 4.158557699033644e-05, "epoch": 1.7340823970037453, "percentage": 34.63, "elapsed_time": "0:22:21", "remaining_time": "0:42:12", "throughput": 9469.52, "total_tokens": 12704456}
{"current_steps": 117, "total_steps": 335, "loss": 0.0917, "lr": 4.138943801740865e-05, "epoch": 1.749063670411985, "percentage": 34.93, "elapsed_time": "0:22:33", "remaining_time": "0:42:01", "throughput": 9458.59, "total_tokens": 12801568}
{"current_steps": 118, "total_steps": 335, "loss": 0.0672, "lr": 4.119151368006793e-05, "epoch": 1.7640449438202248, "percentage": 35.22, "elapsed_time": "0:22:45", "remaining_time": "0:41:50", "throughput": 9462.14, "total_tokens": 12917448}
{"current_steps": 119, "total_steps": 335, "loss": 0.1358, "lr": 4.099182553897229e-05, "epoch": 1.7790262172284645, "percentage": 35.52, "elapsed_time": "0:22:56", "remaining_time": "0:41:39", "throughput": 9457.8, "total_tokens": 13022432}
{"current_steps": 120, "total_steps": 335, "loss": 0.1048, "lr": 4.079039534691767e-05, "epoch": 1.7940074906367043, "percentage": 35.82, "elapsed_time": "0:23:08", "remaining_time": "0:41:28", "throughput": 9454.12, "total_tokens": 13129888}
{"current_steps": 121, "total_steps": 335, "loss": 0.1369, "lr": 4.058724504646834e-05, "epoch": 1.8089887640449438, "percentage": 36.12, "elapsed_time": "0:23:20", "remaining_time": "0:41:17", "throughput": 9449.6, "total_tokens": 13235312}
{"current_steps": 122, "total_steps": 335, "loss": 0.1564, "lr": 4.0382396767566536e-05, "epoch": 1.8239700374531835, "percentage": 36.42, "elapsed_time": "0:23:32", "remaining_time": "0:41:05", "throughput": 9452.36, "total_tokens": 13350920}
{"current_steps": 123, "total_steps": 335, "loss": 0.1292, "lr": 4.017587282512181e-05, "epoch": 1.8389513108614233, "percentage": 36.72, "elapsed_time": "0:23:44", "remaining_time": "0:40:55", "throughput": 9448.3, "total_tokens": 13458096}
{"current_steps": 124, "total_steps": 335, "loss": 0.1175, "lr": 3.9967695716580224e-05, "epoch": 1.8539325842696628, "percentage": 37.01, "elapsed_time": "0:23:56", "remaining_time": "0:40:44", "throughput": 9444.9, "total_tokens": 13566016}
{"current_steps": 125, "total_steps": 335, "loss": 0.1814, "lr": 3.975788811947351e-05, "epoch": 1.8689138576779025, "percentage": 37.31, "elapsed_time": "0:24:08", "remaining_time": "0:40:32", "throughput": 9444.35, "total_tokens": 13676808}
{"current_steps": 125, "total_steps": 335, "eval_loss": 0.18464037775993347, "epoch": 1.8689138576779025, "percentage": 37.31, "elapsed_time": "0:24:13", "remaining_time": "0:40:41", "throughput": 9412.31, "total_tokens": 13676808}
{"current_steps": 126, "total_steps": 335, "loss": 0.0969, "lr": 3.954647288894883e-05, "epoch": 1.8838951310861423, "percentage": 37.61, "elapsed_time": "0:24:24", "remaining_time": "0:40:29", "throughput": 9410.44, "total_tokens": 13785624}
{"current_steps": 127, "total_steps": 335, "loss": 0.1431, "lr": 3.933347305527898e-05, "epoch": 1.898876404494382, "percentage": 37.91, "elapsed_time": "0:24:36", "remaining_time": "0:40:18", "throughput": 9409.77, "total_tokens": 13896368}
{"current_steps": 128, "total_steps": 335, "loss": 0.1552, "lr": 3.911891182135371e-05, "epoch": 1.9138576779026217, "percentage": 38.21, "elapsed_time": "0:24:48", "remaining_time": "0:40:07", "throughput": 9410.85, "total_tokens": 14010984}
{"current_steps": 129, "total_steps": 335, "loss": 0.1472, "lr": 3.8902812560152066e-05, "epoch": 1.9288389513108615, "percentage": 38.51, "elapsed_time": "0:25:00", "remaining_time": "0:39:55", "throughput": 9405.64, "total_tokens": 14112168}
{"current_steps": 130, "total_steps": 335, "loss": 0.1115, "lr": 3.868519881219631e-05, "epoch": 1.9438202247191012, "percentage": 38.81, "elapsed_time": "0:25:12", "remaining_time": "0:39:44", "throughput": 9408.53, "total_tokens": 14227128}
{"current_steps": 131, "total_steps": 335, "loss": 0.1027, "lr": 3.846609428298757e-05, "epoch": 1.958801498127341, "percentage": 39.1, "elapsed_time": "0:25:24", "remaining_time": "0:39:33", "throughput": 9410.53, "total_tokens": 14342592}
{"current_steps": 132, "total_steps": 335, "loss": 0.1057, "lr": 3.824552284042351e-05, "epoch": 1.9737827715355807, "percentage": 39.4, "elapsed_time": "0:25:36", "remaining_time": "0:39:22", "throughput": 9414.82, "total_tokens": 14461768}
{"current_steps": 133, "total_steps": 335, "loss": 0.1326, "lr": 3.8023508512198256e-05, "epoch": 1.9887640449438202, "percentage": 39.7, "elapsed_time": "0:25:47", "remaining_time": "0:39:10", "throughput": 9412.45, "total_tokens": 14568520}
{"current_steps": 134, "total_steps": 335, "loss": 0.1245, "lr": 3.780007548318507e-05, "epoch": 2.0, "percentage": 40.0, "elapsed_time": "0:25:57", "remaining_time": "0:38:56", "throughput": 9400.19, "total_tokens": 14641496}
{"current_steps": 135, "total_steps": 335, "loss": 0.158, "lr": 3.7575248092801686e-05, "epoch": 2.0149812734082397, "percentage": 40.3, "elapsed_time": "0:26:09", "remaining_time": "0:38:45", "throughput": 9395.51, "total_tokens": 14745856}
{"current_steps": 136, "total_steps": 335, "loss": 0.122, "lr": 3.734905083235901e-05, "epoch": 2.0299625468164795, "percentage": 40.6, "elapsed_time": "0:26:21", "remaining_time": "0:38:33", "throughput": 9391.71, "total_tokens": 14851856}
{"current_steps": 137, "total_steps": 335, "loss": 0.1392, "lr": 3.712150834239313e-05, "epoch": 2.044943820224719, "percentage": 40.9, "elapsed_time": "0:26:33", "remaining_time": "0:38:22", "throughput": 9392.06, "total_tokens": 14962208}
{"current_steps": 138, "total_steps": 335, "loss": 0.0892, "lr": 3.689264540998116e-05, "epoch": 2.059925093632959, "percentage": 41.19, "elapsed_time": "0:26:44", "remaining_time": "0:38:10", "throughput": 9392.19, "total_tokens": 15071712}
{"current_steps": 139, "total_steps": 335, "loss": 0.0706, "lr": 3.66624869660411e-05, "epoch": 2.0749063670411987, "percentage": 41.49, "elapsed_time": "0:26:56", "remaining_time": "0:37:59", "throughput": 9391.03, "total_tokens": 15178568}
{"current_steps": 140, "total_steps": 335, "loss": 0.0695, "lr": 3.6431058082615964e-05, "epoch": 2.0898876404494384, "percentage": 41.79, "elapsed_time": "0:27:07", "remaining_time": "0:37:47", "throughput": 9395.69, "total_tokens": 15295296}
{"current_steps": 141, "total_steps": 335, "loss": 0.1314, "lr": 3.619838397014263e-05, "epoch": 2.1048689138576777, "percentage": 42.09, "elapsed_time": "0:27:19", "remaining_time": "0:37:35", "throughput": 9394.32, "total_tokens": 15401968}
{"current_steps": 142, "total_steps": 335, "loss": 0.1043, "lr": 3.5964489974705553e-05, "epoch": 2.1198501872659175, "percentage": 42.39, "elapsed_time": "0:27:30", "remaining_time": "0:37:23", "throughput": 9395.3, "total_tokens": 15510128}
{"current_steps": 143, "total_steps": 335, "loss": 0.1566, "lr": 3.572940157527572e-05, "epoch": 2.134831460674157, "percentage": 42.69, "elapsed_time": "0:27:41", "remaining_time": "0:37:11", "throughput": 9390.77, "total_tokens": 15606536}
{"current_steps": 144, "total_steps": 335, "loss": 0.0907, "lr": 3.549314438093515e-05, "epoch": 2.149812734082397, "percentage": 42.99, "elapsed_time": "0:27:53", "remaining_time": "0:36:59", "throughput": 9393.32, "total_tokens": 15717520}
{"current_steps": 145, "total_steps": 335, "loss": 0.1258, "lr": 3.525574412808717e-05, "epoch": 2.1647940074906367, "percentage": 43.28, "elapsed_time": "0:28:04", "remaining_time": "0:36:47", "throughput": 9394.67, "total_tokens": 15827848}
{"current_steps": 146, "total_steps": 335, "loss": 0.1402, "lr": 3.501722667765286e-05, "epoch": 2.1797752808988764, "percentage": 43.58, "elapsed_time": "0:28:16", "remaining_time": "0:36:35", "throughput": 9394.05, "total_tokens": 15934960}
{"current_steps": 147, "total_steps": 335, "loss": 0.0751, "lr": 3.47776180122539e-05, "epoch": 2.194756554307116, "percentage": 43.88, "elapsed_time": "0:28:27", "remaining_time": "0:36:23", "throughput": 9392.38, "total_tokens": 16038664}
{"current_steps": 148, "total_steps": 335, "loss": 0.1599, "lr": 3.453694423338225e-05, "epoch": 2.209737827715356, "percentage": 44.18, "elapsed_time": "0:28:38", "remaining_time": "0:36:11", "throughput": 9392.28, "total_tokens": 16142344}
{"current_steps": 149, "total_steps": 335, "loss": 0.1017, "lr": 3.4295231558556715e-05, "epoch": 2.2247191011235956, "percentage": 44.48, "elapsed_time": "0:28:50", "remaining_time": "0:35:59", "throughput": 9387.97, "total_tokens": 16242008}
{"current_steps": 150, "total_steps": 335, "loss": 0.0857, "lr": 3.4052506318467084e-05, "epoch": 2.2397003745318353, "percentage": 44.78, "elapsed_time": "0:29:01", "remaining_time": "0:35:47", "throughput": 9389.91, "total_tokens": 16353368}
{"current_steps": 150, "total_steps": 335, "eval_loss": 0.1802486777305603, "epoch": 2.2397003745318353, "percentage": 44.78, "elapsed_time": "0:29:06", "remaining_time": "0:35:54", "throughput": 9363.41, "total_tokens": 16353368}
{"current_steps": 151, "total_steps": 335, "loss": 0.12, "lr": 3.3808794954105716e-05, "epoch": 2.254681647940075, "percentage": 45.07, "elapsed_time": "0:29:18", "remaining_time": "0:35:42", "throughput": 9364.17, "total_tokens": 16462800}
{"current_steps": 152, "total_steps": 335, "loss": 0.202, "lr": 3.356412401388732e-05, "epoch": 2.2696629213483144, "percentage": 45.37, "elapsed_time": "0:29:29", "remaining_time": "0:35:30", "throughput": 9366.88, "total_tokens": 16576136}
{"current_steps": 153, "total_steps": 335, "loss": 0.0774, "lr": 3.3318520150756846e-05, "epoch": 2.284644194756554, "percentage": 45.67, "elapsed_time": "0:29:41", "remaining_time": "0:35:18", "throughput": 9367.72, "total_tokens": 16685072}
{"current_steps": 154, "total_steps": 335, "loss": 0.0896, "lr": 3.307201011928616e-05, "epoch": 2.299625468164794, "percentage": 45.97, "elapsed_time": "0:29:52", "remaining_time": "0:35:06", "throughput": 9371.6, "total_tokens": 16799472}
{"current_steps": 155, "total_steps": 335, "loss": 0.1516, "lr": 3.282462077275947e-05, "epoch": 2.3146067415730336, "percentage": 46.27, "elapsed_time": "0:30:04", "remaining_time": "0:34:55", "throughput": 9376.06, "total_tokens": 16916072}
{"current_steps": 156, "total_steps": 335, "loss": 0.1394, "lr": 3.257637906024822e-05, "epoch": 2.3295880149812733, "percentage": 46.57, "elapsed_time": "0:30:15", "remaining_time": "0:34:43", "throughput": 9382.12, "total_tokens": 17036352}
{"current_steps": 157, "total_steps": 335, "loss": 0.1162, "lr": 3.2327312023675287e-05, "epoch": 2.344569288389513, "percentage": 46.87, "elapsed_time": "0:30:27", "remaining_time": "0:34:31", "throughput": 9380.77, "total_tokens": 17141704}
{"current_steps": 158, "total_steps": 335, "loss": 0.1081, "lr": 3.2077446794869295e-05, "epoch": 2.359550561797753, "percentage": 47.16, "elapsed_time": "0:30:38", "remaining_time": "0:34:19", "throughput": 9379.79, "total_tokens": 17247616}
{"current_steps": 159, "total_steps": 335, "loss": 0.1278, "lr": 3.1826810592609036e-05, "epoch": 2.3745318352059925, "percentage": 47.46, "elapsed_time": "0:30:50", "remaining_time": "0:34:07", "throughput": 9383.19, "total_tokens": 17360352}
{"current_steps": 160, "total_steps": 335, "loss": 0.1027, "lr": 3.157543071965835e-05, "epoch": 2.3895131086142323, "percentage": 47.76, "elapsed_time": "0:31:01", "remaining_time": "0:33:56", "throughput": 9384.92, "total_tokens": 17472040}
{"current_steps": 161, "total_steps": 335, "loss": 0.1247, "lr": 3.132333455979202e-05, "epoch": 2.404494382022472, "percentage": 48.06, "elapsed_time": "0:31:13", "remaining_time": "0:33:44", "throughput": 9384.26, "total_tokens": 17579232}
{"current_steps": 162, "total_steps": 335, "loss": 0.0773, "lr": 3.107054957481271e-05, "epoch": 2.4194756554307117, "percentage": 48.36, "elapsed_time": "0:31:24", "remaining_time": "0:33:32", "throughput": 9383.27, "total_tokens": 17686392}
{"current_steps": 163, "total_steps": 335, "loss": 0.0579, "lr": 3.081710330155942e-05, "epoch": 2.4344569288389515, "percentage": 48.66, "elapsed_time": "0:31:36", "remaining_time": "0:33:21", "throughput": 9386.19, "total_tokens": 17800024}
{"current_steps": 164, "total_steps": 335, "loss": 0.0756, "lr": 3.056302334890786e-05, "epoch": 2.449438202247191, "percentage": 48.96, "elapsed_time": "0:31:48", "remaining_time": "0:33:09", "throughput": 9386.26, "total_tokens": 17909576}
{"current_steps": 165, "total_steps": 335, "loss": 0.1386, "lr": 3.030833739476285e-05, "epoch": 2.464419475655431, "percentage": 49.25, "elapsed_time": "0:31:59", "remaining_time": "0:32:57", "throughput": 9383.53, "total_tokens": 18009360}
{"current_steps": 166, "total_steps": 335, "loss": 0.1432, "lr": 3.0053073183043256e-05, "epoch": 2.4794007490636703, "percentage": 49.55, "elapsed_time": "0:32:10", "remaining_time": "0:32:45", "throughput": 9383.13, "total_tokens": 18114736}
{"current_steps": 167, "total_steps": 335, "loss": 0.1071, "lr": 2.979725852065981e-05, "epoch": 2.49438202247191, "percentage": 49.85, "elapsed_time": "0:32:22", "remaining_time": "0:32:33", "throughput": 9385.23, "total_tokens": 18226888}
{"current_steps": 168, "total_steps": 335, "loss": 0.114, "lr": 2.954092127448591e-05, "epoch": 2.5093632958801497, "percentage": 50.15, "elapsed_time": "0:32:33", "remaining_time": "0:32:22", "throughput": 9386.72, "total_tokens": 18338720}
{"current_steps": 169, "total_steps": 335, "loss": 0.0981, "lr": 2.9284089368322045e-05, "epoch": 2.5243445692883895, "percentage": 50.45, "elapsed_time": "0:32:45", "remaining_time": "0:32:10", "throughput": 9388.58, "total_tokens": 18451496}
{"current_steps": 170, "total_steps": 335, "loss": 0.1347, "lr": 2.9026790779853874e-05, "epoch": 2.539325842696629, "percentage": 50.75, "elapsed_time": "0:32:56", "remaining_time": "0:31:58", "throughput": 9388.81, "total_tokens": 18556776}
{"current_steps": 171, "total_steps": 335, "loss": 0.0833, "lr": 2.876905353760459e-05, "epoch": 2.554307116104869, "percentage": 51.04, "elapsed_time": "0:33:08", "remaining_time": "0:31:46", "throughput": 9388.0, "total_tokens": 18664112}
{"current_steps": 172, "total_steps": 335, "loss": 0.1111, "lr": 2.8510905717881614e-05, "epoch": 2.5692883895131087, "percentage": 51.34, "elapsed_time": "0:33:19", "remaining_time": "0:31:34", "throughput": 9387.76, "total_tokens": 18769448}
{"current_steps": 173, "total_steps": 335, "loss": 0.1501, "lr": 2.8252375441718137e-05, "epoch": 2.5842696629213484, "percentage": 51.64, "elapsed_time": "0:33:30", "remaining_time": "0:31:23", "throughput": 9391.37, "total_tokens": 18884864}
{"current_steps": 174, "total_steps": 335, "loss": 0.1171, "lr": 2.7993490871809808e-05, "epoch": 2.599250936329588, "percentage": 51.94, "elapsed_time": "0:33:42", "remaining_time": "0:31:11", "throughput": 9391.68, "total_tokens": 18993424}
{"current_steps": 175, "total_steps": 335, "loss": 0.1261, "lr": 2.7734280209446865e-05, "epoch": 2.6142322097378274, "percentage": 52.24, "elapsed_time": "0:33:53", "remaining_time": "0:30:59", "throughput": 9395.99, "total_tokens": 19111296}
{"current_steps": 175, "total_steps": 335, "eval_loss": 0.1688879132270813, "epoch": 2.6142322097378274, "percentage": 52.24, "elapsed_time": "0:33:58", "remaining_time": "0:31:04", "throughput": 9373.27, "total_tokens": 19111296}
{"current_steps": 176, "total_steps": 335, "loss": 0.0987, "lr": 2.7474771691442018e-05, "epoch": 2.629213483146067, "percentage": 52.54, "elapsed_time": "0:34:10", "remaining_time": "0:30:52", "throughput": 9372.14, "total_tokens": 19213824}
{"current_steps": 177, "total_steps": 335, "loss": 0.054, "lr": 2.721499358705458e-05, "epoch": 2.644194756554307, "percentage": 52.84, "elapsed_time": "0:34:21", "remaining_time": "0:30:40", "throughput": 9379.61, "total_tokens": 19338104}
{"current_steps": 178, "total_steps": 335, "loss": 0.0683, "lr": 2.6954974194910888e-05, "epoch": 2.6591760299625467, "percentage": 53.13, "elapsed_time": "0:34:33", "remaining_time": "0:30:28", "throughput": 9380.91, "total_tokens": 19449848}
{"current_steps": 179, "total_steps": 335, "loss": 0.1121, "lr": 2.6694741839921732e-05, "epoch": 2.6741573033707864, "percentage": 53.43, "elapsed_time": "0:34:44", "remaining_time": "0:30:17", "throughput": 9386.83, "total_tokens": 19571008}
{"current_steps": 180, "total_steps": 335, "loss": 0.0888, "lr": 2.6434324870196748e-05, "epoch": 2.689138576779026, "percentage": 53.73, "elapsed_time": "0:34:56", "remaining_time": "0:30:05", "throughput": 9390.88, "total_tokens": 19686872}
{"current_steps": 181, "total_steps": 335, "loss": 0.0751, "lr": 2.617375165395634e-05, "epoch": 2.704119850187266, "percentage": 54.03, "elapsed_time": "0:35:07", "remaining_time": "0:29:53", "throughput": 9392.65, "total_tokens": 19797960}
{"current_steps": 182, "total_steps": 335, "loss": 0.1033, "lr": 2.5913050576441477e-05, "epoch": 2.7191011235955056, "percentage": 54.33, "elapsed_time": "0:35:19", "remaining_time": "0:29:41", "throughput": 9392.4, "total_tokens": 19905184}
{"current_steps": 183, "total_steps": 335, "loss": 0.0867, "lr": 2.5652250036821523e-05, "epoch": 2.7340823970037453, "percentage": 54.63, "elapsed_time": "0:35:30", "remaining_time": "0:29:29", "throughput": 9392.21, "total_tokens": 20013120}
{"current_steps": 184, "total_steps": 335, "loss": 0.1323, "lr": 2.5391378445100644e-05, "epoch": 2.749063670411985, "percentage": 54.93, "elapsed_time": "0:35:41", "remaining_time": "0:29:17", "throughput": 9388.6, "total_tokens": 20109488}
{"current_steps": 185, "total_steps": 335, "loss": 0.0935, "lr": 2.5130464219022992e-05, "epoch": 2.764044943820225, "percentage": 55.22, "elapsed_time": "0:35:53", "remaining_time": "0:29:06", "throughput": 9392.86, "total_tokens": 20227088}
{"current_steps": 186, "total_steps": 335, "loss": 0.095, "lr": 2.486953578097702e-05, "epoch": 2.7790262172284645, "percentage": 55.52, "elapsed_time": "0:36:04", "remaining_time": "0:28:54", "throughput": 9390.64, "total_tokens": 20330176}
{"current_steps": 187, "total_steps": 335, "loss": 0.1094, "lr": 2.4608621554899362e-05, "epoch": 2.7940074906367043, "percentage": 55.82, "elapsed_time": "0:36:16", "remaining_time": "0:28:42", "throughput": 9394.72, "total_tokens": 20448288}
{"current_steps": 188, "total_steps": 335, "loss": 0.094, "lr": 2.4347749963178486e-05, "epoch": 2.808988764044944, "percentage": 56.12, "elapsed_time": "0:36:28", "remaining_time": "0:28:30", "throughput": 9392.91, "total_tokens": 20552120}
{"current_steps": 189, "total_steps": 335, "loss": 0.0948, "lr": 2.4086949423558526e-05, "epoch": 2.8239700374531838, "percentage": 56.42, "elapsed_time": "0:36:39", "remaining_time": "0:28:19", "throughput": 9394.24, "total_tokens": 20664640}
{"current_steps": 190, "total_steps": 335, "loss": 0.0838, "lr": 2.3826248346043663e-05, "epoch": 2.8389513108614235, "percentage": 56.72, "elapsed_time": "0:36:51", "remaining_time": "0:28:07", "throughput": 9395.91, "total_tokens": 20777328}
{"current_steps": 191, "total_steps": 335, "loss": 0.1071, "lr": 2.356567512980326e-05, "epoch": 2.853932584269663, "percentage": 57.01, "elapsed_time": "0:37:02", "remaining_time": "0:27:55", "throughput": 9399.79, "total_tokens": 20895424}
{"current_steps": 192, "total_steps": 335, "loss": 0.0939, "lr": 2.3305258160078274e-05, "epoch": 2.8689138576779025, "percentage": 57.31, "elapsed_time": "0:37:14", "remaining_time": "0:27:44", "throughput": 9401.52, "total_tokens": 21007912}
{"current_steps": 193, "total_steps": 335, "loss": 0.1093, "lr": 2.3045025805089118e-05, "epoch": 2.8838951310861423, "percentage": 57.61, "elapsed_time": "0:37:25", "remaining_time": "0:27:32", "throughput": 9401.46, "total_tokens": 21112424}
{"current_steps": 194, "total_steps": 335, "loss": 0.1156, "lr": 2.278500641294543e-05, "epoch": 2.898876404494382, "percentage": 57.91, "elapsed_time": "0:37:36", "remaining_time": "0:27:20", "throughput": 9403.18, "total_tokens": 21221136}
{"current_steps": 195, "total_steps": 335, "loss": 0.0693, "lr": 2.252522830855798e-05, "epoch": 2.9138576779026217, "percentage": 58.21, "elapsed_time": "0:37:48", "remaining_time": "0:27:08", "throughput": 9403.81, "total_tokens": 21331720}
{"current_steps": 196, "total_steps": 335, "loss": 0.0907, "lr": 2.2265719790553147e-05, "epoch": 2.9288389513108615, "percentage": 58.51, "elapsed_time": "0:37:59", "remaining_time": "0:26:56", "throughput": 9406.82, "total_tokens": 21447512}
{"current_steps": 197, "total_steps": 335, "loss": 0.0821, "lr": 2.2006509128190195e-05, "epoch": 2.943820224719101, "percentage": 58.81, "elapsed_time": "0:38:11", "remaining_time": "0:26:45", "throughput": 9405.24, "total_tokens": 21553192}
{"current_steps": 198, "total_steps": 335, "loss": 0.1252, "lr": 2.174762455828187e-05, "epoch": 2.958801498127341, "percentage": 59.1, "elapsed_time": "0:38:22", "remaining_time": "0:26:33", "throughput": 9403.48, "total_tokens": 21655488}
{"current_steps": 199, "total_steps": 335, "loss": 0.0859, "lr": 2.1489094282118395e-05, "epoch": 2.9737827715355807, "percentage": 59.4, "elapsed_time": "0:38:34", "remaining_time": "0:26:21", "throughput": 9405.0, "total_tokens": 21767256}
{"current_steps": 200, "total_steps": 335, "loss": 0.1024, "lr": 2.123094646239541e-05, "epoch": 2.98876404494382, "percentage": 59.7, "elapsed_time": "0:38:45", "remaining_time": "0:26:09", "throughput": 9407.0, "total_tokens": 21879928}
{"current_steps": 200, "total_steps": 335, "eval_loss": 0.1642482578754425, "epoch": 2.98876404494382, "percentage": 59.7, "elapsed_time": "0:38:50", "remaining_time": "0:26:13", "throughput": 9387.14, "total_tokens": 21879928}
{"current_steps": 201, "total_steps": 335, "loss": 0.1114, "lr": 2.0973209220146135e-05, "epoch": 3.0, "percentage": 60.0, "elapsed_time": "0:39:00", "remaining_time": "0:26:00", "throughput": 9382.92, "total_tokens": 21962520}
{"current_steps": 202, "total_steps": 335, "loss": 0.0762, "lr": 2.0715910631677968e-05, "epoch": 3.0149812734082397, "percentage": 60.3, "elapsed_time": "0:39:12", "remaining_time": "0:25:48", "throughput": 9380.46, "total_tokens": 22064872}
{"current_steps": 203, "total_steps": 335, "loss": 0.0883, "lr": 2.0459078725514092e-05, "epoch": 3.0299625468164795, "percentage": 60.6, "elapsed_time": "0:39:23", "remaining_time": "0:25:36", "throughput": 9381.48, "total_tokens": 22169728}
{"current_steps": 204, "total_steps": 335, "loss": 0.0756, "lr": 2.020274147934019e-05, "epoch": 3.044943820224719, "percentage": 60.9, "elapsed_time": "0:39:34", "remaining_time": "0:25:24", "throughput": 9384.55, "total_tokens": 22285928}
{"current_steps": 205, "total_steps": 335, "loss": 0.0887, "lr": 1.9946926816956743e-05, "epoch": 3.059925093632959, "percentage": 61.19, "elapsed_time": "0:39:45", "remaining_time": "0:25:12", "throughput": 9383.39, "total_tokens": 22387040}
{"current_steps": 206, "total_steps": 335, "loss": 0.0926, "lr": 1.9691662605237166e-05, "epoch": 3.0749063670411987, "percentage": 61.49, "elapsed_time": "0:39:57", "remaining_time": "0:25:01", "throughput": 9385.8, "total_tokens": 22498720}
{"current_steps": 207, "total_steps": 335, "loss": 0.1224, "lr": 1.9436976651092144e-05, "epoch": 3.0898876404494384, "percentage": 61.79, "elapsed_time": "0:40:08", "remaining_time": "0:24:49", "throughput": 9391.28, "total_tokens": 22621072}
{"current_steps": 208, "total_steps": 335, "loss": 0.0856, "lr": 1.9182896698440584e-05, "epoch": 3.1048689138576777, "percentage": 62.09, "elapsed_time": "0:40:20", "remaining_time": "0:24:37", "throughput": 9389.43, "total_tokens": 22724704}
{"current_steps": 209, "total_steps": 335, "loss": 0.0621, "lr": 1.89294504251873e-05, "epoch": 3.1198501872659175, "percentage": 62.39, "elapsed_time": "0:40:31", "remaining_time": "0:24:26", "throughput": 9391.65, "total_tokens": 22838936}
{"current_steps": 210, "total_steps": 335, "loss": 0.1196, "lr": 1.867666544020798e-05, "epoch": 3.134831460674157, "percentage": 62.69, "elapsed_time": "0:40:43", "remaining_time": "0:24:14", "throughput": 9389.51, "total_tokens": 22939008}
{"current_steps": 211, "total_steps": 335, "loss": 0.1071, "lr": 1.8424569280341653e-05, "epoch": 3.149812734082397, "percentage": 62.99, "elapsed_time": "0:40:54", "remaining_time": "0:24:02", "throughput": 9391.99, "total_tokens": 23054112}
{"current_steps": 212, "total_steps": 335, "loss": 0.0932, "lr": 1.817318940739098e-05, "epoch": 3.1647940074906367, "percentage": 63.28, "elapsed_time": "0:41:06", "remaining_time": "0:23:50", "throughput": 9389.71, "total_tokens": 23156632}
{"current_steps": 213, "total_steps": 335, "loss": 0.0792, "lr": 1.7922553205130707e-05, "epoch": 3.1797752808988764, "percentage": 63.58, "elapsed_time": "0:41:17", "remaining_time": "0:23:39", "throughput": 9392.23, "total_tokens": 23271912}
{"current_steps": 214, "total_steps": 335, "loss": 0.0513, "lr": 1.767268797632472e-05, "epoch": 3.194756554307116, "percentage": 63.88, "elapsed_time": "0:41:29", "remaining_time": "0:23:27", "throughput": 9392.66, "total_tokens": 23381816}
{"current_steps": 215, "total_steps": 335, "loss": 0.0903, "lr": 1.7423620939751788e-05, "epoch": 3.209737827715356, "percentage": 64.18, "elapsed_time": "0:41:40", "remaining_time": "0:23:15", "throughput": 9392.39, "total_tokens": 23489552}
{"current_steps": 216, "total_steps": 335, "loss": 0.0763, "lr": 1.7175379227240523e-05, "epoch": 3.2247191011235956, "percentage": 64.48, "elapsed_time": "0:41:52", "remaining_time": "0:23:04", "throughput": 9393.94, "total_tokens": 23602136}
{"current_steps": 217, "total_steps": 335, "loss": 0.0656, "lr": 1.692798988071385e-05, "epoch": 3.2397003745318353, "percentage": 64.78, "elapsed_time": "0:42:03", "remaining_time": "0:22:52", "throughput": 9392.32, "total_tokens": 23705952}
{"current_steps": 218, "total_steps": 335, "loss": 0.1015, "lr": 1.6681479849243153e-05, "epoch": 3.254681647940075, "percentage": 65.07, "elapsed_time": "0:42:15", "remaining_time": "0:22:40", "throughput": 9395.14, "total_tokens": 23821824}
{"current_steps": 219, "total_steps": 335, "loss": 0.1126, "lr": 1.6435875986112685e-05, "epoch": 3.2696629213483144, "percentage": 65.37, "elapsed_time": "0:42:27", "remaining_time": "0:22:29", "throughput": 9396.38, "total_tokens": 23933400}
{"current_steps": 220, "total_steps": 335, "loss": 0.0704, "lr": 1.6191205045894283e-05, "epoch": 3.284644194756554, "percentage": 65.67, "elapsed_time": "0:42:38", "remaining_time": "0:22:17", "throughput": 9397.5, "total_tokens": 24044912}
{"current_steps": 221, "total_steps": 335, "loss": 0.0695, "lr": 1.594749368153292e-05, "epoch": 3.299625468164794, "percentage": 65.97, "elapsed_time": "0:42:50", "remaining_time": "0:22:05", "throughput": 9402.14, "total_tokens": 24165512}
{"current_steps": 222, "total_steps": 335, "loss": 0.0775, "lr": 1.570476844144329e-05, "epoch": 3.3146067415730336, "percentage": 66.27, "elapsed_time": "0:43:01", "remaining_time": "0:21:54", "throughput": 9399.66, "total_tokens": 24265384}
{"current_steps": 223, "total_steps": 335, "loss": 0.0852, "lr": 1.546305576661776e-05, "epoch": 3.3295880149812733, "percentage": 66.57, "elapsed_time": "0:43:13", "remaining_time": "0:21:42", "throughput": 9399.1, "total_tokens": 24373048}
{"current_steps": 224, "total_steps": 335, "loss": 0.0791, "lr": 1.5222381987746104e-05, "epoch": 3.344569288389513, "percentage": 66.87, "elapsed_time": "0:43:24", "remaining_time": "0:21:30", "throughput": 9399.95, "total_tokens": 24483840}
{"current_steps": 225, "total_steps": 335, "loss": 0.0617, "lr": 1.4982773322347144e-05, "epoch": 3.359550561797753, "percentage": 67.16, "elapsed_time": "0:43:36", "remaining_time": "0:21:19", "throughput": 9399.39, "total_tokens": 24591096}
{"current_steps": 225, "total_steps": 335, "eval_loss": 0.1583455204963684, "epoch": 3.359550561797753, "percentage": 67.16, "elapsed_time": "0:43:41", "remaining_time": "0:21:21", "throughput": 9381.69, "total_tokens": 24591096}
{"current_steps": 226, "total_steps": 335, "loss": 0.0616, "lr": 1.4744255871912823e-05, "epoch": 3.3745318352059925, "percentage": 67.46, "elapsed_time": "0:43:52", "remaining_time": "0:21:09", "throughput": 9380.09, "total_tokens": 24690968}
{"current_steps": 227, "total_steps": 335, "loss": 0.0903, "lr": 1.4506855619064846e-05, "epoch": 3.3895131086142323, "percentage": 67.76, "elapsed_time": "0:44:03", "remaining_time": "0:20:57", "throughput": 9380.74, "total_tokens": 24799096}
{"current_steps": 228, "total_steps": 335, "loss": 0.0394, "lr": 1.4270598424724292e-05, "epoch": 3.404494382022472, "percentage": 68.06, "elapsed_time": "0:44:15", "remaining_time": "0:20:46", "throughput": 9381.52, "total_tokens": 24909896}
{"current_steps": 229, "total_steps": 335, "loss": 0.0985, "lr": 1.4035510025294462e-05, "epoch": 3.4194756554307117, "percentage": 68.36, "elapsed_time": "0:44:26", "remaining_time": "0:20:34", "throughput": 9381.72, "total_tokens": 25020096}
{"current_steps": 230, "total_steps": 335, "loss": 0.0929, "lr": 1.3801616029857378e-05, "epoch": 3.4344569288389515, "percentage": 68.66, "elapsed_time": "0:44:38", "remaining_time": "0:20:22", "throughput": 9383.87, "total_tokens": 25134904}
{"current_steps": 231, "total_steps": 335, "loss": 0.0724, "lr": 1.3568941917384036e-05, "epoch": 3.449438202247191, "percentage": 68.96, "elapsed_time": "0:44:49", "remaining_time": "0:20:11", "throughput": 9382.56, "total_tokens": 25238032}
{"current_steps": 232, "total_steps": 335, "loss": 0.0646, "lr": 1.3337513033958904e-05, "epoch": 3.464419475655431, "percentage": 69.25, "elapsed_time": "0:45:01", "remaining_time": "0:19:59", "throughput": 9382.3, "total_tokens": 25346080}
{"current_steps": 233, "total_steps": 335, "loss": 0.0783, "lr": 1.310735459001884e-05, "epoch": 3.4794007490636703, "percentage": 69.55, "elapsed_time": "0:45:12", "remaining_time": "0:19:47", "throughput": 9383.26, "total_tokens": 25456760}
{"current_steps": 234, "total_steps": 335, "loss": 0.0632, "lr": 1.2878491657606872e-05, "epoch": 3.49438202247191, "percentage": 69.85, "elapsed_time": "0:45:24", "remaining_time": "0:19:35", "throughput": 9384.81, "total_tokens": 25565392}
{"current_steps": 235, "total_steps": 335, "loss": 0.0887, "lr": 1.2650949167640997e-05, "epoch": 3.5093632958801497, "percentage": 70.15, "elapsed_time": "0:45:35", "remaining_time": "0:19:24", "throughput": 9386.3, "total_tokens": 25678520}
{"current_steps": 236, "total_steps": 335, "loss": 0.094, "lr": 1.2424751907198312e-05, "epoch": 3.5243445692883895, "percentage": 70.45, "elapsed_time": "0:45:47", "remaining_time": "0:19:12", "throughput": 9387.25, "total_tokens": 25789432}
{"current_steps": 237, "total_steps": 335, "loss": 0.0623, "lr": 1.2199924516814939e-05, "epoch": 3.539325842696629, "percentage": 70.75, "elapsed_time": "0:45:58", "remaining_time": "0:19:00", "throughput": 9385.82, "total_tokens": 25893768}
{"current_steps": 238, "total_steps": 335, "loss": 0.1051, "lr": 1.1976491487801748e-05, "epoch": 3.554307116104869, "percentage": 71.04, "elapsed_time": "0:46:10", "remaining_time": "0:18:49", "throughput": 9387.15, "total_tokens": 26005272}
{"current_steps": 239, "total_steps": 335, "loss": 0.069, "lr": 1.1754477159576499e-05, "epoch": 3.5692883895131087, "percentage": 71.34, "elapsed_time": "0:46:21", "remaining_time": "0:18:37", "throughput": 9386.86, "total_tokens": 26112160}
{"current_steps": 240, "total_steps": 335, "loss": 0.0561, "lr": 1.1533905717012424e-05, "epoch": 3.5842696629213484, "percentage": 71.64, "elapsed_time": "0:46:33", "remaining_time": "0:18:25", "throughput": 9389.35, "total_tokens": 26227496}
{"current_steps": 241, "total_steps": 335, "loss": 0.0824, "lr": 1.1314801187803686e-05, "epoch": 3.599250936329588, "percentage": 71.94, "elapsed_time": "0:46:44", "remaining_time": "0:18:13", "throughput": 9386.62, "total_tokens": 26323944}
{"current_steps": 242, "total_steps": 335, "loss": 0.083, "lr": 1.1097187439847939e-05, "epoch": 3.6142322097378274, "percentage": 72.24, "elapsed_time": "0:46:55", "remaining_time": "0:18:01", "throughput": 9385.38, "total_tokens": 26423816}
{"current_steps": 243, "total_steps": 335, "loss": 0.0969, "lr": 1.088108817864629e-05, "epoch": 3.629213483146067, "percentage": 72.54, "elapsed_time": "0:47:07", "remaining_time": "0:17:50", "throughput": 9384.46, "total_tokens": 26530000}
{"current_steps": 244, "total_steps": 335, "loss": 0.0487, "lr": 1.0666526944721016e-05, "epoch": 3.644194756554307, "percentage": 72.84, "elapsed_time": "0:47:18", "remaining_time": "0:17:38", "throughput": 9385.11, "total_tokens": 26639920}
{"current_steps": 245, "total_steps": 335, "loss": 0.0861, "lr": 1.0453527111051184e-05, "epoch": 3.6591760299625467, "percentage": 73.13, "elapsed_time": "0:47:30", "remaining_time": "0:17:26", "throughput": 9387.82, "total_tokens": 26755952}
{"current_steps": 246, "total_steps": 335, "loss": 0.0879, "lr": 1.0242111880526495e-05, "epoch": 3.6741573033707864, "percentage": 73.43, "elapsed_time": "0:47:41", "remaining_time": "0:17:15", "throughput": 9389.07, "total_tokens": 26867776}
{"current_steps": 247, "total_steps": 335, "loss": 0.081, "lr": 1.003230428341979e-05, "epoch": 3.689138576779026, "percentage": 73.73, "elapsed_time": "0:47:53", "remaining_time": "0:17:03", "throughput": 9388.89, "total_tokens": 26975080}
{"current_steps": 248, "total_steps": 335, "loss": 0.0758, "lr": 9.824127174878195e-06, "epoch": 3.704119850187266, "percentage": 74.03, "elapsed_time": "0:48:04", "remaining_time": "0:16:51", "throughput": 9390.54, "total_tokens": 27088208}
{"current_steps": 249, "total_steps": 335, "loss": 0.1284, "lr": 9.617603232433475e-06, "epoch": 3.7191011235955056, "percentage": 74.33, "elapsed_time": "0:48:16", "remaining_time": "0:16:40", "throughput": 9391.56, "total_tokens": 27199040}
{"current_steps": 250, "total_steps": 335, "loss": 0.0883, "lr": 9.412754953531663e-06, "epoch": 3.7340823970037453, "percentage": 74.63, "elapsed_time": "0:48:27", "remaining_time": "0:16:28", "throughput": 9391.3, "total_tokens": 27307192}
{"current_steps": 250, "total_steps": 335, "eval_loss": 0.15280824899673462, "epoch": 3.7340823970037453, "percentage": 74.63, "elapsed_time": "0:48:32", "remaining_time": "0:16:30", "throughput": 9375.39, "total_tokens": 27307192}
{"current_steps": 251, "total_steps": 335, "loss": 0.0618, "lr": 9.209604653082326e-06, "epoch": 3.749063670411985, "percentage": 74.93, "elapsed_time": "0:48:44", "remaining_time": "0:16:18", "throughput": 9377.03, "total_tokens": 27419216}
{"current_steps": 252, "total_steps": 335, "loss": 0.0664, "lr": 9.008174461027724e-06, "epoch": 3.764044943820225, "percentage": 75.22, "elapsed_time": "0:48:55", "remaining_time": "0:16:06", "throughput": 9379.42, "total_tokens": 27534416}
{"current_steps": 253, "total_steps": 335, "loss": 0.0691, "lr": 8.808486319932083e-06, "epoch": 3.7790262172284645, "percentage": 75.52, "elapsed_time": "0:49:07", "remaining_time": "0:15:55", "throughput": 9381.83, "total_tokens": 27650456}
{"current_steps": 254, "total_steps": 335, "loss": 0.1072, "lr": 8.610561982591357e-06, "epoch": 3.7940074906367043, "percentage": 75.82, "elapsed_time": "0:49:18", "remaining_time": "0:15:43", "throughput": 9384.22, "total_tokens": 27766296}
{"current_steps": 255, "total_steps": 335, "loss": 0.1113, "lr": 8.414423009663563e-06, "epoch": 3.808988764044944, "percentage": 76.12, "elapsed_time": "0:49:30", "remaining_time": "0:15:31", "throughput": 9385.2, "total_tokens": 27877960}
{"current_steps": 256, "total_steps": 335, "loss": 0.0787, "lr": 8.220090767320137e-06, "epoch": 3.8239700374531838, "percentage": 76.42, "elapsed_time": "0:49:41", "remaining_time": "0:15:20", "throughput": 9387.25, "total_tokens": 27992400}
{"current_steps": 257, "total_steps": 335, "loss": 0.0436, "lr": 8.027586424918412e-06, "epoch": 3.8389513108614235, "percentage": 76.72, "elapsed_time": "0:49:53", "remaining_time": "0:15:08", "throughput": 9386.94, "total_tokens": 28099232}
{"current_steps": 258, "total_steps": 335, "loss": 0.0761, "lr": 7.836930952695533e-06, "epoch": 3.853932584269663, "percentage": 77.01, "elapsed_time": "0:50:04", "remaining_time": "0:14:56", "throughput": 9388.64, "total_tokens": 28212712}
{"current_steps": 259, "total_steps": 335, "loss": 0.0876, "lr": 7.648145119484152e-06, "epoch": 3.8689138576779025, "percentage": 77.31, "elapsed_time": "0:50:16", "remaining_time": "0:14:45", "throughput": 9391.0, "total_tokens": 28327232}
{"current_steps": 260, "total_steps": 335, "loss": 0.0689, "lr": 7.461249490449954e-06, "epoch": 3.8838951310861423, "percentage": 77.61, "elapsed_time": "0:50:28", "remaining_time": "0:14:33", "throughput": 9393.33, "total_tokens": 28444136}
{"current_steps": 261, "total_steps": 335, "loss": 0.0934, "lr": 7.2762644248514255e-06, "epoch": 3.898876404494382, "percentage": 77.91, "elapsed_time": "0:50:39", "remaining_time": "0:14:21", "throughput": 9393.84, "total_tokens": 28553608}
{"current_steps": 262, "total_steps": 335, "loss": 0.0616, "lr": 7.0932100738220265e-06, "epoch": 3.9138576779026217, "percentage": 78.21, "elapsed_time": "0:50:51", "remaining_time": "0:14:10", "throughput": 9391.95, "total_tokens": 28655944}
{"current_steps": 263, "total_steps": 335, "loss": 0.0505, "lr": 6.912106378175098e-06, "epoch": 3.9288389513108615, "percentage": 78.51, "elapsed_time": "0:51:02", "remaining_time": "0:13:58", "throughput": 9393.85, "total_tokens": 28770240}
{"current_steps": 264, "total_steps": 335, "loss": 0.0716, "lr": 6.732973066231563e-06, "epoch": 3.943820224719101, "percentage": 78.81, "elapsed_time": "0:51:14", "remaining_time": "0:13:46", "throughput": 9394.36, "total_tokens": 28879896}
{"current_steps": 265, "total_steps": 335, "loss": 0.0925, "lr": 6.555829651670911e-06, "epoch": 3.958801498127341, "percentage": 79.1, "elapsed_time": "0:51:25", "remaining_time": "0:13:35", "throughput": 9392.0, "total_tokens": 28979616}
{"current_steps": 266, "total_steps": 335, "loss": 0.082, "lr": 6.380695431405453e-06, "epoch": 3.9737827715355807, "percentage": 79.4, "elapsed_time": "0:51:37", "remaining_time": "0:13:23", "throughput": 9394.61, "total_tokens": 29095336}
{"current_steps": 267, "total_steps": 335, "loss": 0.1735, "lr": 6.207589483478266e-06, "epoch": 3.98876404494382, "percentage": 79.7, "elapsed_time": "0:51:48", "remaining_time": "0:13:11", "throughput": 9393.51, "total_tokens": 29200208}
{"current_steps": 268, "total_steps": 335, "loss": 0.0554, "lr": 6.0365306649849214e-06, "epoch": 4.0, "percentage": 80.0, "elapsed_time": "0:51:58", "remaining_time": "0:12:59", "throughput": 9390.22, "total_tokens": 29282608}
{"current_steps": 269, "total_steps": 335, "loss": 0.0374, "lr": 5.867537610019317e-06, "epoch": 4.01498127340824, "percentage": 80.3, "elapsed_time": "0:52:09", "remaining_time": "0:12:47", "throughput": 9390.42, "total_tokens": 29391848}
{"current_steps": 270, "total_steps": 335, "loss": 0.0644, "lr": 5.700628727643806e-06, "epoch": 4.0299625468164795, "percentage": 80.6, "elapsed_time": "0:52:21", "remaining_time": "0:12:36", "throughput": 9392.65, "total_tokens": 29507360}
{"current_steps": 271, "total_steps": 335, "loss": 0.0621, "lr": 5.53582219988382e-06, "epoch": 4.044943820224719, "percentage": 80.9, "elapsed_time": "0:52:33", "remaining_time": "0:12:24", "throughput": 9390.32, "total_tokens": 29607936}
{"current_steps": 272, "total_steps": 335, "loss": 0.0525, "lr": 5.373135979747227e-06, "epoch": 4.059925093632959, "percentage": 81.19, "elapsed_time": "0:52:44", "remaining_time": "0:12:12", "throughput": 9389.68, "total_tokens": 29710240}
{"current_steps": 273, "total_steps": 335, "loss": 0.072, "lr": 5.2125877892686496e-06, "epoch": 4.074906367041199, "percentage": 81.49, "elapsed_time": "0:52:55", "remaining_time": "0:12:01", "throughput": 9390.09, "total_tokens": 29819600}
{"current_steps": 274, "total_steps": 335, "loss": 0.1253, "lr": 5.054195117578914e-06, "epoch": 4.089887640449438, "percentage": 81.79, "elapsed_time": "0:53:07", "remaining_time": "0:11:49", "throughput": 9390.34, "total_tokens": 29927712}
{"current_steps": 275, "total_steps": 335, "loss": 0.0516, "lr": 4.897975218999926e-06, "epoch": 4.104868913857678, "percentage": 82.09, "elapsed_time": "0:53:18", "remaining_time": "0:11:37", "throughput": 9390.6, "total_tokens": 30036912}
{"current_steps": 275, "total_steps": 335, "eval_loss": 0.148418128490448, "epoch": 4.104868913857678, "percentage": 82.09, "elapsed_time": "0:53:23", "remaining_time": "0:11:38", "throughput": 9376.15, "total_tokens": 30036912}
{"current_steps": 276, "total_steps": 335, "loss": 0.0597, "lr": 4.743945111165068e-06, "epoch": 4.119850187265918, "percentage": 82.39, "elapsed_time": "0:53:35", "remaining_time": "0:11:27", "throughput": 9375.27, "total_tokens": 30142632}
{"current_steps": 277, "total_steps": 335, "loss": 0.0481, "lr": 4.592121573165414e-06, "epoch": 4.134831460674158, "percentage": 82.69, "elapsed_time": "0:53:46", "remaining_time": "0:11:15", "throughput": 9374.82, "total_tokens": 30249816}
{"current_steps": 278, "total_steps": 335, "loss": 0.0528, "lr": 4.442521143721892e-06, "epoch": 4.149812734082397, "percentage": 82.99, "elapsed_time": "0:53:58", "remaining_time": "0:11:03", "throughput": 9375.52, "total_tokens": 30360248}
{"current_steps": 279, "total_steps": 335, "loss": 0.0558, "lr": 4.295160119383712e-06, "epoch": 4.164794007490637, "percentage": 83.28, "elapsed_time": "0:54:09", "remaining_time": "0:10:52", "throughput": 9375.14, "total_tokens": 30466592}
{"current_steps": 280, "total_steps": 335, "loss": 0.0739, "lr": 4.150054552753055e-06, "epoch": 4.179775280898877, "percentage": 83.58, "elapsed_time": "0:54:21", "remaining_time": "0:10:40", "throughput": 9373.16, "total_tokens": 30567952}
{"current_steps": 281, "total_steps": 335, "loss": 0.059, "lr": 4.007220250736454e-06, "epoch": 4.194756554307116, "percentage": 83.88, "elapsed_time": "0:54:32", "remaining_time": "0:10:28", "throughput": 9372.82, "total_tokens": 30674984}
{"current_steps": 282, "total_steps": 335, "loss": 0.0275, "lr": 3.866672772822863e-06, "epoch": 4.209737827715355, "percentage": 84.18, "elapsed_time": "0:54:44", "remaining_time": "0:10:17", "throughput": 9375.22, "total_tokens": 30791864}
{"current_steps": 283, "total_steps": 335, "loss": 0.041, "lr": 3.728427429388709e-06, "epoch": 4.224719101123595, "percentage": 84.48, "elapsed_time": "0:54:56", "remaining_time": "0:10:05", "throughput": 9377.5, "total_tokens": 30908384}
{"current_steps": 284, "total_steps": 335, "loss": 0.0492, "lr": 3.592499280030057e-06, "epoch": 4.239700374531835, "percentage": 84.78, "elapsed_time": "0:55:07", "remaining_time": "0:09:53", "throughput": 9379.52, "total_tokens": 31023848}
{"current_steps": 285, "total_steps": 335, "loss": 0.0555, "lr": 3.458903131922134e-06, "epoch": 4.254681647940075, "percentage": 85.07, "elapsed_time": "0:55:19", "remaining_time": "0:09:42", "throughput": 9380.89, "total_tokens": 31137384}
{"current_steps": 286, "total_steps": 335, "loss": 0.0493, "lr": 3.3276535382063183e-06, "epoch": 4.269662921348314, "percentage": 85.37, "elapsed_time": "0:55:30", "remaining_time": "0:09:30", "throughput": 9380.65, "total_tokens": 31244936}
{"current_steps": 287, "total_steps": 335, "loss": 0.0492, "lr": 3.198764796404807e-06, "epoch": 4.284644194756554, "percentage": 85.67, "elapsed_time": "0:55:42", "remaining_time": "0:09:18", "throughput": 9381.5, "total_tokens": 31355616}
{"current_steps": 288, "total_steps": 335, "loss": 0.0649, "lr": 3.0722509468631392e-06, "epoch": 4.299625468164794, "percentage": 85.97, "elapsed_time": "0:55:53", "remaining_time": "0:09:07", "throughput": 9382.0, "total_tokens": 31463648}
{"current_steps": 289, "total_steps": 335, "loss": 0.0481, "lr": 2.948125771220697e-06, "epoch": 4.314606741573034, "percentage": 86.27, "elapsed_time": "0:56:05", "remaining_time": "0:08:55", "throughput": 9383.06, "total_tokens": 31577056}
{"current_steps": 290, "total_steps": 335, "loss": 0.0455, "lr": 2.8264027909094715e-06, "epoch": 4.329588014981273, "percentage": 86.57, "elapsed_time": "0:56:16", "remaining_time": "0:08:43", "throughput": 9382.39, "total_tokens": 31682424}
{"current_steps": 291, "total_steps": 335, "loss": 0.0588, "lr": 2.707095265681081e-06, "epoch": 4.344569288389513, "percentage": 86.87, "elapsed_time": "0:56:28", "remaining_time": "0:08:32", "throughput": 9382.37, "total_tokens": 31790168}
{"current_steps": 292, "total_steps": 335, "loss": 0.0553, "lr": 2.5902161921623454e-06, "epoch": 4.359550561797753, "percentage": 87.16, "elapsed_time": "0:56:39", "remaining_time": "0:08:20", "throughput": 9384.36, "total_tokens": 31905520}
{"current_steps": 293, "total_steps": 335, "loss": 0.0452, "lr": 2.475778302439524e-06, "epoch": 4.3745318352059925, "percentage": 87.46, "elapsed_time": "0:56:51", "remaining_time": "0:08:09", "throughput": 9385.92, "total_tokens": 32020200}
{"current_steps": 294, "total_steps": 335, "loss": 0.0707, "lr": 2.3637940626713346e-06, "epoch": 4.389513108614232, "percentage": 87.76, "elapsed_time": "0:57:02", "remaining_time": "0:07:57", "throughput": 9386.46, "total_tokens": 32129744}
{"current_steps": 295, "total_steps": 335, "loss": 0.0611, "lr": 2.254275671731007e-06, "epoch": 4.404494382022472, "percentage": 88.06, "elapsed_time": "0:57:14", "remaining_time": "0:07:45", "throughput": 9388.99, "total_tokens": 32247024}
{"current_steps": 296, "total_steps": 335, "loss": 0.058, "lr": 2.14723505987737e-06, "epoch": 4.419475655430712, "percentage": 88.36, "elapsed_time": "0:57:26", "remaining_time": "0:07:34", "throughput": 9390.79, "total_tokens": 32361392}
{"current_steps": 297, "total_steps": 335, "loss": 0.0571, "lr": 2.0426838874552714e-06, "epoch": 4.4344569288389515, "percentage": 88.66, "elapsed_time": "0:57:37", "remaining_time": "0:07:22", "throughput": 9390.72, "total_tokens": 32469248}
{"current_steps": 298, "total_steps": 335, "loss": 0.0364, "lr": 1.9406335436253724e-06, "epoch": 4.449438202247191, "percentage": 88.96, "elapsed_time": "0:57:49", "remaining_time": "0:07:10", "throughput": 9391.9, "total_tokens": 32582736}
{"current_steps": 299, "total_steps": 335, "loss": 0.034, "lr": 1.8410951451234533e-06, "epoch": 4.464419475655431, "percentage": 89.25, "elapsed_time": "0:58:00", "remaining_time": "0:06:59", "throughput": 9392.47, "total_tokens": 32691704}
{"current_steps": 300, "total_steps": 335, "loss": 0.0675, "lr": 1.7440795350494588e-06, "epoch": 4.479400749063671, "percentage": 89.55, "elapsed_time": "0:58:12", "remaining_time": "0:06:47", "throughput": 9394.63, "total_tokens": 32807520}
{"current_steps": 300, "total_steps": 335, "eval_loss": 0.14898425340652466, "epoch": 4.479400749063671, "percentage": 89.55, "elapsed_time": "0:58:17", "remaining_time": "0:06:47", "throughput": 9381.38, "total_tokens": 32807520}
{"current_steps": 301, "total_steps": 335, "loss": 0.0563, "lr": 1.649597281686302e-06, "epoch": 4.49438202247191, "percentage": 89.85, "elapsed_time": "0:58:28", "remaining_time": "0:06:36", "throughput": 9382.01, "total_tokens": 32917472}
{"current_steps": 302, "total_steps": 335, "loss": 0.0582, "lr": 1.5576586773486195e-06, "epoch": 4.50936329588015, "percentage": 90.15, "elapsed_time": "0:58:39", "remaining_time": "0:06:24", "throughput": 9382.63, "total_tokens": 33026552}
{"current_steps": 303, "total_steps": 335, "loss": 0.048, "lr": 1.4682737372615967e-06, "epoch": 4.52434456928839, "percentage": 90.45, "elapsed_time": "0:58:51", "remaining_time": "0:06:12", "throughput": 9383.41, "total_tokens": 33135312}
{"current_steps": 304, "total_steps": 335, "loss": 0.0556, "lr": 1.3814521984699596e-06, "epoch": 4.539325842696629, "percentage": 90.75, "elapsed_time": "0:59:02", "remaining_time": "0:06:01", "throughput": 9385.02, "total_tokens": 33249640}
{"current_steps": 305, "total_steps": 335, "loss": 0.0427, "lr": 1.297203518777293e-06, "epoch": 4.554307116104869, "percentage": 91.04, "elapsed_time": "0:59:14", "remaining_time": "0:05:49", "throughput": 9385.22, "total_tokens": 33356584}
{"current_steps": 306, "total_steps": 335, "loss": 0.095, "lr": 1.2155368757157643e-06, "epoch": 4.569288389513108, "percentage": 91.34, "elapsed_time": "0:59:25", "remaining_time": "0:05:37", "throughput": 9385.24, "total_tokens": 33465096}
{"current_steps": 307, "total_steps": 335, "loss": 0.0329, "lr": 1.1364611655463736e-06, "epoch": 4.584269662921348, "percentage": 91.64, "elapsed_time": "0:59:37", "remaining_time": "0:05:26", "throughput": 9389.71, "total_tokens": 33589904}
{"current_steps": 308, "total_steps": 335, "loss": 0.048, "lr": 1.0599850022898539e-06, "epoch": 4.599250936329588, "percentage": 91.94, "elapsed_time": "0:59:48", "remaining_time": "0:05:14", "throughput": 9388.76, "total_tokens": 33693528}
{"current_steps": 309, "total_steps": 335, "loss": 0.0709, "lr": 9.861167167883046e-07, "epoch": 4.614232209737827, "percentage": 92.24, "elapsed_time": "1:00:00", "remaining_time": "0:05:02", "throughput": 9389.03, "total_tokens": 33800928}
{"current_steps": 310, "total_steps": 335, "loss": 0.0807, "lr": 9.148643557976955e-07, "epoch": 4.629213483146067, "percentage": 92.54, "elapsed_time": "1:00:11", "remaining_time": "0:04:51", "throughput": 9388.49, "total_tokens": 33904464}
{"current_steps": 311, "total_steps": 335, "loss": 0.0501, "lr": 8.462356811112987e-07, "epoch": 4.644194756554307, "percentage": 92.84, "elapsed_time": "1:00:22", "remaining_time": "0:04:39", "throughput": 9391.16, "total_tokens": 34020608}
{"current_steps": 312, "total_steps": 335, "loss": 0.0499, "lr": 7.802381687141535e-07, "epoch": 4.659176029962547, "percentage": 93.13, "elapsed_time": "1:00:34", "remaining_time": "0:04:27", "throughput": 9391.31, "total_tokens": 34129480}
{"current_steps": 313, "total_steps": 335, "loss": 0.086, "lr": 7.168790079686932e-07, "epoch": 4.674157303370786, "percentage": 93.43, "elapsed_time": "1:00:45", "remaining_time": "0:04:16", "throughput": 9389.83, "total_tokens": 34229672}
{"current_steps": 314, "total_steps": 335, "loss": 0.0711, "lr": 6.561651008315738e-07, "epoch": 4.689138576779026, "percentage": 93.73, "elapsed_time": "1:00:56", "remaining_time": "0:04:04", "throughput": 9390.0, "total_tokens": 34335640}
{"current_steps": 315, "total_steps": 335, "loss": 0.0417, "lr": 5.981030611018234e-07, "epoch": 4.704119850187266, "percentage": 94.03, "elapsed_time": "1:01:07", "remaining_time": "0:03:52", "throughput": 9387.62, "total_tokens": 34431984}
{"current_steps": 316, "total_steps": 335, "loss": 0.0668, "lr": 5.426992137003622e-07, "epoch": 4.719101123595506, "percentage": 94.33, "elapsed_time": "1:01:19", "remaining_time": "0:03:41", "throughput": 9389.27, "total_tokens": 34547560}
{"current_steps": 317, "total_steps": 335, "loss": 0.0582, "lr": 4.899595939810236e-07, "epoch": 4.734082397003745, "percentage": 94.63, "elapsed_time": "1:01:30", "remaining_time": "0:03:29", "throughput": 9389.25, "total_tokens": 34651384}
{"current_steps": 318, "total_steps": 335, "loss": 0.0559, "lr": 4.398899470730827e-07, "epoch": 4.749063670411985, "percentage": 94.93, "elapsed_time": "1:01:42", "remaining_time": "0:03:17", "throughput": 9387.95, "total_tokens": 34759152}
{"current_steps": 319, "total_steps": 335, "loss": 0.0529, "lr": 3.9249572725543196e-07, "epoch": 4.764044943820225, "percentage": 95.22, "elapsed_time": "1:01:54", "remaining_time": "0:03:06", "throughput": 9388.73, "total_tokens": 34874632}
{"current_steps": 320, "total_steps": 335, "loss": 0.0524, "lr": 3.477820973624063e-07, "epoch": 4.7790262172284645, "percentage": 95.52, "elapsed_time": "1:02:06", "remaining_time": "0:02:54", "throughput": 9389.16, "total_tokens": 34988104}
{"current_steps": 321, "total_steps": 335, "loss": 0.0521, "lr": 3.0575392822139726e-07, "epoch": 4.794007490636704, "percentage": 95.82, "elapsed_time": "1:02:18", "remaining_time": "0:02:43", "throughput": 9388.55, "total_tokens": 35096592}
{"current_steps": 322, "total_steps": 335, "loss": 0.0796, "lr": 2.664157981222437e-07, "epoch": 4.808988764044944, "percentage": 96.12, "elapsed_time": "1:02:30", "remaining_time": "0:02:31", "throughput": 9389.32, "total_tokens": 35211304}
{"current_steps": 323, "total_steps": 335, "loss": 0.0674, "lr": 2.297719923185032e-07, "epoch": 4.823970037453184, "percentage": 96.42, "elapsed_time": "1:02:41", "remaining_time": "0:02:19", "throughput": 9390.15, "total_tokens": 35323056}
{"current_steps": 324, "total_steps": 335, "loss": 0.0803, "lr": 1.9582650256064205e-07, "epoch": 4.8389513108614235, "percentage": 96.72, "elapsed_time": "1:02:53", "remaining_time": "0:02:08", "throughput": 9390.72, "total_tokens": 35436552}
{"current_steps": 325, "total_steps": 335, "loss": 0.0626, "lr": 1.645830266611914e-07, "epoch": 4.853932584269663, "percentage": 97.01, "elapsed_time": "1:03:05", "remaining_time": "0:01:56", "throughput": 9390.88, "total_tokens": 35549872}
{"current_steps": 325, "total_steps": 335, "eval_loss": 0.14768485724925995, "epoch": 4.853932584269663, "percentage": 97.01, "elapsed_time": "1:03:10", "remaining_time": "0:01:56", "throughput": 9378.65, "total_tokens": 35549872}
{"current_steps": 326, "total_steps": 335, "loss": 0.0551, "lr": 1.3604496809195288e-07, "epoch": 4.868913857677903, "percentage": 97.31, "elapsed_time": "1:03:22", "remaining_time": "0:01:44", "throughput": 9378.41, "total_tokens": 35659600}
{"current_steps": 327, "total_steps": 335, "loss": 0.0536, "lr": 1.1021543561322012e-07, "epoch": 4.883895131086143, "percentage": 97.61, "elapsed_time": "1:03:34", "remaining_time": "0:01:33", "throughput": 9378.21, "total_tokens": 35770904}
{"current_steps": 328, "total_steps": 335, "loss": 0.0664, "lr": 8.709724293513854e-08, "epoch": 4.898876404494382, "percentage": 97.91, "elapsed_time": "1:03:46", "remaining_time": "0:01:21", "throughput": 9377.38, "total_tokens": 35879784}
{"current_steps": 329, "total_steps": 335, "loss": 0.0641, "lr": 6.66929084112089e-08, "epoch": 4.913857677902621, "percentage": 98.21, "elapsed_time": "1:03:58", "remaining_time": "0:01:09", "throughput": 9376.47, "total_tokens": 35988344}
{"current_steps": 330, "total_steps": 335, "loss": 0.0624, "lr": 4.900465476393168e-08, "epoch": 4.928838951310862, "percentage": 98.51, "elapsed_time": "1:04:10", "remaining_time": "0:00:58", "throughput": 9374.74, "total_tokens": 36093032}
{"current_steps": 331, "total_steps": 335, "loss": 0.0484, "lr": 3.403440884269526e-08, "epoch": 4.943820224719101, "percentage": 98.81, "elapsed_time": "1:04:21", "remaining_time": "0:00:46", "throughput": 9373.49, "total_tokens": 36199864}
{"current_steps": 332, "total_steps": 335, "loss": 0.0649, "lr": 2.1783801413866046e-08, "epoch": 4.9588014981273405, "percentage": 99.1, "elapsed_time": "1:04:33", "remaining_time": "0:00:35", "throughput": 9371.2, "total_tokens": 36302712}
{"current_steps": 333, "total_steps": 335, "loss": 0.0684, "lr": 1.2254166983152737e-08, "epoch": 4.97378277153558, "percentage": 99.4, "elapsed_time": "1:04:45", "remaining_time": "0:00:23", "throughput": 9371.42, "total_tokens": 36412088}
{"current_steps": 334, "total_steps": 335, "loss": 0.0744, "lr": 5.446543650219904e-09, "epoch": 4.98876404494382, "percentage": 99.7, "elapsed_time": "1:04:57", "remaining_time": "0:00:11", "throughput": 9371.29, "total_tokens": 36523328}
{"current_steps": 335, "total_steps": 335, "loss": 0.0815, "lr": 1.3616729956228425e-09, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "1:05:01", "remaining_time": "0:00:00", "throughput": 9380.01, "total_tokens": 36600520}
{"current_steps": 335, "total_steps": 335, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "1:06:56", "remaining_time": "0:00:00", "throughput": 9113.08, "total_tokens": 36600520}
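The `lr` column in this log looks like linear warmup followed by cosine decay to zero. The logged values appear to fit three parameters: a peak learning rate of 5e-5, 34 warmup steps (10% of 335, rounded up) and 335 total steps. These are inferred from the numbers, not read from training_args.bin, so treat them as assumptions. The entry at step 1 logs `lr` 0.0, so the logged value seems to trail the step counter by one. The minimal sketch below therefore evaluates the schedule at `current_steps - 1`. It also pulls out the four `eval_loss` records from this stretch of the log. The `LOG_LINES` excerpt is trimmed to the fields used. `cosine_with_warmup` is a hypothetical helper with the same shape as `get_cosine_schedule_with_warmup` in Hugging Face transformers.

```python
import json
import math

# A few trainer_log.jsonl records from this run, trimmed to the fields used here.
LOG_LINES = [
    '{"current_steps": 1, "loss": 0.287, "lr": 0.0}',
    '{"current_steps": 2, "loss": 0.1593, "lr": 1.4705882352941177e-06}',
    '{"current_steps": 250, "loss": 0.0883, "lr": 9.412754953531663e-06}',
    '{"current_steps": 250, "eval_loss": 0.15280824899673462}',
    '{"current_steps": 275, "eval_loss": 0.148418128490448}',
    '{"current_steps": 300, "eval_loss": 0.14898425340652466}',
    '{"current_steps": 325, "eval_loss": 0.14768485724925995}',
    '{"current_steps": 335, "loss": 0.0815, "lr": 1.3616729956228425e-09}',
]

# ASSUMED schedule parameters, inferred from the logged lr values
# (not read from training_args.bin).
PEAK_LR = 5e-5
WARMUP_STEPS = 34  # ceil(0.1 * 335)
TOTAL_STEPS = 335


def cosine_with_warmup(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


records = [json.loads(line) for line in LOG_LINES]
train_records = [r for r in records if "lr" in r]
eval_records = [r for r in records if "eval_loss" in r]

# The lr logged at current_steps = n is compared with the schedule at n - 1.
for r in train_records:
    predicted = cosine_with_warmup(r["current_steps"] - 1)
    print(f"step {r['current_steps']:>3}: logged {r['lr']:.6e}  predicted {predicted:.6e}")

best = min(eval_records, key=lambda r: r["eval_loss"])
print(f"lowest eval_loss {best['eval_loss']:.4f} at step {best['current_steps']}")
```

With these assumed parameters, the predicted and logged rates agree closely at steps 1, 2, 250 and 335, which is what suggests the inference. Among the evaluations in this excerpt, the lowest `eval_loss` is about 0.1477, at step 325.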
3524
result_model/trainer_state.json
Normal file
File diff suppressed because it is too large
3
result_model/training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81cc1d5fde6260609814bcc0a743a85a34008732b524ae0f2211452a4ef21d71
size 7736
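training_args.bin is committed as a Git LFS pointer rather than as the binary itself. Its three lines give the pointer spec version, the SHA-256 object id, and the size in bytes of the real file.

Below is a minimal Python sketch for reading such a pointer and checking a downloaded blob against it. It assumes the v1 layout shown here, with one `key value` pair per line. `parse_lfs_pointer`, `matches_pointer` and `LfsPointer` are hypothetical names for illustration, not part of the Git LFS tooling.

```python
import hashlib
from dataclasses import dataclass

# Pointer text copied from result_model/training_args.bin above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:81cc1d5fde6260609814bcc0a743a85a34008732b524ae0f2211452a4ef21d71
size 7736
"""


@dataclass
class LfsPointer:
    version: str
    oid: str  # hex SHA-256 digest of the real file contents
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a v1 pointer: one 'key value' pair per non-empty line."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError(f"unexpected oid: {fields['oid']!r}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


def matches_pointer(blob: bytes, pointer: LfsPointer) -> bool:
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    return len(blob) == pointer.size and hashlib.sha256(blob).hexdigest() == pointer.oid


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer.size)  # 7736
```

In a clone that has only the pointer, `git lfs pull` fetches the 7736-byte object. `matches_pointer` can then confirm that the bytes are the ones this commit refers to.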
BIN
result_model/training_eval_accuracy.png
Normal file
Binary file not shown.
Size: 39 KiB
BIN
result_model/training_eval_loss.png
Normal file
Binary file not shown.
Size: 40 KiB
BIN
result_model/training_loss.png
Normal file
Binary file not shown.
Size: 66 KiB
1
result_model/vocab.json
Normal file
File diff suppressed because one or more lines are too long
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
BIN
tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
240
tokenizer_config.json
Normal file
@@ -0,0 +1,240 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
9
train_results.json
Normal file
@@ -0,0 +1,9 @@
{
"epoch": 5.0,
"num_input_tokens_seen": 36600520,
"total_flos": 1.6620454705385964e+18,
"train_loss": 0.11312562568641421,
"train_runtime": 4017.5693,
"train_samples_per_second": 1.325,
"train_steps_per_second": 0.083
}
349
trainer_log.jsonl
Normal file
@@ -0,0 +1,349 @@
{"current_steps": 1, "total_steps": 335, "loss": 0.287, "lr": 0.0, "epoch": 0.0149812734082397, "percentage": 0.3, "elapsed_time": "0:00:23", "remaining_time": "2:08:45", "throughput": 4592.7, "total_tokens": 106224}
{"current_steps": 2, "total_steps": 335, "loss": 0.1593, "lr": 1.4705882352941177e-06, "epoch": 0.0299625468164794, "percentage": 0.6, "elapsed_time": "0:00:33", "remaining_time": "1:32:45", "throughput": 6653.32, "total_tokens": 222384}
{"current_steps": 3, "total_steps": 335, "loss": 0.1572, "lr": 2.9411764705882355e-06, "epoch": 0.0449438202247191, "percentage": 0.9, "elapsed_time": "0:00:43", "remaining_time": "1:20:39", "throughput": 7608.07, "total_tokens": 332728}
{"current_steps": 4, "total_steps": 335, "loss": 0.2334, "lr": 4.411764705882353e-06, "epoch": 0.0599250936329588, "percentage": 1.19, "elapsed_time": "0:00:54", "remaining_time": "1:14:38", "throughput": 8126.02, "total_tokens": 439824}
{"current_steps": 5, "total_steps": 335, "loss": 0.0885, "lr": 5.882352941176471e-06, "epoch": 0.0749063670411985, "percentage": 1.49, "elapsed_time": "0:01:04", "remaining_time": "1:11:09", "throughput": 8546.79, "total_tokens": 552832}
{"current_steps": 6, "total_steps": 335, "loss": 0.1054, "lr": 7.3529411764705884e-06, "epoch": 0.0898876404494382, "percentage": 1.79, "elapsed_time": "0:01:16", "remaining_time": "1:09:59", "throughput": 8709.68, "total_tokens": 667008}
{"current_steps": 7, "total_steps": 335, "loss": 0.2192, "lr": 8.823529411764707e-06, "epoch": 0.10486891385767791, "percentage": 2.09, "elapsed_time": "0:01:28", "remaining_time": "1:08:46", "throughput": 8854.25, "total_tokens": 779728}
{"current_steps": 8, "total_steps": 335, "loss": 0.1713, "lr": 1.0294117647058824e-05, "epoch": 0.1198501872659176, "percentage": 2.39, "elapsed_time": "0:01:39", "remaining_time": "1:07:57", "throughput": 8955.12, "total_tokens": 893296}
{"current_steps": 9, "total_steps": 335, "loss": 0.1316, "lr": 1.1764705882352942e-05, "epoch": 0.1348314606741573, "percentage": 2.69, "elapsed_time": "0:01:51", "remaining_time": "1:07:23", "throughput": 9051.15, "total_tokens": 1010424}
{"current_steps": 10, "total_steps": 335, "loss": 0.0925, "lr": 1.323529411764706e-05, "epoch": 0.149812734082397, "percentage": 2.99, "elapsed_time": "0:02:03", "remaining_time": "1:06:56", "throughput": 9082.97, "total_tokens": 1122608}
{"current_steps": 11, "total_steps": 335, "loss": 0.1682, "lr": 1.4705882352941177e-05, "epoch": 0.1647940074906367, "percentage": 3.28, "elapsed_time": "0:02:14", "remaining_time": "1:06:15", "throughput": 9131.41, "total_tokens": 1232432}
{"current_steps": 12, "total_steps": 335, "loss": 0.1501, "lr": 1.6176470588235296e-05, "epoch": 0.1797752808988764, "percentage": 3.58, "elapsed_time": "0:02:26", "remaining_time": "1:05:40", "throughput": 9163.0, "total_tokens": 1341448}
{"current_steps": 13, "total_steps": 335, "loss": 0.1541, "lr": 1.7647058823529414e-05, "epoch": 0.1947565543071161, "percentage": 3.88, "elapsed_time": "0:02:37", "remaining_time": "1:05:00", "throughput": 9135.17, "total_tokens": 1438440}
{"current_steps": 14, "total_steps": 335, "loss": 0.1972, "lr": 1.9117647058823528e-05, "epoch": 0.20973782771535582, "percentage": 4.18, "elapsed_time": "0:02:49", "remaining_time": "1:04:35", "throughput": 9154.92, "total_tokens": 1547408}
{"current_steps": 15, "total_steps": 335, "loss": 0.1355, "lr": 2.058823529411765e-05, "epoch": 0.2247191011235955, "percentage": 4.48, "elapsed_time": "0:03:00", "remaining_time": "1:04:09", "throughput": 9200.34, "total_tokens": 1660336}
{"current_steps": 16, "total_steps": 335, "loss": 0.1175, "lr": 2.2058823529411766e-05, "epoch": 0.2397003745318352, "percentage": 4.78, "elapsed_time": "0:03:11", "remaining_time": "1:03:47", "throughput": 9245.29, "total_tokens": 1774704}
{"current_steps": 17, "total_steps": 335, "loss": 0.2153, "lr": 2.3529411764705884e-05, "epoch": 0.2546816479400749, "percentage": 5.07, "elapsed_time": "0:03:23", "remaining_time": "1:03:24", "throughput": 9277.16, "total_tokens": 1887056}
{"current_steps": 18, "total_steps": 335, "loss": 0.1604, "lr": 2.5e-05, "epoch": 0.2696629213483146, "percentage": 5.37, "elapsed_time": "0:03:34", "remaining_time": "1:03:02", "throughput": 9290.02, "total_tokens": 1995456}
{"current_steps": 19, "total_steps": 335, "loss": 0.1922, "lr": 2.647058823529412e-05, "epoch": 0.2846441947565543, "percentage": 5.67, "elapsed_time": "0:03:46", "remaining_time": "1:02:43", "throughput": 9333.75, "total_tokens": 2111824}
{"current_steps": 20, "total_steps": 335, "loss": 0.2839, "lr": 2.7941176470588236e-05, "epoch": 0.299625468164794, "percentage": 5.97, "elapsed_time": "0:03:57", "remaining_time": "1:02:22", "throughput": 9333.97, "total_tokens": 2217760}
{"current_steps": 21, "total_steps": 335, "loss": 0.1694, "lr": 2.9411764705882354e-05, "epoch": 0.3146067415730337, "percentage": 6.27, "elapsed_time": "0:04:08", "remaining_time": "1:02:02", "throughput": 9365.2, "total_tokens": 2331736}
{"current_steps": 22, "total_steps": 335, "loss": 0.0844, "lr": 3.0882352941176475e-05, "epoch": 0.3295880149812734, "percentage": 6.57, "elapsed_time": "0:04:20", "remaining_time": "1:01:42", "throughput": 9362.17, "total_tokens": 2436696}
{"current_steps": 23, "total_steps": 335, "loss": 0.1601, "lr": 3.235294117647059e-05, "epoch": 0.3445692883895131, "percentage": 6.87, "elapsed_time": "0:04:31", "remaining_time": "1:01:23", "throughput": 9390.54, "total_tokens": 2549584}
{"current_steps": 24, "total_steps": 335, "loss": 0.1835, "lr": 3.382352941176471e-05, "epoch": 0.3595505617977528, "percentage": 7.16, "elapsed_time": "0:04:42", "remaining_time": "1:01:05", "throughput": 9387.07, "total_tokens": 2655016}
{"current_steps": 25, "total_steps": 335, "loss": 0.2083, "lr": 3.529411764705883e-05, "epoch": 0.37453183520599254, "percentage": 7.46, "elapsed_time": "0:04:54", "remaining_time": "1:00:48", "throughput": 9385.24, "total_tokens": 2761784}
{"current_steps": 25, "total_steps": 335, "eval_loss": 0.22450505197048187, "epoch": 0.37453183520599254, "percentage": 7.46, "elapsed_time": "0:04:59", "remaining_time": "1:01:50", "throughput": 9230.07, "total_tokens": 2761784}
{"current_steps": 26, "total_steps": 335, "loss": 0.1667, "lr": 3.6764705882352945e-05, "epoch": 0.3895131086142322, "percentage": 7.76, "elapsed_time": "0:05:10", "remaining_time": "1:01:32", "throughput": 9257.73, "total_tokens": 2876408}
{"current_steps": 27, "total_steps": 335, "loss": 0.0896, "lr": 3.8235294117647055e-05, "epoch": 0.4044943820224719, "percentage": 8.06, "elapsed_time": "0:05:21", "remaining_time": "1:01:10", "throughput": 9285.19, "total_tokens": 2987992}
{"current_steps": 28, "total_steps": 335, "loss": 0.2299, "lr": 3.970588235294117e-05, "epoch": 0.41947565543071164, "percentage": 8.36, "elapsed_time": "0:05:33", "remaining_time": "1:00:52", "throughput": 9285.3, "total_tokens": 3093200}
{"current_steps": 29, "total_steps": 335, "loss": 0.269, "lr": 4.11764705882353e-05, "epoch": 0.4344569288389513, "percentage": 8.66, "elapsed_time": "0:05:44", "remaining_time": "1:00:31", "throughput": 9277.28, "total_tokens": 3192624}
{"current_steps": 30, "total_steps": 335, "loss": 0.159, "lr": 4.2647058823529415e-05, "epoch": 0.449438202247191, "percentage": 8.96, "elapsed_time": "0:05:55", "remaining_time": "1:00:15", "throughput": 9297.36, "total_tokens": 3306120}
{"current_steps": 31, "total_steps": 335, "loss": 0.229, "lr": 4.411764705882353e-05, "epoch": 0.46441947565543074, "percentage": 9.25, "elapsed_time": "0:06:07", "remaining_time": "0:59:59", "throughput": 9316.47, "total_tokens": 3419688}
{"current_steps": 32, "total_steps": 335, "loss": 0.1994, "lr": 4.558823529411765e-05, "epoch": 0.4794007490636704, "percentage": 9.55, "elapsed_time": "0:06:18", "remaining_time": "0:59:43", "throughput": 9307.12, "total_tokens": 3521896}
{"current_steps": 33, "total_steps": 335, "loss": 0.109, "lr": 4.705882352941177e-05, "epoch": 0.4943820224719101, "percentage": 9.85, "elapsed_time": "0:06:29", "remaining_time": "0:59:28", "throughput": 9348.15, "total_tokens": 3644912}
{"current_steps": 34, "total_steps": 335, "loss": 0.1888, "lr": 4.8529411764705885e-05, "epoch": 0.5093632958801498, "percentage": 10.15, "elapsed_time": "0:06:41", "remaining_time": "0:59:12", "throughput": 9348.9, "total_tokens": 3751480}
{"current_steps": 35, "total_steps": 335, "loss": 0.1841, "lr": 5e-05, "epoch": 0.5243445692883895, "percentage": 10.45, "elapsed_time": "0:06:52", "remaining_time": "0:58:56", "throughput": 9337.77, "total_tokens": 3853072}
{"current_steps": 36, "total_steps": 335, "loss": 0.2391, "lr": 4.999863832700438e-05, "epoch": 0.5393258426966292, "percentage": 10.75, "elapsed_time": "0:07:03", "remaining_time": "0:58:38", "throughput": 9336.64, "total_tokens": 3954992}
{"current_steps": 37, "total_steps": 335, "loss": 0.2589, "lr": 4.999455345634978e-05, "epoch": 0.5543071161048689, "percentage": 11.04, "elapsed_time": "0:07:14", "remaining_time": "0:58:23", "throughput": 9335.05, "total_tokens": 4060312}
{"current_steps": 38, "total_steps": 335, "loss": 0.1603, "lr": 4.9987745833016855e-05, "epoch": 0.5692883895131086, "percentage": 11.34, "elapsed_time": "0:07:26", "remaining_time": "0:58:09", "throughput": 9317.09, "total_tokens": 4159664}
{"current_steps": 39, "total_steps": 335, "loss": 0.1837, "lr": 4.9978216198586135e-05, "epoch": 0.5842696629213483, "percentage": 11.64, "elapsed_time": "0:07:37", "remaining_time": "0:57:55", "throughput": 9333.8, "total_tokens": 4273696}
{"current_steps": 40, "total_steps": 335, "loss": 0.2044, "lr": 4.996596559115731e-05, "epoch": 0.599250936329588, "percentage": 11.94, "elapsed_time": "0:07:49", "remaining_time": "0:57:39", "throughput": 9340.79, "total_tokens": 4381080}
{"current_steps": 41, "total_steps": 335, "loss": 0.1326, "lr": 4.995099534523607e-05, "epoch": 0.6142322097378277, "percentage": 12.24, "elapsed_time": "0:08:00", "remaining_time": "0:57:24", "throughput": 9366.78, "total_tokens": 4499912}
{"current_steps": 42, "total_steps": 335, "loss": 0.1795, "lr": 4.9933307091588796e-05, "epoch": 0.6292134831460674, "percentage": 12.54, "elapsed_time": "0:08:11", "remaining_time": "0:57:10", "throughput": 9367.17, "total_tokens": 4606816}
{"current_steps": 43, "total_steps": 335, "loss": 0.188, "lr": 4.991290275706486e-05, "epoch": 0.6441947565543071, "percentage": 12.84, "elapsed_time": "0:08:23", "remaining_time": "0:56:57", "throughput": 9379.7, "total_tokens": 4720528}
{"current_steps": 44, "total_steps": 335, "loss": 0.1692, "lr": 4.988978456438678e-05, "epoch": 0.6591760299625468, "percentage": 13.13, "elapsed_time": "0:08:34", "remaining_time": "0:56:44", "throughput": 9392.58, "total_tokens": 4834552}
{"current_steps": 45, "total_steps": 335, "loss": 0.1526, "lr": 4.986395503190805e-05, "epoch": 0.6741573033707865, "percentage": 13.43, "elapsed_time": "0:08:45", "remaining_time": "0:56:28", "throughput": 9395.72, "total_tokens": 4940840}
{"current_steps": 46, "total_steps": 335, "loss": 0.2274, "lr": 4.983541697333881e-05, "epoch": 0.6891385767790262, "percentage": 13.73, "elapsed_time": "0:08:57", "remaining_time": "0:56:14", "throughput": 9393.02, "total_tokens": 5044880}
{"current_steps": 47, "total_steps": 335, "loss": 0.1199, "lr": 4.980417349743936e-05, "epoch": 0.704119850187266, "percentage": 14.03, "elapsed_time": "0:09:07", "remaining_time": "0:55:57", "throughput": 9424.16, "total_tokens": 5164256}
{"current_steps": 48, "total_steps": 335, "loss": 0.2262, "lr": 4.9770228007681494e-05, "epoch": 0.7191011235955056, "percentage": 14.33, "elapsed_time": "0:09:17", "remaining_time": "0:55:35", "throughput": 9432.85, "total_tokens": 5262840}
{"current_steps": 49, "total_steps": 335, "loss": 0.1684, "lr": 4.973358420187776e-05, "epoch": 0.7340823970037453, "percentage": 14.63, "elapsed_time": "0:09:28", "remaining_time": "0:55:16", "throughput": 9458.31, "total_tokens": 5374992}
{"current_steps": 50, "total_steps": 335, "loss": 0.1599, "lr": 4.9694246071778604e-05, "epoch": 0.7490636704119851, "percentage": 14.93, "elapsed_time": "0:09:38", "remaining_time": "0:54:58", "throughput": 9479.7, "total_tokens": 5486368}
{"current_steps": 50, "total_steps": 335, "eval_loss": 0.22489887475967407, "epoch": 0.7490636704119851, "percentage": 14.93, "elapsed_time": "0:09:43", "remaining_time": "0:55:26", "throughput": 9399.66, "total_tokens": 5486368}
{"current_steps": 51, "total_steps": 335, "loss": 0.2025, "lr": 4.9652217902637596e-05, "epoch": 0.7640449438202247, "percentage": 15.22, "elapsed_time": "0:09:53", "remaining_time": "0:55:04", "throughput": 9407.91, "total_tokens": 5582648}
{"current_steps": 52, "total_steps": 335, "loss": 0.1592, "lr": 4.9607504272744575e-05, "epoch": 0.7790262172284644, "percentage": 15.52, "elapsed_time": "0:10:03", "remaining_time": "0:54:45", "throughput": 9431.45, "total_tokens": 5692920}
{"current_steps": 53, "total_steps": 335, "loss": 0.2657, "lr": 4.956011005292692e-05, "epoch": 0.7940074906367042, "percentage": 15.82, "elapsed_time": "0:10:13", "remaining_time": "0:54:26", "throughput": 9441.15, "total_tokens": 5795728}
{"current_steps": 54, "total_steps": 335, "loss": 0.1878, "lr": 4.951004040601898e-05, "epoch": 0.8089887640449438, "percentage": 16.12, "elapsed_time": "0:10:24", "remaining_time": "0:54:08", "throughput": 9471.15, "total_tokens": 5911816}
{"current_steps": 55, "total_steps": 335, "loss": 0.2157, "lr": 4.945730078629964e-05, "epoch": 0.8239700374531835, "percentage": 16.42, "elapsed_time": "0:10:34", "remaining_time": "0:53:49", "throughput": 9482.67, "total_tokens": 6015648}
{"current_steps": 56, "total_steps": 335, "loss": 0.1789, "lr": 4.9401896938898185e-05, "epoch": 0.8389513108614233, "percentage": 16.72, "elapsed_time": "0:10:44", "remaining_time": "0:53:32", "throughput": 9510.27, "total_tokens": 6132248}
{"current_steps": 57, "total_steps": 335, "loss": 0.2019, "lr": 4.934383489916843e-05, "epoch": 0.8539325842696629, "percentage": 17.01, "elapsed_time": "0:10:55", "remaining_time": "0:53:15", "throughput": 9537.52, "total_tokens": 6249344}
{"current_steps": 58, "total_steps": 335, "loss": 0.132, "lr": 4.928312099203131e-05, "epoch": 0.8689138576779026, "percentage": 17.31, "elapsed_time": "0:11:05", "remaining_time": "0:52:59", "throughput": 9564.24, "total_tokens": 6366872}
{"current_steps": 59, "total_steps": 335, "loss": 0.2022, "lr": 4.921976183128585e-05, "epoch": 0.8838951310861424, "percentage": 17.61, "elapsed_time": "0:11:16", "remaining_time": "0:52:42", "throughput": 9578.22, "total_tokens": 6475464}
{"current_steps": 60, "total_steps": 335, "loss": 0.1605, "lr": 4.9153764318888706e-05, "epoch": 0.898876404494382, "percentage": 17.91, "elapsed_time": "0:11:27", "remaining_time": "0:52:31", "throughput": 9578.72, "total_tokens": 6587040}
{"current_steps": 61, "total_steps": 335, "loss": 0.2062, "lr": 4.908513564420231e-05, "epoch": 0.9138576779026217, "percentage": 18.21, "elapsed_time": "0:11:39", "remaining_time": "0:52:20", "throughput": 9586.02, "total_tokens": 6702552}
{"current_steps": 62, "total_steps": 335, "loss": 0.1485, "lr": 4.90138832832117e-05, "epoch": 0.9288389513108615, "percentage": 18.51, "elapsed_time": "0:11:50", "remaining_time": "0:52:09", "throughput": 9580.3, "total_tokens": 6809352}
{"current_steps": 63, "total_steps": 335, "loss": 0.1896, "lr": 4.894001499771015e-05, "epoch": 0.9438202247191011, "percentage": 18.81, "elapsed_time": "0:12:02", "remaining_time": "0:51:58", "throughput": 9566.27, "total_tokens": 6909928}
{"current_steps": 64, "total_steps": 335, "loss": 0.1141, "lr": 4.886353883445363e-05, "epoch": 0.9588014981273408, "percentage": 19.1, "elapsed_time": "0:12:13", "remaining_time": "0:51:48", "throughput": 9576.69, "total_tokens": 7029288}
{"current_steps": 65, "total_steps": 335, "loss": 0.2227, "lr": 4.878446312428424e-05, "epoch": 0.9737827715355806, "percentage": 19.4, "elapsed_time": "0:12:25", "remaining_time": "0:51:36", "throughput": 9572.45, "total_tokens": 7136544}
{"current_steps": 66, "total_steps": 335, "loss": 0.1648, "lr": 4.8702796481222714e-05, "epoch": 0.9887640449438202, "percentage": 19.7, "elapsed_time": "0:12:37", "remaining_time": "0:51:25", "throughput": 9569.3, "total_tokens": 7244184}
{"current_steps": 67, "total_steps": 335, "loss": 0.2552, "lr": 4.861854780153004e-05, "epoch": 1.0, "percentage": 20.0, "elapsed_time": "0:12:41", "remaining_time": "0:50:47", "throughput": 9607.64, "total_tokens": 7319544}
{"current_steps": 68, "total_steps": 335, "loss": 0.1038, "lr": 4.853172626273841e-05, "epoch": 1.0149812734082397, "percentage": 20.3, "elapsed_time": "0:12:53", "remaining_time": "0:50:37", "throughput": 9615.77, "total_tokens": 7437632}
{"current_steps": 69, "total_steps": 335, "loss": 0.1202, "lr": 4.8442341322651385e-05, "epoch": 1.0299625468164795, "percentage": 20.6, "elapsed_time": "0:13:04", "remaining_time": "0:50:26", "throughput": 9614.67, "total_tokens": 7547280}
{"current_steps": 70, "total_steps": 335, "loss": 0.1851, "lr": 4.83504027183137e-05, "epoch": 1.0449438202247192, "percentage": 20.9, "elapsed_time": "0:13:16", "remaining_time": "0:50:16", "throughput": 9610.6, "total_tokens": 7658904}
{"current_steps": 71, "total_steps": 335, "loss": 0.1193, "lr": 4.825592046495054e-05, "epoch": 1.0599250936329587, "percentage": 21.19, "elapsed_time": "0:13:28", "remaining_time": "0:50:06", "throughput": 9601.61, "total_tokens": 7762712}
{"current_steps": 72, "total_steps": 335, "loss": 0.1442, "lr": 4.8158904854876555e-05, "epoch": 1.0749063670411985, "percentage": 21.49, "elapsed_time": "0:13:40", "remaining_time": "0:49:56", "throughput": 9598.73, "total_tokens": 7875080}
{"current_steps": 73, "total_steps": 335, "loss": 0.1783, "lr": 4.805936645637463e-05, "epoch": 1.0898876404494382, "percentage": 21.79, "elapsed_time": "0:13:52", "remaining_time": "0:49:47", "throughput": 9599.16, "total_tokens": 7989424}
{"current_steps": 74, "total_steps": 335, "loss": 0.096, "lr": 4.795731611254473e-05, "epoch": 1.104868913857678, "percentage": 22.09, "elapsed_time": "0:14:04", "remaining_time": "0:49:36", "throughput": 9601.65, "total_tokens": 8104200}
{"current_steps": 75, "total_steps": 335, "loss": 0.1223, "lr": 4.785276494012263e-05, "epoch": 1.1198501872659177, "percentage": 22.39, "elapsed_time": "0:14:15", "remaining_time": "0:49:27", "throughput": 9598.92, "total_tokens": 8216400}
{"current_steps": 75, "total_steps": 335, "eval_loss": 0.20777302980422974, "epoch": 1.1198501872659177, "percentage": 22.39, "elapsed_time": "0:14:20", "remaining_time": "0:49:44", "throughput": 9543.77, "total_tokens": 8216400}
{"current_steps": 76, "total_steps": 335, "loss": 0.1293, "lr": 4.7745724328269e-05, "epoch": 1.1348314606741572, "percentage": 22.69, "elapsed_time": "0:14:32", "remaining_time": "0:49:34", "throughput": 9543.26, "total_tokens": 8330424}
{"current_steps": 77, "total_steps": 335, "loss": 0.1562, "lr": 4.763620593732867e-05, "epoch": 1.149812734082397, "percentage": 22.99, "elapsed_time": "0:14:44", "remaining_time": "0:49:24", "throughput": 9537.93, "total_tokens": 8438312}
{"current_steps": 78, "total_steps": 335, "loss": 0.1081, "lr": 4.752422169756048e-05, "epoch": 1.1647940074906367, "percentage": 23.28, "elapsed_time": "0:14:56", "remaining_time": "0:49:14", "throughput": 9524.14, "total_tokens": 8538856}
{"current_steps": 79, "total_steps": 335, "loss": 0.0907, "lr": 4.740978380783765e-05, "epoch": 1.1797752808988764, "percentage": 23.58, "elapsed_time": "0:15:08", "remaining_time": "0:49:03", "throughput": 9520.37, "total_tokens": 8648688}
{"current_steps": 80, "total_steps": 335, "loss": 0.1497, "lr": 4.7292904734318924e-05, "epoch": 1.1947565543071161, "percentage": 23.88, "elapsed_time": "0:15:20", "remaining_time": "0:48:53", "throughput": 9515.44, "total_tokens": 8757528}
{"current_steps": 81, "total_steps": 335, "loss": 0.1343, "lr": 4.7173597209090534e-05, "epoch": 1.2097378277153559, "percentage": 24.18, "elapsed_time": "0:15:32", "remaining_time": "0:48:42", "throughput": 9517.49, "total_tokens": 8871600}
{"current_steps": 82, "total_steps": 335, "loss": 0.1842, "lr": 4.70518742287793e-05, "epoch": 1.2247191011235956, "percentage": 24.48, "elapsed_time": "0:15:43", "remaining_time": "0:48:31", "throughput": 9512.25, "total_tokens": 8975328}
{"current_steps": 83, "total_steps": 335, "loss": 0.1342, "lr": 4.6927749053136866e-05, "epoch": 1.2397003745318351, "percentage": 24.78, "elapsed_time": "0:15:55", "remaining_time": "0:48:19", "throughput": 9518.43, "total_tokens": 9090992}
{"current_steps": 84, "total_steps": 335, "loss": 0.1938, "lr": 4.6801235203595195e-05, "epoch": 1.2546816479400749, "percentage": 25.07, "elapsed_time": "0:16:06", "remaining_time": "0:48:07", "throughput": 9520.44, "total_tokens": 9201320}
{"current_steps": 85, "total_steps": 335, "loss": 0.1673, "lr": 4.667234646179368e-05, "epoch": 1.2696629213483146, "percentage": 25.37, "elapsed_time": "0:16:17", "remaining_time": "0:47:55", "throughput": 9517.24, "total_tokens": 9304160}
{"current_steps": 86, "total_steps": 335, "loss": 0.2025, "lr": 4.654109686807787e-05, "epoch": 1.2846441947565543, "percentage": 25.67, "elapsed_time": "0:16:29", "remaining_time": "0:47:44", "throughput": 9512.15, "total_tokens": 9409224}
{"current_steps": 87, "total_steps": 335, "loss": 0.1421, "lr": 4.640750071996995e-05, "epoch": 1.299625468164794, "percentage": 25.97, "elapsed_time": "0:16:40", "remaining_time": "0:47:32", "throughput": 9507.57, "total_tokens": 9514232}
{"current_steps": 88, "total_steps": 335, "loss": 0.1485, "lr": 4.6271572570611296e-05, "epoch": 1.3146067415730336, "percentage": 26.27, "elapsed_time": "0:16:52", "remaining_time": "0:47:21", "throughput": 9507.1, "total_tokens": 9623752}
{"current_steps": 89, "total_steps": 335, "loss": 0.1504, "lr": 4.613332722717714e-05, "epoch": 1.3295880149812733, "percentage": 26.57, "elapsed_time": "0:17:03", "remaining_time": "0:47:10", "throughput": 9507.85, "total_tokens": 9734808}
{"current_steps": 90, "total_steps": 335, "loss": 0.1232, "lr": 4.5992779749263546e-05, "epoch": 1.344569288389513, "percentage": 26.87, "elapsed_time": "0:17:15", "remaining_time": "0:46:58", "throughput": 9510.01, "total_tokens": 9847464}
{"current_steps": 91, "total_steps": 335, "loss": 0.1916, "lr": 4.584994544724695e-05, "epoch": 1.3595505617977528, "percentage": 27.16, "elapsed_time": "0:17:26", "remaining_time": "0:46:47", "throughput": 9494.85, "total_tokens": 9940464}
{"current_steps": 92, "total_steps": 335, "loss": 0.1665, "lr": 4.5704839880616296e-05, "epoch": 1.3745318352059925, "percentage": 27.46, "elapsed_time": "0:17:38", "remaining_time": "0:46:35", "throughput": 9498.56, "total_tokens": 10054728}
{"current_steps": 93, "total_steps": 335, "loss": 0.102, "lr": 4.5557478856278114e-05, "epoch": 1.3895131086142323, "percentage": 27.76, "elapsed_time": "0:17:50", "remaining_time": "0:46:24", "throughput": 9504.9, "total_tokens": 10172456}
{"current_steps": 94, "total_steps": 335, "loss": 0.1167, "lr": 4.5407878426834596e-05, "epoch": 1.404494382022472, "percentage": 28.06, "elapsed_time": "0:18:01", "remaining_time": "0:46:13", "throughput": 9501.49, "total_tokens": 10279024}
{"current_steps": 95, "total_steps": 335, "loss": 0.1945, "lr": 4.5256054888834934e-05, "epoch": 1.4194756554307117, "percentage": 28.36, "elapsed_time": "0:18:13", "remaining_time": "0:46:02", "throughput": 9505.4, "total_tokens": 10394120}
{"current_steps": 96, "total_steps": 335, "loss": 0.1576, "lr": 4.5102024781000077e-05, "epoch": 1.4344569288389513, "percentage": 28.66, "elapsed_time": "0:18:25", "remaining_time": "0:45:51", "throughput": 9505.4, "total_tokens": 10503768}
{"current_steps": 97, "total_steps": 335, "loss": 0.1266, "lr": 4.4945804882421086e-05, "epoch": 1.449438202247191, "percentage": 28.96, "elapsed_time": "0:18:36", "remaining_time": "0:45:39", "throughput": 9507.51, "total_tokens": 10616136}
{"current_steps": 98, "total_steps": 335, "loss": 0.0974, "lr": 4.478741221073136e-05, "epoch": 1.4644194756554307, "percentage": 29.25, "elapsed_time": "0:18:48", "remaining_time": "0:45:28", "throughput": 9507.33, "total_tokens": 10725704}
{"current_steps": 99, "total_steps": 335, "loss": 0.0942, "lr": 4.4626864020252774e-05, "epoch": 1.4794007490636705, "percentage": 29.55, "elapsed_time": "0:18:59", "remaining_time": "0:45:16", "throughput": 9510.15, "total_tokens": 10838848}
{"current_steps": 100, "total_steps": 335, "loss": 0.16, "lr": 4.446417780011618e-05, "epoch": 1.49438202247191, "percentage": 29.85, "elapsed_time": "0:19:11", "remaining_time": "0:45:05", "throughput": 9513.83, "total_tokens": 10953704}
{"current_steps": 100, "total_steps": 335, "eval_loss": 0.20240993797779083, "epoch": 1.49438202247191, "percentage": 29.85, "elapsed_time": "0:19:16", "remaining_time": "0:45:17", "throughput": 9473.13, "total_tokens": 10953704}
{"current_steps": 101, "total_steps": 335, "loss": 0.1192, "lr": 4.42993712723562e-05, "epoch": 1.5093632958801497, "percentage": 30.15, "elapsed_time": "0:19:27", "remaining_time": "0:45:05", "throughput": 9481.33, "total_tokens": 11073888}
|
||||
{"current_steps": 102, "total_steps": 335, "loss": 0.1767, "lr": 4.413246238998069e-05, "epoch": 1.5243445692883895, "percentage": 30.45, "elapsed_time": "0:19:39", "remaining_time": "0:44:54", "throughput": 9476.87, "total_tokens": 11178896}
|
||||
{"current_steps": 103, "total_steps": 335, "loss": 0.1383, "lr": 4.3963469335015085e-05, "epoch": 1.5393258426966292, "percentage": 30.75, "elapsed_time": "0:19:51", "remaining_time": "0:44:42", "throughput": 9477.94, "total_tokens": 11289112}
|
||||
{"current_steps": 104, "total_steps": 335, "loss": 0.1421, "lr": 4.379241051652174e-05, "epoch": 1.554307116104869, "percentage": 31.04, "elapsed_time": "0:20:02", "remaining_time": "0:44:31", "throughput": 9481.53, "total_tokens": 11401952}
|
||||
{"current_steps": 105, "total_steps": 335, "loss": 0.1201, "lr": 4.361930456859455e-05, "epoch": 1.5692883895131087, "percentage": 31.34, "elapsed_time": "0:20:13", "remaining_time": "0:44:19", "throughput": 9482.92, "total_tokens": 11511848}
|
||||
{"current_steps": 106, "total_steps": 335, "loss": 0.0623, "lr": 4.34441703483291e-05, "epoch": 1.5842696629213484, "percentage": 31.64, "elapsed_time": "0:20:25", "remaining_time": "0:44:07", "throughput": 9486.84, "total_tokens": 11625728}
|
||||
{"current_steps": 107, "total_steps": 335, "loss": 0.193, "lr": 4.326702693376844e-05, "epoch": 1.5992509363295881, "percentage": 31.94, "elapsed_time": "0:20:37", "remaining_time": "0:43:55", "throughput": 9491.7, "total_tokens": 11741544}
|
||||
{"current_steps": 108, "total_steps": 335, "loss": 0.0936, "lr": 4.308789362182492e-05, "epoch": 1.6142322097378277, "percentage": 32.24, "elapsed_time": "0:20:48", "remaining_time": "0:43:44", "throughput": 9492.36, "total_tokens": 11851240}
|
||||
{"current_steps": 109, "total_steps": 335, "loss": 0.1468, "lr": 4.2906789926177975e-05, "epoch": 1.6292134831460674, "percentage": 32.54, "elapsed_time": "0:21:00", "remaining_time": "0:43:33", "throughput": 9492.37, "total_tokens": 11963664}
|
||||
{"current_steps": 110, "total_steps": 335, "loss": 0.1707, "lr": 4.272373557514858e-05, "epoch": 1.6441947565543071, "percentage": 32.84, "elapsed_time": "0:21:11", "remaining_time": "0:43:20", "throughput": 9491.21, "total_tokens": 12067544}
|
||||
{"current_steps": 111, "total_steps": 335, "loss": 0.1829, "lr": 4.2538750509550054e-05, "epoch": 1.6591760299625467, "percentage": 33.13, "elapsed_time": "0:21:22", "remaining_time": "0:43:08", "throughput": 9482.13, "total_tokens": 12164792}
|
||||
{"current_steps": 112, "total_steps": 335, "loss": 0.1401, "lr": 4.235185488051585e-05, "epoch": 1.6741573033707864, "percentage": 33.43, "elapsed_time": "0:21:34", "remaining_time": "0:42:58", "throughput": 9484.52, "total_tokens": 12281440}
|
||||
{"current_steps": 113, "total_steps": 335, "loss": 0.1412, "lr": 4.216306904730447e-05, "epoch": 1.6891385767790261, "percentage": 33.73, "elapsed_time": "0:21:46", "remaining_time": "0:42:47", "throughput": 9481.49, "total_tokens": 12389800}
|
||||
{"current_steps": 114, "total_steps": 335, "loss": 0.1908, "lr": 4.1972413575081595e-05, "epoch": 1.7041198501872659, "percentage": 34.03, "elapsed_time": "0:21:58", "remaining_time": "0:42:35", "throughput": 9480.71, "total_tokens": 12498360}
|
||||
{"current_steps": 115, "total_steps": 335, "loss": 0.1783, "lr": 4.177990923267986e-05, "epoch": 1.7191011235955056, "percentage": 34.33, "elapsed_time": "0:22:09", "remaining_time": "0:42:24", "throughput": 9475.38, "total_tokens": 12601072}
|
||||
{"current_steps": 116, "total_steps": 335, "loss": 0.1246, "lr": 4.158557699033644e-05, "epoch": 1.7340823970037453, "percentage": 34.63, "elapsed_time": "0:22:21", "remaining_time": "0:42:12", "throughput": 9469.52, "total_tokens": 12704456}
|
||||
{"current_steps": 117, "total_steps": 335, "loss": 0.0917, "lr": 4.138943801740865e-05, "epoch": 1.749063670411985, "percentage": 34.93, "elapsed_time": "0:22:33", "remaining_time": "0:42:01", "throughput": 9458.59, "total_tokens": 12801568}
|
||||
{"current_steps": 118, "total_steps": 335, "loss": 0.0672, "lr": 4.119151368006793e-05, "epoch": 1.7640449438202248, "percentage": 35.22, "elapsed_time": "0:22:45", "remaining_time": "0:41:50", "throughput": 9462.14, "total_tokens": 12917448}
|
||||
{"current_steps": 119, "total_steps": 335, "loss": 0.1358, "lr": 4.099182553897229e-05, "epoch": 1.7790262172284645, "percentage": 35.52, "elapsed_time": "0:22:56", "remaining_time": "0:41:39", "throughput": 9457.8, "total_tokens": 13022432}
{"current_steps": 120, "total_steps": 335, "loss": 0.1048, "lr": 4.079039534691767e-05, "epoch": 1.7940074906367043, "percentage": 35.82, "elapsed_time": "0:23:08", "remaining_time": "0:41:28", "throughput": 9454.12, "total_tokens": 13129888}
{"current_steps": 121, "total_steps": 335, "loss": 0.1369, "lr": 4.058724504646834e-05, "epoch": 1.8089887640449438, "percentage": 36.12, "elapsed_time": "0:23:20", "remaining_time": "0:41:17", "throughput": 9449.6, "total_tokens": 13235312}
{"current_steps": 122, "total_steps": 335, "loss": 0.1564, "lr": 4.0382396767566536e-05, "epoch": 1.8239700374531835, "percentage": 36.42, "elapsed_time": "0:23:32", "remaining_time": "0:41:05", "throughput": 9452.36, "total_tokens": 13350920}
{"current_steps": 123, "total_steps": 335, "loss": 0.1292, "lr": 4.017587282512181e-05, "epoch": 1.8389513108614233, "percentage": 36.72, "elapsed_time": "0:23:44", "remaining_time": "0:40:55", "throughput": 9448.3, "total_tokens": 13458096}
{"current_steps": 124, "total_steps": 335, "loss": 0.1175, "lr": 3.9967695716580224e-05, "epoch": 1.8539325842696628, "percentage": 37.01, "elapsed_time": "0:23:56", "remaining_time": "0:40:44", "throughput": 9444.9, "total_tokens": 13566016}
{"current_steps": 125, "total_steps": 335, "loss": 0.1814, "lr": 3.975788811947351e-05, "epoch": 1.8689138576779025, "percentage": 37.31, "elapsed_time": "0:24:08", "remaining_time": "0:40:32", "throughput": 9444.35, "total_tokens": 13676808}
{"current_steps": 125, "total_steps": 335, "eval_loss": 0.18464037775993347, "epoch": 1.8689138576779025, "percentage": 37.31, "elapsed_time": "0:24:13", "remaining_time": "0:40:41", "throughput": 9412.31, "total_tokens": 13676808}
{"current_steps": 126, "total_steps": 335, "loss": 0.0969, "lr": 3.954647288894883e-05, "epoch": 1.8838951310861423, "percentage": 37.61, "elapsed_time": "0:24:24", "remaining_time": "0:40:29", "throughput": 9410.44, "total_tokens": 13785624}
{"current_steps": 127, "total_steps": 335, "loss": 0.1431, "lr": 3.933347305527898e-05, "epoch": 1.898876404494382, "percentage": 37.91, "elapsed_time": "0:24:36", "remaining_time": "0:40:18", "throughput": 9409.77, "total_tokens": 13896368}
{"current_steps": 128, "total_steps": 335, "loss": 0.1552, "lr": 3.911891182135371e-05, "epoch": 1.9138576779026217, "percentage": 38.21, "elapsed_time": "0:24:48", "remaining_time": "0:40:07", "throughput": 9410.85, "total_tokens": 14010984}
{"current_steps": 129, "total_steps": 335, "loss": 0.1472, "lr": 3.8902812560152066e-05, "epoch": 1.9288389513108615, "percentage": 38.51, "elapsed_time": "0:25:00", "remaining_time": "0:39:55", "throughput": 9405.64, "total_tokens": 14112168}
{"current_steps": 130, "total_steps": 335, "loss": 0.1115, "lr": 3.868519881219631e-05, "epoch": 1.9438202247191012, "percentage": 38.81, "elapsed_time": "0:25:12", "remaining_time": "0:39:44", "throughput": 9408.53, "total_tokens": 14227128}
{"current_steps": 131, "total_steps": 335, "loss": 0.1027, "lr": 3.846609428298757e-05, "epoch": 1.958801498127341, "percentage": 39.1, "elapsed_time": "0:25:24", "remaining_time": "0:39:33", "throughput": 9410.53, "total_tokens": 14342592}
{"current_steps": 132, "total_steps": 335, "loss": 0.1057, "lr": 3.824552284042351e-05, "epoch": 1.9737827715355807, "percentage": 39.4, "elapsed_time": "0:25:36", "remaining_time": "0:39:22", "throughput": 9414.82, "total_tokens": 14461768}
{"current_steps": 133, "total_steps": 335, "loss": 0.1326, "lr": 3.8023508512198256e-05, "epoch": 1.9887640449438202, "percentage": 39.7, "elapsed_time": "0:25:47", "remaining_time": "0:39:10", "throughput": 9412.45, "total_tokens": 14568520}
{"current_steps": 134, "total_steps": 335, "loss": 0.1245, "lr": 3.780007548318507e-05, "epoch": 2.0, "percentage": 40.0, "elapsed_time": "0:25:57", "remaining_time": "0:38:56", "throughput": 9400.19, "total_tokens": 14641496}
{"current_steps": 135, "total_steps": 335, "loss": 0.158, "lr": 3.7575248092801686e-05, "epoch": 2.0149812734082397, "percentage": 40.3, "elapsed_time": "0:26:09", "remaining_time": "0:38:45", "throughput": 9395.51, "total_tokens": 14745856}
{"current_steps": 136, "total_steps": 335, "loss": 0.122, "lr": 3.734905083235901e-05, "epoch": 2.0299625468164795, "percentage": 40.6, "elapsed_time": "0:26:21", "remaining_time": "0:38:33", "throughput": 9391.71, "total_tokens": 14851856}
{"current_steps": 137, "total_steps": 335, "loss": 0.1392, "lr": 3.712150834239313e-05, "epoch": 2.044943820224719, "percentage": 40.9, "elapsed_time": "0:26:33", "remaining_time": "0:38:22", "throughput": 9392.06, "total_tokens": 14962208}
{"current_steps": 138, "total_steps": 335, "loss": 0.0892, "lr": 3.689264540998116e-05, "epoch": 2.059925093632959, "percentage": 41.19, "elapsed_time": "0:26:44", "remaining_time": "0:38:10", "throughput": 9392.19, "total_tokens": 15071712}
{"current_steps": 139, "total_steps": 335, "loss": 0.0706, "lr": 3.66624869660411e-05, "epoch": 2.0749063670411987, "percentage": 41.49, "elapsed_time": "0:26:56", "remaining_time": "0:37:59", "throughput": 9391.03, "total_tokens": 15178568}
{"current_steps": 140, "total_steps": 335, "loss": 0.0695, "lr": 3.6431058082615964e-05, "epoch": 2.0898876404494384, "percentage": 41.79, "elapsed_time": "0:27:07", "remaining_time": "0:37:47", "throughput": 9395.69, "total_tokens": 15295296}
{"current_steps": 141, "total_steps": 335, "loss": 0.1314, "lr": 3.619838397014263e-05, "epoch": 2.1048689138576777, "percentage": 42.09, "elapsed_time": "0:27:19", "remaining_time": "0:37:35", "throughput": 9394.32, "total_tokens": 15401968}
{"current_steps": 142, "total_steps": 335, "loss": 0.1043, "lr": 3.5964489974705553e-05, "epoch": 2.1198501872659175, "percentage": 42.39, "elapsed_time": "0:27:30", "remaining_time": "0:37:23", "throughput": 9395.3, "total_tokens": 15510128}
{"current_steps": 143, "total_steps": 335, "loss": 0.1566, "lr": 3.572940157527572e-05, "epoch": 2.134831460674157, "percentage": 42.69, "elapsed_time": "0:27:41", "remaining_time": "0:37:11", "throughput": 9390.77, "total_tokens": 15606536}
{"current_steps": 144, "total_steps": 335, "loss": 0.0907, "lr": 3.549314438093515e-05, "epoch": 2.149812734082397, "percentage": 42.99, "elapsed_time": "0:27:53", "remaining_time": "0:36:59", "throughput": 9393.32, "total_tokens": 15717520}
{"current_steps": 145, "total_steps": 335, "loss": 0.1258, "lr": 3.525574412808717e-05, "epoch": 2.1647940074906367, "percentage": 43.28, "elapsed_time": "0:28:04", "remaining_time": "0:36:47", "throughput": 9394.67, "total_tokens": 15827848}
{"current_steps": 146, "total_steps": 335, "loss": 0.1402, "lr": 3.501722667765286e-05, "epoch": 2.1797752808988764, "percentage": 43.58, "elapsed_time": "0:28:16", "remaining_time": "0:36:35", "throughput": 9394.05, "total_tokens": 15934960}
{"current_steps": 147, "total_steps": 335, "loss": 0.0751, "lr": 3.47776180122539e-05, "epoch": 2.194756554307116, "percentage": 43.88, "elapsed_time": "0:28:27", "remaining_time": "0:36:23", "throughput": 9392.38, "total_tokens": 16038664}
{"current_steps": 148, "total_steps": 335, "loss": 0.1599, "lr": 3.453694423338225e-05, "epoch": 2.209737827715356, "percentage": 44.18, "elapsed_time": "0:28:38", "remaining_time": "0:36:11", "throughput": 9392.28, "total_tokens": 16142344}
{"current_steps": 149, "total_steps": 335, "loss": 0.1017, "lr": 3.4295231558556715e-05, "epoch": 2.2247191011235956, "percentage": 44.48, "elapsed_time": "0:28:50", "remaining_time": "0:35:59", "throughput": 9387.97, "total_tokens": 16242008}
{"current_steps": 150, "total_steps": 335, "loss": 0.0857, "lr": 3.4052506318467084e-05, "epoch": 2.2397003745318353, "percentage": 44.78, "elapsed_time": "0:29:01", "remaining_time": "0:35:47", "throughput": 9389.91, "total_tokens": 16353368}
{"current_steps": 150, "total_steps": 335, "eval_loss": 0.1802486777305603, "epoch": 2.2397003745318353, "percentage": 44.78, "elapsed_time": "0:29:06", "remaining_time": "0:35:54", "throughput": 9363.41, "total_tokens": 16353368}
{"current_steps": 151, "total_steps": 335, "loss": 0.12, "lr": 3.3808794954105716e-05, "epoch": 2.254681647940075, "percentage": 45.07, "elapsed_time": "0:29:18", "remaining_time": "0:35:42", "throughput": 9364.17, "total_tokens": 16462800}
{"current_steps": 152, "total_steps": 335, "loss": 0.202, "lr": 3.356412401388732e-05, "epoch": 2.2696629213483144, "percentage": 45.37, "elapsed_time": "0:29:29", "remaining_time": "0:35:30", "throughput": 9366.88, "total_tokens": 16576136}
{"current_steps": 153, "total_steps": 335, "loss": 0.0774, "lr": 3.3318520150756846e-05, "epoch": 2.284644194756554, "percentage": 45.67, "elapsed_time": "0:29:41", "remaining_time": "0:35:18", "throughput": 9367.72, "total_tokens": 16685072}
{"current_steps": 154, "total_steps": 335, "loss": 0.0896, "lr": 3.307201011928616e-05, "epoch": 2.299625468164794, "percentage": 45.97, "elapsed_time": "0:29:52", "remaining_time": "0:35:06", "throughput": 9371.6, "total_tokens": 16799472}
{"current_steps": 155, "total_steps": 335, "loss": 0.1516, "lr": 3.282462077275947e-05, "epoch": 2.3146067415730336, "percentage": 46.27, "elapsed_time": "0:30:04", "remaining_time": "0:34:55", "throughput": 9376.06, "total_tokens": 16916072}
{"current_steps": 156, "total_steps": 335, "loss": 0.1394, "lr": 3.257637906024822e-05, "epoch": 2.3295880149812733, "percentage": 46.57, "elapsed_time": "0:30:15", "remaining_time": "0:34:43", "throughput": 9382.12, "total_tokens": 17036352}
{"current_steps": 157, "total_steps": 335, "loss": 0.1162, "lr": 3.2327312023675287e-05, "epoch": 2.344569288389513, "percentage": 46.87, "elapsed_time": "0:30:27", "remaining_time": "0:34:31", "throughput": 9380.77, "total_tokens": 17141704}
{"current_steps": 158, "total_steps": 335, "loss": 0.1081, "lr": 3.2077446794869295e-05, "epoch": 2.359550561797753, "percentage": 47.16, "elapsed_time": "0:30:38", "remaining_time": "0:34:19", "throughput": 9379.79, "total_tokens": 17247616}
{"current_steps": 159, "total_steps": 335, "loss": 0.1278, "lr": 3.1826810592609036e-05, "epoch": 2.3745318352059925, "percentage": 47.46, "elapsed_time": "0:30:50", "remaining_time": "0:34:07", "throughput": 9383.19, "total_tokens": 17360352}
{"current_steps": 160, "total_steps": 335, "loss": 0.1027, "lr": 3.157543071965835e-05, "epoch": 2.3895131086142323, "percentage": 47.76, "elapsed_time": "0:31:01", "remaining_time": "0:33:56", "throughput": 9384.92, "total_tokens": 17472040}
{"current_steps": 161, "total_steps": 335, "loss": 0.1247, "lr": 3.132333455979202e-05, "epoch": 2.404494382022472, "percentage": 48.06, "elapsed_time": "0:31:13", "remaining_time": "0:33:44", "throughput": 9384.26, "total_tokens": 17579232}
{"current_steps": 162, "total_steps": 335, "loss": 0.0773, "lr": 3.107054957481271e-05, "epoch": 2.4194756554307117, "percentage": 48.36, "elapsed_time": "0:31:24", "remaining_time": "0:33:32", "throughput": 9383.27, "total_tokens": 17686392}
{"current_steps": 163, "total_steps": 335, "loss": 0.0579, "lr": 3.081710330155942e-05, "epoch": 2.4344569288389515, "percentage": 48.66, "elapsed_time": "0:31:36", "remaining_time": "0:33:21", "throughput": 9386.19, "total_tokens": 17800024}
{"current_steps": 164, "total_steps": 335, "loss": 0.0756, "lr": 3.056302334890786e-05, "epoch": 2.449438202247191, "percentage": 48.96, "elapsed_time": "0:31:48", "remaining_time": "0:33:09", "throughput": 9386.26, "total_tokens": 17909576}
{"current_steps": 165, "total_steps": 335, "loss": 0.1386, "lr": 3.030833739476285e-05, "epoch": 2.464419475655431, "percentage": 49.25, "elapsed_time": "0:31:59", "remaining_time": "0:32:57", "throughput": 9383.53, "total_tokens": 18009360}
{"current_steps": 166, "total_steps": 335, "loss": 0.1432, "lr": 3.0053073183043256e-05, "epoch": 2.4794007490636703, "percentage": 49.55, "elapsed_time": "0:32:10", "remaining_time": "0:32:45", "throughput": 9383.13, "total_tokens": 18114736}
{"current_steps": 167, "total_steps": 335, "loss": 0.1071, "lr": 2.979725852065981e-05, "epoch": 2.49438202247191, "percentage": 49.85, "elapsed_time": "0:32:22", "remaining_time": "0:32:33", "throughput": 9385.23, "total_tokens": 18226888}
{"current_steps": 168, "total_steps": 335, "loss": 0.114, "lr": 2.954092127448591e-05, "epoch": 2.5093632958801497, "percentage": 50.15, "elapsed_time": "0:32:33", "remaining_time": "0:32:22", "throughput": 9386.72, "total_tokens": 18338720}
{"current_steps": 169, "total_steps": 335, "loss": 0.0981, "lr": 2.9284089368322045e-05, "epoch": 2.5243445692883895, "percentage": 50.45, "elapsed_time": "0:32:45", "remaining_time": "0:32:10", "throughput": 9388.58, "total_tokens": 18451496}
{"current_steps": 170, "total_steps": 335, "loss": 0.1347, "lr": 2.9026790779853874e-05, "epoch": 2.539325842696629, "percentage": 50.75, "elapsed_time": "0:32:56", "remaining_time": "0:31:58", "throughput": 9388.81, "total_tokens": 18556776}
{"current_steps": 171, "total_steps": 335, "loss": 0.0833, "lr": 2.876905353760459e-05, "epoch": 2.554307116104869, "percentage": 51.04, "elapsed_time": "0:33:08", "remaining_time": "0:31:46", "throughput": 9388.0, "total_tokens": 18664112}
{"current_steps": 172, "total_steps": 335, "loss": 0.1111, "lr": 2.8510905717881614e-05, "epoch": 2.5692883895131087, "percentage": 51.34, "elapsed_time": "0:33:19", "remaining_time": "0:31:34", "throughput": 9387.76, "total_tokens": 18769448}
{"current_steps": 173, "total_steps": 335, "loss": 0.1501, "lr": 2.8252375441718137e-05, "epoch": 2.5842696629213484, "percentage": 51.64, "elapsed_time": "0:33:30", "remaining_time": "0:31:23", "throughput": 9391.37, "total_tokens": 18884864}
{"current_steps": 174, "total_steps": 335, "loss": 0.1171, "lr": 2.7993490871809808e-05, "epoch": 2.599250936329588, "percentage": 51.94, "elapsed_time": "0:33:42", "remaining_time": "0:31:11", "throughput": 9391.68, "total_tokens": 18993424}
{"current_steps": 175, "total_steps": 335, "loss": 0.1261, "lr": 2.7734280209446865e-05, "epoch": 2.6142322097378274, "percentage": 52.24, "elapsed_time": "0:33:53", "remaining_time": "0:30:59", "throughput": 9395.99, "total_tokens": 19111296}
{"current_steps": 175, "total_steps": 335, "eval_loss": 0.1688879132270813, "epoch": 2.6142322097378274, "percentage": 52.24, "elapsed_time": "0:33:58", "remaining_time": "0:31:04", "throughput": 9373.27, "total_tokens": 19111296}
{"current_steps": 176, "total_steps": 335, "loss": 0.0987, "lr": 2.7474771691442018e-05, "epoch": 2.629213483146067, "percentage": 52.54, "elapsed_time": "0:34:10", "remaining_time": "0:30:52", "throughput": 9372.14, "total_tokens": 19213824}
{"current_steps": 177, "total_steps": 335, "loss": 0.054, "lr": 2.721499358705458e-05, "epoch": 2.644194756554307, "percentage": 52.84, "elapsed_time": "0:34:21", "remaining_time": "0:30:40", "throughput": 9379.61, "total_tokens": 19338104}
{"current_steps": 178, "total_steps": 335, "loss": 0.0683, "lr": 2.6954974194910888e-05, "epoch": 2.6591760299625467, "percentage": 53.13, "elapsed_time": "0:34:33", "remaining_time": "0:30:28", "throughput": 9380.91, "total_tokens": 19449848}
{"current_steps": 179, "total_steps": 335, "loss": 0.1121, "lr": 2.6694741839921732e-05, "epoch": 2.6741573033707864, "percentage": 53.43, "elapsed_time": "0:34:44", "remaining_time": "0:30:17", "throughput": 9386.83, "total_tokens": 19571008}
{"current_steps": 180, "total_steps": 335, "loss": 0.0888, "lr": 2.6434324870196748e-05, "epoch": 2.689138576779026, "percentage": 53.73, "elapsed_time": "0:34:56", "remaining_time": "0:30:05", "throughput": 9390.88, "total_tokens": 19686872}
{"current_steps": 181, "total_steps": 335, "loss": 0.0751, "lr": 2.617375165395634e-05, "epoch": 2.704119850187266, "percentage": 54.03, "elapsed_time": "0:35:07", "remaining_time": "0:29:53", "throughput": 9392.65, "total_tokens": 19797960}
{"current_steps": 182, "total_steps": 335, "loss": 0.1033, "lr": 2.5913050576441477e-05, "epoch": 2.7191011235955056, "percentage": 54.33, "elapsed_time": "0:35:19", "remaining_time": "0:29:41", "throughput": 9392.4, "total_tokens": 19905184}
{"current_steps": 183, "total_steps": 335, "loss": 0.0867, "lr": 2.5652250036821523e-05, "epoch": 2.7340823970037453, "percentage": 54.63, "elapsed_time": "0:35:30", "remaining_time": "0:29:29", "throughput": 9392.21, "total_tokens": 20013120}
{"current_steps": 184, "total_steps": 335, "loss": 0.1323, "lr": 2.5391378445100644e-05, "epoch": 2.749063670411985, "percentage": 54.93, "elapsed_time": "0:35:41", "remaining_time": "0:29:17", "throughput": 9388.6, "total_tokens": 20109488}
{"current_steps": 185, "total_steps": 335, "loss": 0.0935, "lr": 2.5130464219022992e-05, "epoch": 2.764044943820225, "percentage": 55.22, "elapsed_time": "0:35:53", "remaining_time": "0:29:06", "throughput": 9392.86, "total_tokens": 20227088}
{"current_steps": 186, "total_steps": 335, "loss": 0.095, "lr": 2.486953578097702e-05, "epoch": 2.7790262172284645, "percentage": 55.52, "elapsed_time": "0:36:04", "remaining_time": "0:28:54", "throughput": 9390.64, "total_tokens": 20330176}
{"current_steps": 187, "total_steps": 335, "loss": 0.1094, "lr": 2.4608621554899362e-05, "epoch": 2.7940074906367043, "percentage": 55.82, "elapsed_time": "0:36:16", "remaining_time": "0:28:42", "throughput": 9394.72, "total_tokens": 20448288}
{"current_steps": 188, "total_steps": 335, "loss": 0.094, "lr": 2.4347749963178486e-05, "epoch": 2.808988764044944, "percentage": 56.12, "elapsed_time": "0:36:28", "remaining_time": "0:28:30", "throughput": 9392.91, "total_tokens": 20552120}
{"current_steps": 189, "total_steps": 335, "loss": 0.0948, "lr": 2.4086949423558526e-05, "epoch": 2.8239700374531838, "percentage": 56.42, "elapsed_time": "0:36:39", "remaining_time": "0:28:19", "throughput": 9394.24, "total_tokens": 20664640}
{"current_steps": 190, "total_steps": 335, "loss": 0.0838, "lr": 2.3826248346043663e-05, "epoch": 2.8389513108614235, "percentage": 56.72, "elapsed_time": "0:36:51", "remaining_time": "0:28:07", "throughput": 9395.91, "total_tokens": 20777328}
{"current_steps": 191, "total_steps": 335, "loss": 0.1071, "lr": 2.356567512980326e-05, "epoch": 2.853932584269663, "percentage": 57.01, "elapsed_time": "0:37:02", "remaining_time": "0:27:55", "throughput": 9399.79, "total_tokens": 20895424}
{"current_steps": 192, "total_steps": 335, "loss": 0.0939, "lr": 2.3305258160078274e-05, "epoch": 2.8689138576779025, "percentage": 57.31, "elapsed_time": "0:37:14", "remaining_time": "0:27:44", "throughput": 9401.52, "total_tokens": 21007912}
{"current_steps": 193, "total_steps": 335, "loss": 0.1093, "lr": 2.3045025805089118e-05, "epoch": 2.8838951310861423, "percentage": 57.61, "elapsed_time": "0:37:25", "remaining_time": "0:27:32", "throughput": 9401.46, "total_tokens": 21112424}
{"current_steps": 194, "total_steps": 335, "loss": 0.1156, "lr": 2.278500641294543e-05, "epoch": 2.898876404494382, "percentage": 57.91, "elapsed_time": "0:37:36", "remaining_time": "0:27:20", "throughput": 9403.18, "total_tokens": 21221136}
{"current_steps": 195, "total_steps": 335, "loss": 0.0693, "lr": 2.252522830855798e-05, "epoch": 2.9138576779026217, "percentage": 58.21, "elapsed_time": "0:37:48", "remaining_time": "0:27:08", "throughput": 9403.81, "total_tokens": 21331720}
{"current_steps": 196, "total_steps": 335, "loss": 0.0907, "lr": 2.2265719790553147e-05, "epoch": 2.9288389513108615, "percentage": 58.51, "elapsed_time": "0:37:59", "remaining_time": "0:26:56", "throughput": 9406.82, "total_tokens": 21447512}
{"current_steps": 197, "total_steps": 335, "loss": 0.0821, "lr": 2.2006509128190195e-05, "epoch": 2.943820224719101, "percentage": 58.81, "elapsed_time": "0:38:11", "remaining_time": "0:26:45", "throughput": 9405.24, "total_tokens": 21553192}
{"current_steps": 198, "total_steps": 335, "loss": 0.1252, "lr": 2.174762455828187e-05, "epoch": 2.958801498127341, "percentage": 59.1, "elapsed_time": "0:38:22", "remaining_time": "0:26:33", "throughput": 9403.48, "total_tokens": 21655488}
{"current_steps": 199, "total_steps": 335, "loss": 0.0859, "lr": 2.1489094282118395e-05, "epoch": 2.9737827715355807, "percentage": 59.4, "elapsed_time": "0:38:34", "remaining_time": "0:26:21", "throughput": 9405.0, "total_tokens": 21767256}
{"current_steps": 200, "total_steps": 335, "loss": 0.1024, "lr": 2.123094646239541e-05, "epoch": 2.98876404494382, "percentage": 59.7, "elapsed_time": "0:38:45", "remaining_time": "0:26:09", "throughput": 9407.0, "total_tokens": 21879928}
{"current_steps": 200, "total_steps": 335, "eval_loss": 0.1642482578754425, "epoch": 2.98876404494382, "percentage": 59.7, "elapsed_time": "0:38:50", "remaining_time": "0:26:13", "throughput": 9387.14, "total_tokens": 21879928}
{"current_steps": 201, "total_steps": 335, "loss": 0.1114, "lr": 2.0973209220146135e-05, "epoch": 3.0, "percentage": 60.0, "elapsed_time": "0:39:00", "remaining_time": "0:26:00", "throughput": 9382.92, "total_tokens": 21962520}
{"current_steps": 202, "total_steps": 335, "loss": 0.0762, "lr": 2.0715910631677968e-05, "epoch": 3.0149812734082397, "percentage": 60.3, "elapsed_time": "0:39:12", "remaining_time": "0:25:48", "throughput": 9380.46, "total_tokens": 22064872}
{"current_steps": 203, "total_steps": 335, "loss": 0.0883, "lr": 2.0459078725514092e-05, "epoch": 3.0299625468164795, "percentage": 60.6, "elapsed_time": "0:39:23", "remaining_time": "0:25:36", "throughput": 9381.48, "total_tokens": 22169728}
{"current_steps": 204, "total_steps": 335, "loss": 0.0756, "lr": 2.020274147934019e-05, "epoch": 3.044943820224719, "percentage": 60.9, "elapsed_time": "0:39:34", "remaining_time": "0:25:24", "throughput": 9384.55, "total_tokens": 22285928}
{"current_steps": 205, "total_steps": 335, "loss": 0.0887, "lr": 1.9946926816956743e-05, "epoch": 3.059925093632959, "percentage": 61.19, "elapsed_time": "0:39:45", "remaining_time": "0:25:12", "throughput": 9383.39, "total_tokens": 22387040}
{"current_steps": 206, "total_steps": 335, "loss": 0.0926, "lr": 1.9691662605237166e-05, "epoch": 3.0749063670411987, "percentage": 61.49, "elapsed_time": "0:39:57", "remaining_time": "0:25:01", "throughput": 9385.8, "total_tokens": 22498720}
{"current_steps": 207, "total_steps": 335, "loss": 0.1224, "lr": 1.9436976651092144e-05, "epoch": 3.0898876404494384, "percentage": 61.79, "elapsed_time": "0:40:08", "remaining_time": "0:24:49", "throughput": 9391.28, "total_tokens": 22621072}
{"current_steps": 208, "total_steps": 335, "loss": 0.0856, "lr": 1.9182896698440584e-05, "epoch": 3.1048689138576777, "percentage": 62.09, "elapsed_time": "0:40:20", "remaining_time": "0:24:37", "throughput": 9389.43, "total_tokens": 22724704}
{"current_steps": 209, "total_steps": 335, "loss": 0.0621, "lr": 1.89294504251873e-05, "epoch": 3.1198501872659175, "percentage": 62.39, "elapsed_time": "0:40:31", "remaining_time": "0:24:26", "throughput": 9391.65, "total_tokens": 22838936}
{"current_steps": 210, "total_steps": 335, "loss": 0.1196, "lr": 1.867666544020798e-05, "epoch": 3.134831460674157, "percentage": 62.69, "elapsed_time": "0:40:43", "remaining_time": "0:24:14", "throughput": 9389.51, "total_tokens": 22939008}
{"current_steps": 211, "total_steps": 335, "loss": 0.1071, "lr": 1.8424569280341653e-05, "epoch": 3.149812734082397, "percentage": 62.99, "elapsed_time": "0:40:54", "remaining_time": "0:24:02", "throughput": 9391.99, "total_tokens": 23054112}
{"current_steps": 212, "total_steps": 335, "loss": 0.0932, "lr": 1.817318940739098e-05, "epoch": 3.1647940074906367, "percentage": 63.28, "elapsed_time": "0:41:06", "remaining_time": "0:23:50", "throughput": 9389.71, "total_tokens": 23156632}
{"current_steps": 213, "total_steps": 335, "loss": 0.0792, "lr": 1.7922553205130707e-05, "epoch": 3.1797752808988764, "percentage": 63.58, "elapsed_time": "0:41:17", "remaining_time": "0:23:39", "throughput": 9392.23, "total_tokens": 23271912}
{"current_steps": 214, "total_steps": 335, "loss": 0.0513, "lr": 1.767268797632472e-05, "epoch": 3.194756554307116, "percentage": 63.88, "elapsed_time": "0:41:29", "remaining_time": "0:23:27", "throughput": 9392.66, "total_tokens": 23381816}
{"current_steps": 215, "total_steps": 335, "loss": 0.0903, "lr": 1.7423620939751788e-05, "epoch": 3.209737827715356, "percentage": 64.18, "elapsed_time": "0:41:40", "remaining_time": "0:23:15", "throughput": 9392.39, "total_tokens": 23489552}
{"current_steps": 216, "total_steps": 335, "loss": 0.0763, "lr": 1.7175379227240523e-05, "epoch": 3.2247191011235956, "percentage": 64.48, "elapsed_time": "0:41:52", "remaining_time": "0:23:04", "throughput": 9393.94, "total_tokens": 23602136}
{"current_steps": 217, "total_steps": 335, "loss": 0.0656, "lr": 1.692798988071385e-05, "epoch": 3.2397003745318353, "percentage": 64.78, "elapsed_time": "0:42:03", "remaining_time": "0:22:52", "throughput": 9392.32, "total_tokens": 23705952}
{"current_steps": 218, "total_steps": 335, "loss": 0.1015, "lr": 1.6681479849243153e-05, "epoch": 3.254681647940075, "percentage": 65.07, "elapsed_time": "0:42:15", "remaining_time": "0:22:40", "throughput": 9395.14, "total_tokens": 23821824}
{"current_steps": 219, "total_steps": 335, "loss": 0.1126, "lr": 1.6435875986112685e-05, "epoch": 3.2696629213483144, "percentage": 65.37, "elapsed_time": "0:42:27", "remaining_time": "0:22:29", "throughput": 9396.38, "total_tokens": 23933400}
{"current_steps": 220, "total_steps": 335, "loss": 0.0704, "lr": 1.6191205045894283e-05, "epoch": 3.284644194756554, "percentage": 65.67, "elapsed_time": "0:42:38", "remaining_time": "0:22:17", "throughput": 9397.5, "total_tokens": 24044912}
{"current_steps": 221, "total_steps": 335, "loss": 0.0695, "lr": 1.594749368153292e-05, "epoch": 3.299625468164794, "percentage": 65.97, "elapsed_time": "0:42:50", "remaining_time": "0:22:05", "throughput": 9402.14, "total_tokens": 24165512}
{"current_steps": 222, "total_steps": 335, "loss": 0.0775, "lr": 1.570476844144329e-05, "epoch": 3.3146067415730336, "percentage": 66.27, "elapsed_time": "0:43:01", "remaining_time": "0:21:54", "throughput": 9399.66, "total_tokens": 24265384}
{"current_steps": 223, "total_steps": 335, "loss": 0.0852, "lr": 1.546305576661776e-05, "epoch": 3.3295880149812733, "percentage": 66.57, "elapsed_time": "0:43:13", "remaining_time": "0:21:42", "throughput": 9399.1, "total_tokens": 24373048}
{"current_steps": 224, "total_steps": 335, "loss": 0.0791, "lr": 1.5222381987746104e-05, "epoch": 3.344569288389513, "percentage": 66.87, "elapsed_time": "0:43:24", "remaining_time": "0:21:30", "throughput": 9399.95, "total_tokens": 24483840}
{"current_steps": 225, "total_steps": 335, "loss": 0.0617, "lr": 1.4982773322347144e-05, "epoch": 3.359550561797753, "percentage": 67.16, "elapsed_time": "0:43:36", "remaining_time": "0:21:19", "throughput": 9399.39, "total_tokens": 24591096}
{"current_steps": 225, "total_steps": 335, "eval_loss": 0.1583455204963684, "epoch": 3.359550561797753, "percentage": 67.16, "elapsed_time": "0:43:41", "remaining_time": "0:21:21", "throughput": 9381.69, "total_tokens": 24591096}
{"current_steps": 226, "total_steps": 335, "loss": 0.0616, "lr": 1.4744255871912823e-05, "epoch": 3.3745318352059925, "percentage": 67.46, "elapsed_time": "0:43:52", "remaining_time": "0:21:09", "throughput": 9380.09, "total_tokens": 24690968}
{"current_steps": 227, "total_steps": 335, "loss": 0.0903, "lr": 1.4506855619064846e-05, "epoch": 3.3895131086142323, "percentage": 67.76, "elapsed_time": "0:44:03", "remaining_time": "0:20:57", "throughput": 9380.74, "total_tokens": 24799096}
{"current_steps": 228, "total_steps": 335, "loss": 0.0394, "lr": 1.4270598424724292e-05, "epoch": 3.404494382022472, "percentage": 68.06, "elapsed_time": "0:44:15", "remaining_time": "0:20:46", "throughput": 9381.52, "total_tokens": 24909896}
{"current_steps": 229, "total_steps": 335, "loss": 0.0985, "lr": 1.4035510025294462e-05, "epoch": 3.4194756554307117, "percentage": 68.36, "elapsed_time": "0:44:26", "remaining_time": "0:20:34", "throughput": 9381.72, "total_tokens": 25020096}
{"current_steps": 230, "total_steps": 335, "loss": 0.0929, "lr": 1.3801616029857378e-05, "epoch": 3.4344569288389515, "percentage": 68.66, "elapsed_time": "0:44:38", "remaining_time": "0:20:22", "throughput": 9383.87, "total_tokens": 25134904}
{"current_steps": 231, "total_steps": 335, "loss": 0.0724, "lr": 1.3568941917384036e-05, "epoch": 3.449438202247191, "percentage": 68.96, "elapsed_time": "0:44:49", "remaining_time": "0:20:11", "throughput": 9382.56, "total_tokens": 25238032}
{"current_steps": 232, "total_steps": 335, "loss": 0.0646, "lr": 1.3337513033958904e-05, "epoch": 3.464419475655431, "percentage": 69.25, "elapsed_time": "0:45:01", "remaining_time": "0:19:59", "throughput": 9382.3, "total_tokens": 25346080}
{"current_steps": 233, "total_steps": 335, "loss": 0.0783, "lr": 1.310735459001884e-05, "epoch": 3.4794007490636703, "percentage": 69.55, "elapsed_time": "0:45:12", "remaining_time": "0:19:47", "throughput": 9383.26, "total_tokens": 25456760}
{"current_steps": 234, "total_steps": 335, "loss": 0.0632, "lr": 1.2878491657606872e-05, "epoch": 3.49438202247191, "percentage": 69.85, "elapsed_time": "0:45:24", "remaining_time": "0:19:35", "throughput": 9384.81, "total_tokens": 25565392}
{"current_steps": 235, "total_steps": 335, "loss": 0.0887, "lr": 1.2650949167640997e-05, "epoch": 3.5093632958801497, "percentage": 70.15, "elapsed_time": "0:45:35", "remaining_time": "0:19:24", "throughput": 9386.3, "total_tokens": 25678520}
{"current_steps": 236, "total_steps": 335, "loss": 0.094, "lr": 1.2424751907198312e-05, "epoch": 3.5243445692883895, "percentage": 70.45, "elapsed_time": "0:45:47", "remaining_time": "0:19:12", "throughput": 9387.25, "total_tokens": 25789432}
{"current_steps": 237, "total_steps": 335, "loss": 0.0623, "lr": 1.2199924516814939e-05, "epoch": 3.539325842696629, "percentage": 70.75, "elapsed_time": "0:45:58", "remaining_time": "0:19:00", "throughput": 9385.82, "total_tokens": 25893768}
{"current_steps": 238, "total_steps": 335, "loss": 0.1051, "lr": 1.1976491487801748e-05, "epoch": 3.554307116104869, "percentage": 71.04, "elapsed_time": "0:46:10", "remaining_time": "0:18:49", "throughput": 9387.15, "total_tokens": 26005272}
{"current_steps": 239, "total_steps": 335, "loss": 0.069, "lr": 1.1754477159576499e-05, "epoch": 3.5692883895131087, "percentage": 71.34, "elapsed_time": "0:46:21", "remaining_time": "0:18:37", "throughput": 9386.86, "total_tokens": 26112160}
{"current_steps": 240, "total_steps": 335, "loss": 0.0561, "lr": 1.1533905717012424e-05, "epoch": 3.5842696629213484, "percentage": 71.64, "elapsed_time": "0:46:33", "remaining_time": "0:18:25", "throughput": 9389.35, "total_tokens": 26227496}
{"current_steps": 241, "total_steps": 335, "loss": 0.0824, "lr": 1.1314801187803686e-05, "epoch": 3.599250936329588, "percentage": 71.94, "elapsed_time": "0:46:44", "remaining_time": "0:18:13", "throughput": 9386.62, "total_tokens": 26323944}
{"current_steps": 242, "total_steps": 335, "loss": 0.083, "lr": 1.1097187439847939e-05, "epoch": 3.6142322097378274, "percentage": 72.24, "elapsed_time": "0:46:55", "remaining_time": "0:18:01", "throughput": 9385.38, "total_tokens": 26423816}
{"current_steps": 243, "total_steps": 335, "loss": 0.0969, "lr": 1.088108817864629e-05, "epoch": 3.629213483146067, "percentage": 72.54, "elapsed_time": "0:47:07", "remaining_time": "0:17:50", "throughput": 9384.46, "total_tokens": 26530000}
{"current_steps": 244, "total_steps": 335, "loss": 0.0487, "lr": 1.0666526944721016e-05, "epoch": 3.644194756554307, "percentage": 72.84, "elapsed_time": "0:47:18", "remaining_time": "0:17:38", "throughput": 9385.11, "total_tokens": 26639920}
{"current_steps": 245, "total_steps": 335, "loss": 0.0861, "lr": 1.0453527111051184e-05, "epoch": 3.6591760299625467, "percentage": 73.13, "elapsed_time": "0:47:30", "remaining_time": "0:17:26", "throughput": 9387.82, "total_tokens": 26755952}
{"current_steps": 246, "total_steps": 335, "loss": 0.0879, "lr": 1.0242111880526495e-05, "epoch": 3.6741573033707864, "percentage": 73.43, "elapsed_time": "0:47:41", "remaining_time": "0:17:15", "throughput": 9389.07, "total_tokens": 26867776}
{"current_steps": 247, "total_steps": 335, "loss": 0.081, "lr": 1.003230428341979e-05, "epoch": 3.689138576779026, "percentage": 73.73, "elapsed_time": "0:47:53", "remaining_time": "0:17:03", "throughput": 9388.89, "total_tokens": 26975080}
{"current_steps": 248, "total_steps": 335, "loss": 0.0758, "lr": 9.824127174878195e-06, "epoch": 3.704119850187266, "percentage": 74.03, "elapsed_time": "0:48:04", "remaining_time": "0:16:51", "throughput": 9390.54, "total_tokens": 27088208}
{"current_steps": 249, "total_steps": 335, "loss": 0.1284, "lr": 9.617603232433475e-06, "epoch": 3.7191011235955056, "percentage": 74.33, "elapsed_time": "0:48:16", "remaining_time": "0:16:40", "throughput": 9391.56, "total_tokens": 27199040}
{"current_steps": 250, "total_steps": 335, "loss": 0.0883, "lr": 9.412754953531663e-06, "epoch": 3.7340823970037453, "percentage": 74.63, "elapsed_time": "0:48:27", "remaining_time": "0:16:28", "throughput": 9391.3, "total_tokens": 27307192}
{"current_steps": 250, "total_steps": 335, "eval_loss": 0.15280824899673462, "epoch": 3.7340823970037453, "percentage": 74.63, "elapsed_time": "0:48:32", "remaining_time": "0:16:30", "throughput": 9375.39, "total_tokens": 27307192}
{"current_steps": 251, "total_steps": 335, "loss": 0.0618, "lr": 9.209604653082326e-06, "epoch": 3.749063670411985, "percentage": 74.93, "elapsed_time": "0:48:44", "remaining_time": "0:16:18", "throughput": 9377.03, "total_tokens": 27419216}
{"current_steps": 252, "total_steps": 335, "loss": 0.0664, "lr": 9.008174461027724e-06, "epoch": 3.764044943820225, "percentage": 75.22, "elapsed_time": "0:48:55", "remaining_time": "0:16:06", "throughput": 9379.42, "total_tokens": 27534416}
{"current_steps": 253, "total_steps": 335, "loss": 0.0691, "lr": 8.808486319932083e-06, "epoch": 3.7790262172284645, "percentage": 75.52, "elapsed_time": "0:49:07", "remaining_time": "0:15:55", "throughput": 9381.83, "total_tokens": 27650456}
{"current_steps": 254, "total_steps": 335, "loss": 0.1072, "lr": 8.610561982591357e-06, "epoch": 3.7940074906367043, "percentage": 75.82, "elapsed_time": "0:49:18", "remaining_time": "0:15:43", "throughput": 9384.22, "total_tokens": 27766296}
{"current_steps": 255, "total_steps": 335, "loss": 0.1113, "lr": 8.414423009663563e-06, "epoch": 3.808988764044944, "percentage": 76.12, "elapsed_time": "0:49:30", "remaining_time": "0:15:31", "throughput": 9385.2, "total_tokens": 27877960}
{"current_steps": 256, "total_steps": 335, "loss": 0.0787, "lr": 8.220090767320137e-06, "epoch": 3.8239700374531838, "percentage": 76.42, "elapsed_time": "0:49:41", "remaining_time": "0:15:20", "throughput": 9387.25, "total_tokens": 27992400}
{"current_steps": 257, "total_steps": 335, "loss": 0.0436, "lr": 8.027586424918412e-06, "epoch": 3.8389513108614235, "percentage": 76.72, "elapsed_time": "0:49:53", "remaining_time": "0:15:08", "throughput": 9386.94, "total_tokens": 28099232}
{"current_steps": 258, "total_steps": 335, "loss": 0.0761, "lr": 7.836930952695533e-06, "epoch": 3.853932584269663, "percentage": 77.01, "elapsed_time": "0:50:04", "remaining_time": "0:14:56", "throughput": 9388.64, "total_tokens": 28212712}
{"current_steps": 259, "total_steps": 335, "loss": 0.0876, "lr": 7.648145119484152e-06, "epoch": 3.8689138576779025, "percentage": 77.31, "elapsed_time": "0:50:16", "remaining_time": "0:14:45", "throughput": 9391.0, "total_tokens": 28327232}
{"current_steps": 260, "total_steps": 335, "loss": 0.0689, "lr": 7.461249490449954e-06, "epoch": 3.8838951310861423, "percentage": 77.61, "elapsed_time": "0:50:28", "remaining_time": "0:14:33", "throughput": 9393.33, "total_tokens": 28444136}
{"current_steps": 261, "total_steps": 335, "loss": 0.0934, "lr": 7.2762644248514255e-06, "epoch": 3.898876404494382, "percentage": 77.91, "elapsed_time": "0:50:39", "remaining_time": "0:14:21", "throughput": 9393.84, "total_tokens": 28553608}
{"current_steps": 262, "total_steps": 335, "loss": 0.0616, "lr": 7.0932100738220265e-06, "epoch": 3.9138576779026217, "percentage": 78.21, "elapsed_time": "0:50:51", "remaining_time": "0:14:10", "throughput": 9391.95, "total_tokens": 28655944}
{"current_steps": 263, "total_steps": 335, "loss": 0.0505, "lr": 6.912106378175098e-06, "epoch": 3.9288389513108615, "percentage": 78.51, "elapsed_time": "0:51:02", "remaining_time": "0:13:58", "throughput": 9393.85, "total_tokens": 28770240}
{"current_steps": 264, "total_steps": 335, "loss": 0.0716, "lr": 6.732973066231563e-06, "epoch": 3.943820224719101, "percentage": 78.81, "elapsed_time": "0:51:14", "remaining_time": "0:13:46", "throughput": 9394.36, "total_tokens": 28879896}
{"current_steps": 265, "total_steps": 335, "loss": 0.0925, "lr": 6.555829651670911e-06, "epoch": 3.958801498127341, "percentage": 79.1, "elapsed_time": "0:51:25", "remaining_time": "0:13:35", "throughput": 9392.0, "total_tokens": 28979616}
{"current_steps": 266, "total_steps": 335, "loss": 0.082, "lr": 6.380695431405453e-06, "epoch": 3.9737827715355807, "percentage": 79.4, "elapsed_time": "0:51:37", "remaining_time": "0:13:23", "throughput": 9394.61, "total_tokens": 29095336}
{"current_steps": 267, "total_steps": 335, "loss": 0.1735, "lr": 6.207589483478266e-06, "epoch": 3.98876404494382, "percentage": 79.7, "elapsed_time": "0:51:48", "remaining_time": "0:13:11", "throughput": 9393.51, "total_tokens": 29200208}
{"current_steps": 268, "total_steps": 335, "loss": 0.0554, "lr": 6.0365306649849214e-06, "epoch": 4.0, "percentage": 80.0, "elapsed_time": "0:51:58", "remaining_time": "0:12:59", "throughput": 9390.22, "total_tokens": 29282608}
{"current_steps": 269, "total_steps": 335, "loss": 0.0374, "lr": 5.867537610019317e-06, "epoch": 4.01498127340824, "percentage": 80.3, "elapsed_time": "0:52:09", "remaining_time": "0:12:47", "throughput": 9390.42, "total_tokens": 29391848}
{"current_steps": 270, "total_steps": 335, "loss": 0.0644, "lr": 5.700628727643806e-06, "epoch": 4.0299625468164795, "percentage": 80.6, "elapsed_time": "0:52:21", "remaining_time": "0:12:36", "throughput": 9392.65, "total_tokens": 29507360}
{"current_steps": 271, "total_steps": 335, "loss": 0.0621, "lr": 5.53582219988382e-06, "epoch": 4.044943820224719, "percentage": 80.9, "elapsed_time": "0:52:33", "remaining_time": "0:12:24", "throughput": 9390.32, "total_tokens": 29607936}
{"current_steps": 272, "total_steps": 335, "loss": 0.0525, "lr": 5.373135979747227e-06, "epoch": 4.059925093632959, "percentage": 81.19, "elapsed_time": "0:52:44", "remaining_time": "0:12:12", "throughput": 9389.68, "total_tokens": 29710240}
{"current_steps": 273, "total_steps": 335, "loss": 0.072, "lr": 5.2125877892686496e-06, "epoch": 4.074906367041199, "percentage": 81.49, "elapsed_time": "0:52:55", "remaining_time": "0:12:01", "throughput": 9390.09, "total_tokens": 29819600}
{"current_steps": 274, "total_steps": 335, "loss": 0.1253, "lr": 5.054195117578914e-06, "epoch": 4.089887640449438, "percentage": 81.79, "elapsed_time": "0:53:07", "remaining_time": "0:11:49", "throughput": 9390.34, "total_tokens": 29927712}
{"current_steps": 275, "total_steps": 335, "loss": 0.0516, "lr": 4.897975218999926e-06, "epoch": 4.104868913857678, "percentage": 82.09, "elapsed_time": "0:53:18", "remaining_time": "0:11:37", "throughput": 9390.6, "total_tokens": 30036912}
{"current_steps": 275, "total_steps": 335, "eval_loss": 0.148418128490448, "epoch": 4.104868913857678, "percentage": 82.09, "elapsed_time": "0:53:23", "remaining_time": "0:11:38", "throughput": 9376.15, "total_tokens": 30036912}
{"current_steps": 276, "total_steps": 335, "loss": 0.0597, "lr": 4.743945111165068e-06, "epoch": 4.119850187265918, "percentage": 82.39, "elapsed_time": "0:53:35", "remaining_time": "0:11:27", "throughput": 9375.27, "total_tokens": 30142632}
{"current_steps": 277, "total_steps": 335, "loss": 0.0481, "lr": 4.592121573165414e-06, "epoch": 4.134831460674158, "percentage": 82.69, "elapsed_time": "0:53:46", "remaining_time": "0:11:15", "throughput": 9374.82, "total_tokens": 30249816}
{"current_steps": 278, "total_steps": 335, "loss": 0.0528, "lr": 4.442521143721892e-06, "epoch": 4.149812734082397, "percentage": 82.99, "elapsed_time": "0:53:58", "remaining_time": "0:11:03", "throughput": 9375.52, "total_tokens": 30360248}
{"current_steps": 279, "total_steps": 335, "loss": 0.0558, "lr": 4.295160119383712e-06, "epoch": 4.164794007490637, "percentage": 83.28, "elapsed_time": "0:54:09", "remaining_time": "0:10:52", "throughput": 9375.14, "total_tokens": 30466592}
{"current_steps": 280, "total_steps": 335, "loss": 0.0739, "lr": 4.150054552753055e-06, "epoch": 4.179775280898877, "percentage": 83.58, "elapsed_time": "0:54:21", "remaining_time": "0:10:40", "throughput": 9373.16, "total_tokens": 30567952}
{"current_steps": 281, "total_steps": 335, "loss": 0.059, "lr": 4.007220250736454e-06, "epoch": 4.194756554307116, "percentage": 83.88, "elapsed_time": "0:54:32", "remaining_time": "0:10:28", "throughput": 9372.82, "total_tokens": 30674984}
{"current_steps": 282, "total_steps": 335, "loss": 0.0275, "lr": 3.866672772822863e-06, "epoch": 4.209737827715355, "percentage": 84.18, "elapsed_time": "0:54:44", "remaining_time": "0:10:17", "throughput": 9375.22, "total_tokens": 30791864}
{"current_steps": 283, "total_steps": 335, "loss": 0.041, "lr": 3.728427429388709e-06, "epoch": 4.224719101123595, "percentage": 84.48, "elapsed_time": "0:54:56", "remaining_time": "0:10:05", "throughput": 9377.5, "total_tokens": 30908384}
{"current_steps": 284, "total_steps": 335, "loss": 0.0492, "lr": 3.592499280030057e-06, "epoch": 4.239700374531835, "percentage": 84.78, "elapsed_time": "0:55:07", "remaining_time": "0:09:53", "throughput": 9379.52, "total_tokens": 31023848}
{"current_steps": 285, "total_steps": 335, "loss": 0.0555, "lr": 3.458903131922134e-06, "epoch": 4.254681647940075, "percentage": 85.07, "elapsed_time": "0:55:19", "remaining_time": "0:09:42", "throughput": 9380.89, "total_tokens": 31137384}
{"current_steps": 286, "total_steps": 335, "loss": 0.0493, "lr": 3.3276535382063183e-06, "epoch": 4.269662921348314, "percentage": 85.37, "elapsed_time": "0:55:30", "remaining_time": "0:09:30", "throughput": 9380.65, "total_tokens": 31244936}
{"current_steps": 287, "total_steps": 335, "loss": 0.0492, "lr": 3.198764796404807e-06, "epoch": 4.284644194756554, "percentage": 85.67, "elapsed_time": "0:55:42", "remaining_time": "0:09:18", "throughput": 9381.5, "total_tokens": 31355616}
{"current_steps": 288, "total_steps": 335, "loss": 0.0649, "lr": 3.0722509468631392e-06, "epoch": 4.299625468164794, "percentage": 85.97, "elapsed_time": "0:55:53", "remaining_time": "0:09:07", "throughput": 9382.0, "total_tokens": 31463648}
{"current_steps": 289, "total_steps": 335, "loss": 0.0481, "lr": 2.948125771220697e-06, "epoch": 4.314606741573034, "percentage": 86.27, "elapsed_time": "0:56:05", "remaining_time": "0:08:55", "throughput": 9383.06, "total_tokens": 31577056}
{"current_steps": 290, "total_steps": 335, "loss": 0.0455, "lr": 2.8264027909094715e-06, "epoch": 4.329588014981273, "percentage": 86.57, "elapsed_time": "0:56:16", "remaining_time": "0:08:43", "throughput": 9382.39, "total_tokens": 31682424}
{"current_steps": 291, "total_steps": 335, "loss": 0.0588, "lr": 2.707095265681081e-06, "epoch": 4.344569288389513, "percentage": 86.87, "elapsed_time": "0:56:28", "remaining_time": "0:08:32", "throughput": 9382.37, "total_tokens": 31790168}
{"current_steps": 292, "total_steps": 335, "loss": 0.0553, "lr": 2.5902161921623454e-06, "epoch": 4.359550561797753, "percentage": 87.16, "elapsed_time": "0:56:39", "remaining_time": "0:08:20", "throughput": 9384.36, "total_tokens": 31905520}
{"current_steps": 293, "total_steps": 335, "loss": 0.0452, "lr": 2.475778302439524e-06, "epoch": 4.3745318352059925, "percentage": 87.46, "elapsed_time": "0:56:51", "remaining_time": "0:08:09", "throughput": 9385.92, "total_tokens": 32020200}
{"current_steps": 294, "total_steps": 335, "loss": 0.0707, "lr": 2.3637940626713346e-06, "epoch": 4.389513108614232, "percentage": 87.76, "elapsed_time": "0:57:02", "remaining_time": "0:07:57", "throughput": 9386.46, "total_tokens": 32129744}
{"current_steps": 295, "total_steps": 335, "loss": 0.0611, "lr": 2.254275671731007e-06, "epoch": 4.404494382022472, "percentage": 88.06, "elapsed_time": "0:57:14", "remaining_time": "0:07:45", "throughput": 9388.99, "total_tokens": 32247024}
{"current_steps": 296, "total_steps": 335, "loss": 0.058, "lr": 2.14723505987737e-06, "epoch": 4.419475655430712, "percentage": 88.36, "elapsed_time": "0:57:26", "remaining_time": "0:07:34", "throughput": 9390.79, "total_tokens": 32361392}
{"current_steps": 297, "total_steps": 335, "loss": 0.0571, "lr": 2.0426838874552714e-06, "epoch": 4.4344569288389515, "percentage": 88.66, "elapsed_time": "0:57:37", "remaining_time": "0:07:22", "throughput": 9390.72, "total_tokens": 32469248}
{"current_steps": 298, "total_steps": 335, "loss": 0.0364, "lr": 1.9406335436253724e-06, "epoch": 4.449438202247191, "percentage": 88.96, "elapsed_time": "0:57:49", "remaining_time": "0:07:10", "throughput": 9391.9, "total_tokens": 32582736}
{"current_steps": 299, "total_steps": 335, "loss": 0.034, "lr": 1.8410951451234533e-06, "epoch": 4.464419475655431, "percentage": 89.25, "elapsed_time": "0:58:00", "remaining_time": "0:06:59", "throughput": 9392.47, "total_tokens": 32691704}
{"current_steps": 300, "total_steps": 335, "loss": 0.0675, "lr": 1.7440795350494588e-06, "epoch": 4.479400749063671, "percentage": 89.55, "elapsed_time": "0:58:12", "remaining_time": "0:06:47", "throughput": 9394.63, "total_tokens": 32807520}
{"current_steps": 300, "total_steps": 335, "eval_loss": 0.14898425340652466, "epoch": 4.479400749063671, "percentage": 89.55, "elapsed_time": "0:58:17", "remaining_time": "0:06:47", "throughput": 9381.38, "total_tokens": 32807520}
{"current_steps": 301, "total_steps": 335, "loss": 0.0563, "lr": 1.649597281686302e-06, "epoch": 4.49438202247191, "percentage": 89.85, "elapsed_time": "0:58:28", "remaining_time": "0:06:36", "throughput": 9382.01, "total_tokens": 32917472}
{"current_steps": 302, "total_steps": 335, "loss": 0.0582, "lr": 1.5576586773486195e-06, "epoch": 4.50936329588015, "percentage": 90.15, "elapsed_time": "0:58:39", "remaining_time": "0:06:24", "throughput": 9382.63, "total_tokens": 33026552}
{"current_steps": 303, "total_steps": 335, "loss": 0.048, "lr": 1.4682737372615967e-06, "epoch": 4.52434456928839, "percentage": 90.45, "elapsed_time": "0:58:51", "remaining_time": "0:06:12", "throughput": 9383.41, "total_tokens": 33135312}
{"current_steps": 304, "total_steps": 335, "loss": 0.0556, "lr": 1.3814521984699596e-06, "epoch": 4.539325842696629, "percentage": 90.75, "elapsed_time": "0:59:02", "remaining_time": "0:06:01", "throughput": 9385.02, "total_tokens": 33249640}
{"current_steps": 305, "total_steps": 335, "loss": 0.0427, "lr": 1.297203518777293e-06, "epoch": 4.554307116104869, "percentage": 91.04, "elapsed_time": "0:59:14", "remaining_time": "0:05:49", "throughput": 9385.22, "total_tokens": 33356584}
{"current_steps": 306, "total_steps": 335, "loss": 0.095, "lr": 1.2155368757157643e-06, "epoch": 4.569288389513108, "percentage": 91.34, "elapsed_time": "0:59:25", "remaining_time": "0:05:37", "throughput": 9385.24, "total_tokens": 33465096}
{"current_steps": 307, "total_steps": 335, "loss": 0.0329, "lr": 1.1364611655463736e-06, "epoch": 4.584269662921348, "percentage": 91.64, "elapsed_time": "0:59:37", "remaining_time": "0:05:26", "throughput": 9389.71, "total_tokens": 33589904}
{"current_steps": 308, "total_steps": 335, "loss": 0.048, "lr": 1.0599850022898539e-06, "epoch": 4.599250936329588, "percentage": 91.94, "elapsed_time": "0:59:48", "remaining_time": "0:05:14", "throughput": 9388.76, "total_tokens": 33693528}
{"current_steps": 309, "total_steps": 335, "loss": 0.0709, "lr": 9.861167167883046e-07, "epoch": 4.614232209737827, "percentage": 92.24, "elapsed_time": "1:00:00", "remaining_time": "0:05:02", "throughput": 9389.03, "total_tokens": 33800928}
{"current_steps": 310, "total_steps": 335, "loss": 0.0807, "lr": 9.148643557976955e-07, "epoch": 4.629213483146067, "percentage": 92.54, "elapsed_time": "1:00:11", "remaining_time": "0:04:51", "throughput": 9388.49, "total_tokens": 33904464}
{"current_steps": 311, "total_steps": 335, "loss": 0.0501, "lr": 8.462356811112987e-07, "epoch": 4.644194756554307, "percentage": 92.84, "elapsed_time": "1:00:22", "remaining_time": "0:04:39", "throughput": 9391.16, "total_tokens": 34020608}
{"current_steps": 312, "total_steps": 335, "loss": 0.0499, "lr": 7.802381687141535e-07, "epoch": 4.659176029962547, "percentage": 93.13, "elapsed_time": "1:00:34", "remaining_time": "0:04:27", "throughput": 9391.31, "total_tokens": 34129480}
{"current_steps": 313, "total_steps": 335, "loss": 0.086, "lr": 7.168790079686932e-07, "epoch": 4.674157303370786, "percentage": 93.43, "elapsed_time": "1:00:45", "remaining_time": "0:04:16", "throughput": 9389.83, "total_tokens": 34229672}
{"current_steps": 314, "total_steps": 335, "loss": 0.0711, "lr": 6.561651008315738e-07, "epoch": 4.689138576779026, "percentage": 93.73, "elapsed_time": "1:00:56", "remaining_time": "0:04:04", "throughput": 9390.0, "total_tokens": 34335640}
{"current_steps": 315, "total_steps": 335, "loss": 0.0417, "lr": 5.981030611018234e-07, "epoch": 4.704119850187266, "percentage": 94.03, "elapsed_time": "1:01:07", "remaining_time": "0:03:52", "throughput": 9387.62, "total_tokens": 34431984}
{"current_steps": 316, "total_steps": 335, "loss": 0.0668, "lr": 5.426992137003622e-07, "epoch": 4.719101123595506, "percentage": 94.33, "elapsed_time": "1:01:19", "remaining_time": "0:03:41", "throughput": 9389.27, "total_tokens": 34547560}
{"current_steps": 317, "total_steps": 335, "loss": 0.0582, "lr": 4.899595939810236e-07, "epoch": 4.734082397003745, "percentage": 94.63, "elapsed_time": "1:01:30", "remaining_time": "0:03:29", "throughput": 9389.25, "total_tokens": 34651384}
{"current_steps": 318, "total_steps": 335, "loss": 0.0559, "lr": 4.398899470730827e-07, "epoch": 4.749063670411985, "percentage": 94.93, "elapsed_time": "1:01:42", "remaining_time": "0:03:17", "throughput": 9387.95, "total_tokens": 34759152}
{"current_steps": 319, "total_steps": 335, "loss": 0.0529, "lr": 3.9249572725543196e-07, "epoch": 4.764044943820225, "percentage": 95.22, "elapsed_time": "1:01:54", "remaining_time": "0:03:06", "throughput": 9388.73, "total_tokens": 34874632}
{"current_steps": 320, "total_steps": 335, "loss": 0.0524, "lr": 3.477820973624063e-07, "epoch": 4.7790262172284645, "percentage": 95.52, "elapsed_time": "1:02:06", "remaining_time": "0:02:54", "throughput": 9389.16, "total_tokens": 34988104}
{"current_steps": 321, "total_steps": 335, "loss": 0.0521, "lr": 3.0575392822139726e-07, "epoch": 4.794007490636704, "percentage": 95.82, "elapsed_time": "1:02:18", "remaining_time": "0:02:43", "throughput": 9388.55, "total_tokens": 35096592}
{"current_steps": 322, "total_steps": 335, "loss": 0.0796, "lr": 2.664157981222437e-07, "epoch": 4.808988764044944, "percentage": 96.12, "elapsed_time": "1:02:30", "remaining_time": "0:02:31", "throughput": 9389.32, "total_tokens": 35211304}
{"current_steps": 323, "total_steps": 335, "loss": 0.0674, "lr": 2.297719923185032e-07, "epoch": 4.823970037453184, "percentage": 96.42, "elapsed_time": "1:02:41", "remaining_time": "0:02:19", "throughput": 9390.15, "total_tokens": 35323056}
{"current_steps": 324, "total_steps": 335, "loss": 0.0803, "lr": 1.9582650256064205e-07, "epoch": 4.8389513108614235, "percentage": 96.72, "elapsed_time": "1:02:53", "remaining_time": "0:02:08", "throughput": 9390.72, "total_tokens": 35436552}
{"current_steps": 325, "total_steps": 335, "loss": 0.0626, "lr": 1.645830266611914e-07, "epoch": 4.853932584269663, "percentage": 97.01, "elapsed_time": "1:03:05", "remaining_time": "0:01:56", "throughput": 9390.88, "total_tokens": 35549872}
{"current_steps": 325, "total_steps": 335, "eval_loss": 0.14768485724925995, "epoch": 4.853932584269663, "percentage": 97.01, "elapsed_time": "1:03:10", "remaining_time": "0:01:56", "throughput": 9378.65, "total_tokens": 35549872}
{"current_steps": 326, "total_steps": 335, "loss": 0.0551, "lr": 1.3604496809195288e-07, "epoch": 4.868913857677903, "percentage": 97.31, "elapsed_time": "1:03:22", "remaining_time": "0:01:44", "throughput": 9378.41, "total_tokens": 35659600}
{"current_steps": 327, "total_steps": 335, "loss": 0.0536, "lr": 1.1021543561322012e-07, "epoch": 4.883895131086143, "percentage": 97.61, "elapsed_time": "1:03:34", "remaining_time": "0:01:33", "throughput": 9378.21, "total_tokens": 35770904}
{"current_steps": 328, "total_steps": 335, "loss": 0.0664, "lr": 8.709724293513854e-08, "epoch": 4.898876404494382, "percentage": 97.91, "elapsed_time": "1:03:46", "remaining_time": "0:01:21", "throughput": 9377.38, "total_tokens": 35879784}
{"current_steps": 329, "total_steps": 335, "loss": 0.0641, "lr": 6.66929084112089e-08, "epoch": 4.913857677902621, "percentage": 98.21, "elapsed_time": "1:03:58", "remaining_time": "0:01:09", "throughput": 9376.47, "total_tokens": 35988344}
{"current_steps": 330, "total_steps": 335, "loss": 0.0624, "lr": 4.900465476393168e-08, "epoch": 4.928838951310862, "percentage": 98.51, "elapsed_time": "1:04:10", "remaining_time": "0:00:58", "throughput": 9374.74, "total_tokens": 36093032}
{"current_steps": 331, "total_steps": 335, "loss": 0.0484, "lr": 3.403440884269526e-08, "epoch": 4.943820224719101, "percentage": 98.81, "elapsed_time": "1:04:21", "remaining_time": "0:00:46", "throughput": 9373.49, "total_tokens": 36199864}
{"current_steps": 332, "total_steps": 335, "loss": 0.0649, "lr": 2.1783801413866046e-08, "epoch": 4.9588014981273405, "percentage": 99.1, "elapsed_time": "1:04:33", "remaining_time": "0:00:35", "throughput": 9371.2, "total_tokens": 36302712}
{"current_steps": 333, "total_steps": 335, "loss": 0.0684, "lr": 1.2254166983152737e-08, "epoch": 4.97378277153558, "percentage": 99.4, "elapsed_time": "1:04:45", "remaining_time": "0:00:23", "throughput": 9371.42, "total_tokens": 36412088}
{"current_steps": 334, "total_steps": 335, "loss": 0.0744, "lr": 5.446543650219904e-09, "epoch": 4.98876404494382, "percentage": 99.7, "elapsed_time": "1:04:57", "remaining_time": "0:00:11", "throughput": 9371.29, "total_tokens": 36523328}
{"current_steps": 335, "total_steps": 335, "loss": 0.0815, "lr": 1.3616729956228425e-09, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "1:05:01", "remaining_time": "0:00:00", "throughput": 9380.01, "total_tokens": 36600520}
{"current_steps": 335, "total_steps": 335, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "1:06:56", "remaining_time": "0:00:00", "throughput": 9113.08, "total_tokens": 36600520}
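The log above is JSON Lines: one object per line. Training steps carry `loss` and `lr`, evaluation steps (every 25 steps here) carry `eval_loss`, and the closing record is a run summary with neither. The following is a minimal Python sketch, assuming only that structure, that separates the two kinds of record and picks the lowest evaluation loss; `split_log` and `best_eval` are illustrative helper names, not part of any training framework's API.

```python
import json


def split_log(lines):
    """Split JSON-lines training-log records into (train, eval) lists."""
    train, evals = [], []
    for raw in lines:
        raw = raw.strip()
        if not raw.startswith("{"):
            continue  # skip blank lines and non-JSON residue such as "|"
        record = json.loads(raw)
        if "eval_loss" in record:
            evals.append(record)
        elif "loss" in record:
            train.append(record)
        # the closing summary record has neither key and is dropped
    return train, evals


def best_eval(evals):
    """Return the evaluation record with the lowest eval_loss."""
    return min(evals, key=lambda r: r["eval_loss"])


# A few records copied from the log above, trimmed to the fields used here.
SAMPLE = [
    '{"current_steps": 250, "total_steps": 335, "eval_loss": 0.15280824899673462, "epoch": 3.7340823970037453}',
    '{"current_steps": 275, "total_steps": 335, "eval_loss": 0.148418128490448, "epoch": 4.104868913857678}',
    '{"current_steps": 300, "total_steps": 335, "eval_loss": 0.14898425340652466, "epoch": 4.479400749063671}',
    '{"current_steps": 325, "total_steps": 335, "loss": 0.0626, "lr": 1.645830266611914e-07, "epoch": 4.853932584269663}',
    '{"current_steps": 325, "total_steps": 335, "eval_loss": 0.14768485724925995, "epoch": 4.853932584269663}',
    '{"current_steps": 335, "total_steps": 335, "epoch": 5.0, "percentage": 100.0}',
]

train, evals = split_log(SAMPLE)
best = best_eval(evals)
print(f"best eval_loss {best['eval_loss']:.4f} at step {best['current_steps']}")
# → best eval_loss 0.1477 at step 325
```

On the four evaluation records in this excerpt the minimum falls at step 325 (0.1477), just below step 275 (0.1484), while the training loss in the same window sits mostly between 0.03 and 0.09.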
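The `lr` field decays smoothly to almost zero by step 335. The logged values in this excerpt are consistent with linear warmup followed by cosine decay, lr(t) = peak * 0.5 * (1 + cos(pi * (t - w) / (T - w))) with T = 335 total steps. The peak rate of 5e-5 and w = 34 warmup steps are assumptions obtained by fitting the logged numbers (the run's configuration is not part of this excerpt), as is the observation that the value logged at step s matches the schedule evaluated at t = s - 1. A sketch that reproduces the logged values under those assumptions:

```python
import math

# Assumed hyperparameters: fitted to the logged "lr" values, not read from a config.
PEAK_LR = 5e-5
WARMUP_STEPS = 34   # would equal ceil(0.1 * 335) under a 10% warmup ratio
TOTAL_STEPS = 335   # "total_steps" in every record


def lr_at(step):
    """Learning rate reported at log step `step` under warmup + cosine decay.

    Assumes the logged value is the schedule evaluated after `step - 1`
    scheduler updates, i.e. the rate actually used for that step.
    """
    t = step - 1
    if t < WARMUP_STEPS:
        return PEAK_LR * t / max(1, WARMUP_STEPS)  # linear warmup from 0
    progress = (t - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Logged values copied from the records above.
LOGGED = {
    236: 1.2424751907198312e-05,
    268: 6.0365306649849214e-06,
    300: 1.7440795350494588e-06,
    335: 1.3616729956228425e-09,
}
for step, logged in LOGGED.items():
    print(step, logged, lr_at(step))
```

The near-zero value at the final step is a useful check: it is small but non-zero only if the logged rate lags the scheduler by one update, which is why the sketch evaluates the schedule at `step - 1`.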
3524
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81cc1d5fde6260609814bcc0a743a85a34008732b524ae0f2211452a4ef21d71
size 7736
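The three lines committed for `training_args.bin` are a Git LFS pointer rather than the file itself (the `.gitattributes` above routes `*.bin` through LFS): a spec `version` URL, the object id as `sha256:<hex>`, and the real file size in bytes (7736). A minimal Python sketch of reading such a pointer, assuming the plain `key value` line layout shown here; it is an illustration, not git-lfs's own parser, and it ignores optional extension keys.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer file into its hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"oid_algo": algo, "oid": digest, "size": int(fields["size"])}


# Pointer text copied from the training_args.bin entry above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:81cc1d5fde6260609814bcc0a743a85a34008732b524ae0f2211452a4ef21d71
size 7736
"""

info = parse_lfs_pointer(POINTER)
print(info["oid_algo"], info["size"])  # → sha256 7736
```

Cloning without LFS support leaves only this pointer on disk, so checking the `size` field against the local file is a quick way to tell whether the real object was fetched.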
BIN
training_eval_accuracy.png
Normal file
Binary file not shown.
Size: 39 KiB
BIN
training_eval_loss.png
Normal file
Binary file not shown.
Size: 40 KiB
BIN
training_loss.png
Normal file
Binary file not shown.
Size: 66 KiB
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long