Initialize project; model provided by the ModelHub XC community
Model: rbelanec/train_rte_42_1776331559 Source: Original Platform
36 .gitattributes vendored Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
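The patterns above use gitignore-style globbing to route large files through Git LFS. A rough way to check which paths these rules would catch, sketched with Python's `fnmatch` (a simplification: it matches basenames only and does not implement the full gitattributes rules, e.g. `saved_model/**/*`):

```python
from fnmatch import fnmatch

# A subset of the LFS patterns from the .gitattributes above.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.gz", "*tfevents*", "tokenizer.json"]

def tracked_by_lfs(path):
    """Return True if the file's basename matches any LFS pattern.

    Approximates gitattributes behavior for patterns without slashes,
    which apply to the file name in any directory.
    """
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pat) for pat in LFS_PATTERNS)

print(tracked_by_lfs("model.safetensors"))                 # True
print(tracked_by_lfs("config.json"))                       # False
print(tracked_by_lfs("runs/events.out.tfevents.1700000"))  # True
```

This is why `model.safetensors` and `tokenizer.json` appear below as small LFS pointer files rather than full diffs.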
81 README.md Normal file
@@ -0,0 +1,81 @@
---
library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- peft-factory
- full
- llama-factory
- generated_from_trainer
model-index:
- name: train_rte_42_1776331559
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# train_rte_42_1776331559

This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the rte dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1189
- Num Input Tokens Seen: 2035272

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
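With `lr_scheduler_type: cosine` and `warmup_ratio: 0.1`, the learning rate ramps linearly to 5e-06 over roughly the first 10% of the 1405 training steps and then decays along a cosine curve. A minimal sketch of that schedule in plain Python (the usual linear-warmup-plus-cosine formula; the Trainer's exact rounding of the warmup length may differ slightly, as the logged lr values suggest 141 rather than 140 warmup steps):

```python
import math

def lr_at(step, total_steps=1405, peak_lr=5e-06, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # 140 here; rounding is an assumption
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))     # 0.0 at the start of warmup
print(lr_at(140))   # peak of 5e-06 at the end of warmup
print(lr_at(1405))  # ~0 at the end of training
```

The small lr values in the first trainer_log.jsonl entries (e.g. 1.4e-07 at step 5) are consistent with this warmup phase.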

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.2309 | 0.2527 | 71 | 0.1802 | 105024 |
| 0.1861 | 0.5053 | 142 | 0.2462 | 209536 |
| 0.0658 | 0.7580 | 213 | 0.1589 | 312576 |
| 0.0765 | 1.0107 | 284 | 0.1189 | 414040 |
| 0.1848 | 1.2633 | 355 | 0.2128 | 517656 |
| 0.0306 | 1.5160 | 426 | 0.1791 | 624344 |
| 0.1029 | 1.7687 | 497 | 0.1360 | 725656 |
| 0.1868 | 2.0214 | 568 | 0.1606 | 821416 |
| 0.0259 | 2.2740 | 639 | 0.2542 | 926760 |
| 0.029 | 2.5267 | 710 | 0.2361 | 1025320 |
| 0.0005 | 2.7794 | 781 | 0.2352 | 1128104 |
| 0.0001 | 3.0320 | 852 | 0.2580 | 1229440 |
| 0.0001 | 3.2847 | 923 | 0.2295 | 1332544 |
| 0.0001 | 3.5374 | 994 | 0.2405 | 1438336 |
| 0.0 | 3.7900 | 1065 | 0.2512 | 1539072 |
| 0.0 | 4.0427 | 1136 | 0.2552 | 1642696 |
| 0.0 | 4.2954 | 1207 | 0.2572 | 1743624 |
| 0.0 | 4.5480 | 1278 | 0.2590 | 1849416 |
| 0.0 | 4.8007 | 1349 | 0.2602 | 1954568 |


### Framework versions

- Transformers 4.51.3
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
13 all_results.json Normal file
@@ -0,0 +1,13 @@
{
    "epoch": 5.0,
    "eval_loss": 0.11889845132827759,
    "eval_runtime": 0.6177,
    "eval_samples_per_second": 403.126,
    "eval_steps_per_second": 51.807,
    "num_input_tokens_seen": 2035272,
    "total_flos": 1.1883702201974784e+16,
    "train_loss": 0.05568007128206763,
    "train_runtime": 1085.6649,
    "train_samples_per_second": 10.321,
    "train_steps_per_second": 1.294
}
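The throughput fields here are mutually consistent and can be cross-checked: steps-per-second times runtime recovers the trainer's 1405 total steps, and samples-per-second times runtime recovers the number of training samples seen. A quick sanity check (the 2,241-samples-per-epoch figure matches GLUE RTE's 2,490 training examples with `val_size: 0.1` held out, assuming the standard RTE split):

```python
results = {
    "train_runtime": 1085.6649,
    "train_samples_per_second": 10.321,
    "train_steps_per_second": 1.294,
}

total_steps = results["train_steps_per_second"] * results["train_runtime"]
total_samples = results["train_samples_per_second"] * results["train_runtime"]

print(round(total_steps))        # ~1405, the total optimizer steps over 5 epochs
print(round(total_samples))      # ~11205 samples processed
print(round(total_samples / 5))  # ~2241 samples per epoch
```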
39 config.json Normal file
@@ -0,0 +1,39 @@
{
    "architectures": [
        "LlamaForCausalLM"
    ],
    "attention_bias": false,
    "attention_dropout": 0.0,
    "bos_token_id": 128000,
    "eos_token_id": [
        128001,
        128008,
        128009
    ],
    "head_dim": 64,
    "hidden_act": "silu",
    "hidden_size": 2048,
    "initializer_range": 0.02,
    "intermediate_size": 8192,
    "max_position_embeddings": 131072,
    "mlp_bias": false,
    "model_type": "llama",
    "num_attention_heads": 32,
    "num_hidden_layers": 16,
    "num_key_value_heads": 8,
    "pretraining_tp": 1,
    "rms_norm_eps": 1e-05,
    "rope_scaling": {
        "factor": 32.0,
        "high_freq_factor": 4.0,
        "low_freq_factor": 1.0,
        "original_max_position_embeddings": 8192,
        "rope_type": "llama3"
    },
    "rope_theta": 500000.0,
    "tie_word_embeddings": true,
    "torch_dtype": "float32",
    "transformers_version": "4.51.3",
    "use_cache": false,
    "vocab_size": 128256
}
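The shapes in this config determine the checkpoint size. A back-of-the-envelope parameter count for the standard Llama layout (grouped-query attention with 8 KV heads, tied embeddings so there is no separate lm_head; a sketch from the config fields, not a dump of the actual state dict):

```python
cfg = {
    "vocab_size": 128256, "hidden_size": 2048, "intermediate_size": 8192,
    "num_hidden_layers": 16, "num_key_value_heads": 8, "head_dim": 64,
}

h = cfg["hidden_size"]
kv = cfg["num_key_value_heads"] * cfg["head_dim"]  # 512-dim K/V projections (GQA)

embed = cfg["vocab_size"] * h              # input embeddings, tied with the output head
attn = h * h + h * kv + h * kv + h * h     # q, k, v, o projections per layer
mlp = 3 * h * cfg["intermediate_size"]     # gate, up, down projections per layer
norms = 2 * h                              # two RMSNorm weights per layer

total = embed + cfg["num_hidden_layers"] * (attn + mlp + norms) + h  # + final norm
print(total)      # 1235814400, i.e. ~1.24B parameters
print(total * 4)  # ~4.94e9 bytes in float32
```

At 4 bytes per float32 parameter this comes to 4,943,257,600 bytes, which matches the 4,943,274,328-byte `model.safetensors` below up to the small safetensors header.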
8 eval_results.json Normal file
@@ -0,0 +1,8 @@
{
    "epoch": 5.0,
    "eval_loss": 0.11889845132827759,
    "eval_runtime": 0.6177,
    "eval_samples_per_second": 403.126,
    "eval_steps_per_second": 51.807,
    "num_input_tokens_seen": 2035272
}
12 generation_config.json Normal file
@@ -0,0 +1,12 @@
{
    "bos_token_id": 128000,
    "do_sample": true,
    "eos_token_id": [
        128001,
        128008,
        128009
    ],
    "temperature": 0.6,
    "top_p": 0.9,
    "transformers_version": "4.51.3"
}
3 model.safetensors Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6f42b11bdaec0cab6d8549f9878c2966047f97164729de4a3943c13261f00c19
size 4943274328
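This entry is a Git LFS pointer file, not the weights themselves: only these three key-value lines live in the git history, and the 4.9 GB blob is fetched by the LFS filter at checkout. A small sketch of parsing such a pointer:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6f42b11bdaec0cab6d8549f9878c2966047f97164729de4a3943c13261f00c19
size 4943274328
"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 4943274328 bytes (~4.9 GB)
print(info["oid"])   # hash algorithm and digest of the real blob
```

The `tokenizer.json` entry further down is the same kind of pointer, just for a 17 MB blob.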
26 special_tokens_map.json Normal file
@@ -0,0 +1,26 @@
{
    "additional_special_tokens": [
        {
            "content": "<|eom_id|>",
            "lstrip": false,
            "normalized": false,
            "rstrip": false,
            "single_word": false
        }
    ],
    "bos_token": {
        "content": "<|begin_of_text|>",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "eos_token": {
        "content": "<|eot_id|>",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "pad_token": "<|eot_id|>"
}
3 tokenizer.json Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
size 17209920
2069 tokenizer_config.json Normal file
File diff suppressed because it is too large
55 train.yaml Normal file
@@ -0,0 +1,55 @@
seed: 42

### model
model_name_or_path: meta-llama/Llama-3.2-1B-Instruct
trust_remote_code: true
flash_attn: auto
use_cache: false

### method
stage: sft
do_train: true
finetuning_type: full

### dataset
dataset: rte
template: llama3
cutoff_len: 2048
overwrite_cache: true
preprocessing_num_workers: 4
dataloader_num_workers: 4
packing: false

### output
output_dir: saves_bts_preliminary/base/llama-3.2-1b-instruct/train_rte_42_1776331559
logging_steps: 5
save_steps: 0.05
overwrite_output_dir: true
save_only_model: false
plot_loss: true
include_num_input_tokens_seen: true
push_to_hub: true
push_to_hub_organization: rbelanec
load_best_model_at_end: true
save_total_limit: 1

### train
per_device_train_batch_size: 8
learning_rate: 5.0e-6
num_train_epochs: 5
weight_decay: 1.0e-5
lr_scheduler_type: cosine
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
warmup_ratio: 0.1
optim: adamw_torch
report_to:
- wandb
run_name: base_llama-3.2-1b-instruct_train_rte_42_1776331559

### eval
per_device_eval_batch_size: 8
eval_strategy: steps
eval_steps: 0.05
val_size: 0.1
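Because `eval_steps: 0.05` and `save_steps: 0.05` are below 1, the trainer treats them as fractions of the total step count rather than absolute step numbers. With 1405 total steps, 0.05 works out to an evaluation roughly every 71 steps, which matches the eval entries in trainer_log.jsonl (steps 71, 142, 213, 284, ...). A sketch of that conversion (the ceiling rounding is inferred from the log, so treat it as an assumption):

```python
import math

def ratio_to_steps(ratio, total_steps):
    """Convert a fractional eval/save interval into an absolute step count."""
    return math.ceil(ratio * total_steps) if ratio < 1 else int(ratio)

interval = ratio_to_steps(0.05, 1405)
print(interval)                             # 71
print([interval * k for k in range(1, 5)])  # [71, 142, 213, 284]
```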
9 train_results.json Normal file
@@ -0,0 +1,9 @@
{
    "epoch": 5.0,
    "num_input_tokens_seen": 2035272,
    "total_flos": 1.1883702201974784e+16,
    "train_loss": 0.05568007128206763,
    "train_runtime": 1085.6649,
    "train_samples_per_second": 10.321,
    "train_steps_per_second": 1.294
}
301 trainer_log.jsonl Normal file
@@ -0,0 +1,301 @@
{"current_steps": 5, "total_steps": 1405, "loss": 0.7401, "lr": 1.4184397163120568e-07, "epoch": 0.017793594306049824, "percentage": 0.36, "elapsed_time": "0:00:00", "remaining_time": "0:03:34", "throughput": 10273.35, "total_tokens": 7872}
{"current_steps": 10, "total_steps": 1405, "loss": 0.6171, "lr": 3.1914893617021275e-07, "epoch": 0.03558718861209965, "percentage": 0.71, "elapsed_time": "0:00:01", "remaining_time": "0:02:49", "throughput": 12160.95, "total_tokens": 14784}
{"current_steps": 15, "total_steps": 1405, "loss": 0.4092, "lr": 4.964539007092199e-07, "epoch": 0.05338078291814947, "percentage": 1.07, "elapsed_time": "0:00:01", "remaining_time": "0:02:38", "throughput": 13729.69, "total_tokens": 23424}
{"current_steps": 20, "total_steps": 1405, "loss": 0.2659, "lr": 6.73758865248227e-07, "epoch": 0.0711743772241993, "percentage": 1.42, "elapsed_time": "0:00:02", "remaining_time": "0:02:28", "throughput": 13884.51, "total_tokens": 29824}
{"current_steps": 25, "total_steps": 1405, "loss": 0.2666, "lr": 8.510638297872341e-07, "epoch": 0.08896797153024912, "percentage": 1.78, "elapsed_time": "0:00:02", "remaining_time": "0:02:24", "throughput": 14476.05, "total_tokens": 37824}
{"current_steps": 30, "total_steps": 1405, "loss": 0.2573, "lr": 1.0283687943262412e-06, "epoch": 0.10676156583629894, "percentage": 2.14, "elapsed_time": "0:00:03", "remaining_time": "0:02:19", "throughput": 14604.05, "total_tokens": 44608}
{"current_steps": 35, "total_steps": 1405, "loss": 0.3322, "lr": 1.2056737588652482e-06, "epoch": 0.12455516014234876, "percentage": 2.49, "elapsed_time": "0:00:03", "remaining_time": "0:02:17", "throughput": 14841.7, "total_tokens": 51968}
{"current_steps": 40, "total_steps": 1405, "loss": 0.1441, "lr": 1.3829787234042555e-06, "epoch": 0.1423487544483986, "percentage": 2.85, "elapsed_time": "0:00:03", "remaining_time": "0:02:14", "throughput": 15033.22, "total_tokens": 59456}
{"current_steps": 45, "total_steps": 1405, "loss": 0.163, "lr": 1.5602836879432626e-06, "epoch": 0.1601423487544484, "percentage": 3.2, "elapsed_time": "0:00:04", "remaining_time": "0:02:12", "throughput": 15135.47, "total_tokens": 66496}
{"current_steps": 50, "total_steps": 1405, "loss": 0.1815, "lr": 1.7375886524822697e-06, "epoch": 0.17793594306049823, "percentage": 3.56, "elapsed_time": "0:00:04", "remaining_time": "0:02:11", "throughput": 15185.55, "total_tokens": 73408}
{"current_steps": 55, "total_steps": 1405, "loss": 0.1484, "lr": 1.9148936170212767e-06, "epoch": 0.19572953736654805, "percentage": 3.91, "elapsed_time": "0:00:05", "remaining_time": "0:02:09", "throughput": 15274.84, "total_tokens": 80576}
{"current_steps": 60, "total_steps": 1405, "loss": 0.1828, "lr": 2.092198581560284e-06, "epoch": 0.21352313167259787, "percentage": 4.27, "elapsed_time": "0:00:05", "remaining_time": "0:02:08", "throughput": 15397.99, "total_tokens": 88256}
{"current_steps": 65, "total_steps": 1405, "loss": 0.1709, "lr": 2.269503546099291e-06, "epoch": 0.2313167259786477, "percentage": 4.63, "elapsed_time": "0:00:06", "remaining_time": "0:02:07", "throughput": 15537.96, "total_tokens": 96256}
{"current_steps": 70, "total_steps": 1405, "loss": 0.2309, "lr": 2.446808510638298e-06, "epoch": 0.2491103202846975, "percentage": 4.98, "elapsed_time": "0:00:06", "remaining_time": "0:02:06", "throughput": 15587.07, "total_tokens": 103424}
{"current_steps": 71, "total_steps": 1405, "eval_loss": 0.18017518520355225, "epoch": 0.2526690391459075, "percentage": 5.05, "elapsed_time": "0:00:07", "remaining_time": "0:02:17", "throughput": 14316.64, "total_tokens": 105024}
{"current_steps": 75, "total_steps": 1405, "loss": 0.1733, "lr": 2.624113475177305e-06, "epoch": 0.2669039145907473, "percentage": 5.34, "elapsed_time": "0:01:18", "remaining_time": "0:23:06", "throughput": 1413.24, "total_tokens": 110528}
{"current_steps": 80, "total_steps": 1405, "loss": 0.1469, "lr": 2.8014184397163125e-06, "epoch": 0.2846975088967972, "percentage": 5.69, "elapsed_time": "0:01:18", "remaining_time": "0:21:42", "throughput": 1493.33, "total_tokens": 117440}
{"current_steps": 85, "total_steps": 1405, "loss": 0.1417, "lr": 2.978723404255319e-06, "epoch": 0.302491103202847, "percentage": 6.05, "elapsed_time": "0:01:19", "remaining_time": "0:20:28", "throughput": 1586.53, "total_tokens": 125504}
{"current_steps": 90, "total_steps": 1405, "loss": 0.1887, "lr": 3.1560283687943267e-06, "epoch": 0.3202846975088968, "percentage": 6.41, "elapsed_time": "0:01:19", "remaining_time": "0:19:22", "throughput": 1663.97, "total_tokens": 132352}
{"current_steps": 95, "total_steps": 1405, "loss": 0.1507, "lr": 3.3333333333333333e-06, "epoch": 0.33807829181494664, "percentage": 6.76, "elapsed_time": "0:01:19", "remaining_time": "0:18:22", "throughput": 1740.55, "total_tokens": 139200}
{"current_steps": 100, "total_steps": 1405, "loss": 0.0966, "lr": 3.510638297872341e-06, "epoch": 0.35587188612099646, "percentage": 7.12, "elapsed_time": "0:01:20", "remaining_time": "0:17:29", "throughput": 1838.31, "total_tokens": 147904}
{"current_steps": 105, "total_steps": 1405, "loss": 0.2218, "lr": 3.6879432624113475e-06, "epoch": 0.3736654804270463, "percentage": 7.47, "elapsed_time": "0:01:20", "remaining_time": "0:16:41", "throughput": 1907.09, "total_tokens": 154240}
{"current_steps": 110, "total_steps": 1405, "loss": 0.1246, "lr": 3.865248226950355e-06, "epoch": 0.3914590747330961, "percentage": 7.83, "elapsed_time": "0:01:21", "remaining_time": "0:15:57", "throughput": 1985.57, "total_tokens": 161472}
{"current_steps": 115, "total_steps": 1405, "loss": 0.1689, "lr": 4.042553191489362e-06, "epoch": 0.4092526690391459, "percentage": 8.19, "elapsed_time": "0:01:21", "remaining_time": "0:15:17", "throughput": 2057.26, "total_tokens": 168192}
{"current_steps": 120, "total_steps": 1405, "loss": 0.1818, "lr": 4.219858156028369e-06, "epoch": 0.42704626334519574, "percentage": 8.54, "elapsed_time": "0:01:22", "remaining_time": "0:14:40", "throughput": 2125.23, "total_tokens": 174656}
{"current_steps": 125, "total_steps": 1405, "loss": 0.1189, "lr": 4.397163120567377e-06, "epoch": 0.44483985765124556, "percentage": 8.9, "elapsed_time": "0:01:22", "remaining_time": "0:14:05", "throughput": 2198.52, "total_tokens": 181632}
{"current_steps": 130, "total_steps": 1405, "loss": 0.0973, "lr": 4.574468085106383e-06, "epoch": 0.4626334519572954, "percentage": 9.25, "elapsed_time": "0:01:23", "remaining_time": "0:13:35", "throughput": 2303.5, "total_tokens": 191488}
{"current_steps": 135, "total_steps": 1405, "loss": 0.1978, "lr": 4.751773049645391e-06, "epoch": 0.4804270462633452, "percentage": 9.61, "elapsed_time": "0:01:23", "remaining_time": "0:13:06", "throughput": 2379.18, "total_tokens": 198848}
{"current_steps": 140, "total_steps": 1405, "loss": 0.1861, "lr": 4.929078014184397e-06, "epoch": 0.498220640569395, "percentage": 9.96, "elapsed_time": "0:01:24", "remaining_time": "0:12:39", "throughput": 2465.53, "total_tokens": 207232}
{"current_steps": 142, "total_steps": 1405, "eval_loss": 0.2461702525615692, "epoch": 0.505338078291815, "percentage": 10.11, "elapsed_time": "0:01:24", "remaining_time": "0:12:34", "throughput": 2471.44, "total_tokens": 209536}
{"current_steps": 145, "total_steps": 1405, "loss": 0.2676, "lr": 4.999930504592181e-06, "epoch": 0.5160142348754448, "percentage": 10.32, "elapsed_time": "0:02:00", "remaining_time": "0:17:27", "throughput": 1775.06, "total_tokens": 213952}
{"current_steps": 150, "total_steps": 1405, "loss": 0.1686, "lr": 4.999505824425164e-06, "epoch": 0.5338078291814946, "percentage": 10.68, "elapsed_time": "0:02:00", "remaining_time": "0:16:52", "throughput": 1829.83, "total_tokens": 221376}
{"current_steps": 155, "total_steps": 1405, "loss": 0.1074, "lr": 4.998695138156149e-06, "epoch": 0.5516014234875445, "percentage": 11.03, "elapsed_time": "0:02:01", "remaining_time": "0:16:19", "throughput": 1885.24, "total_tokens": 228928}
{"current_steps": 160, "total_steps": 1405, "loss": 0.1216, "lr": 4.997498570981822e-06, "epoch": 0.5693950177935944, "percentage": 11.39, "elapsed_time": "0:02:01", "remaining_time": "0:15:48", "throughput": 1939.28, "total_tokens": 236352}
{"current_steps": 165, "total_steps": 1405, "loss": 0.1426, "lr": 4.995916307691601e-06, "epoch": 0.5871886120996441, "percentage": 11.74, "elapsed_time": "0:02:02", "remaining_time": "0:15:19", "throughput": 1997.84, "total_tokens": 244416}
{"current_steps": 170, "total_steps": 1405, "loss": 0.175, "lr": 4.993948592639105e-06, "epoch": 0.604982206405694, "percentage": 12.1, "elapsed_time": "0:02:02", "remaining_time": "0:14:51", "throughput": 2048.09, "total_tokens": 251456}
{"current_steps": 175, "total_steps": 1405, "loss": 0.1179, "lr": 4.991595729704405e-06, "epoch": 0.6227758007117438, "percentage": 12.46, "elapsed_time": "0:02:03", "remaining_time": "0:14:26", "throughput": 2100.88, "total_tokens": 258880}
{"current_steps": 180, "total_steps": 1405, "loss": 0.1235, "lr": 4.988858082247109e-06, "epoch": 0.6405693950177936, "percentage": 12.81, "elapsed_time": "0:02:03", "remaining_time": "0:14:01", "throughput": 2144.54, "total_tokens": 265152}
{"current_steps": 185, "total_steps": 1405, "loss": 0.1596, "lr": 4.985736073050237e-06, "epoch": 0.6583629893238434, "percentage": 13.17, "elapsed_time": "0:02:04", "remaining_time": "0:13:38", "throughput": 2196.61, "total_tokens": 272576}
{"current_steps": 190, "total_steps": 1405, "loss": 0.1188, "lr": 4.982230184254934e-06, "epoch": 0.6761565836298933, "percentage": 13.52, "elapsed_time": "0:02:04", "remaining_time": "0:13:16", "throughput": 2246.44, "total_tokens": 279744}
{"current_steps": 195, "total_steps": 1405, "loss": 0.1255, "lr": 4.9783409572860105e-06, "epoch": 0.693950177935943, "percentage": 13.88, "elapsed_time": "0:02:04", "remaining_time": "0:12:55", "throughput": 2301.63, "total_tokens": 287680}
{"current_steps": 200, "total_steps": 1405, "loss": 0.0801, "lr": 4.9740689927683314e-06, "epoch": 0.7117437722419929, "percentage": 14.23, "elapsed_time": "0:02:05", "remaining_time": "0:12:35", "throughput": 2348.72, "total_tokens": 294592}
{"current_steps": 205, "total_steps": 1405, "loss": 0.0902, "lr": 4.9694149504340515e-06, "epoch": 0.7295373665480427, "percentage": 14.59, "elapsed_time": "0:02:05", "remaining_time": "0:12:16", "throughput": 2395.03, "total_tokens": 301440}
{"current_steps": 210, "total_steps": 1405, "loss": 0.0658, "lr": 4.964379549020741e-06, "epoch": 0.7473309608540926, "percentage": 14.95, "elapsed_time": "0:02:06", "remaining_time": "0:11:58", "throughput": 2441.99, "total_tokens": 308416}
{"current_steps": 213, "total_steps": 1405, "eval_loss": 0.1589389145374298, "epoch": 0.7580071174377224, "percentage": 15.16, "elapsed_time": "0:02:08", "remaining_time": "0:11:57", "throughput": 2437.63, "total_tokens": 312576}
{"current_steps": 215, "total_steps": 1405, "loss": 0.1047, "lr": 4.9589635661603845e-06, "epoch": 0.7651245551601423, "percentage": 15.3, "elapsed_time": "0:02:58", "remaining_time": "0:16:26", "throughput": 1769.32, "total_tokens": 315328}
{"current_steps": 220, "total_steps": 1405, "loss": 0.0899, "lr": 4.953167838259285e-06, "epoch": 0.7829181494661922, "percentage": 15.66, "elapsed_time": "0:02:58", "remaining_time": "0:16:02", "throughput": 1806.05, "total_tokens": 322688}
{"current_steps": 225, "total_steps": 1405, "loss": 0.1884, "lr": 4.946993260368904e-06, "epoch": 0.800711743772242, "percentage": 16.01, "elapsed_time": "0:02:59", "remaining_time": "0:15:39", "throughput": 1838.5, "total_tokens": 329280}
{"current_steps": 230, "total_steps": 1405, "loss": 0.0862, "lr": 4.9404407860476275e-06, "epoch": 0.8185053380782918, "percentage": 16.37, "elapsed_time": "0:02:59", "remaining_time": "0:15:17", "throughput": 1876.32, "total_tokens": 336896}
{"current_steps": 235, "total_steps": 1405, "loss": 0.1129, "lr": 4.933511427213511e-06, "epoch": 0.8362989323843416, "percentage": 16.73, "elapsed_time": "0:02:59", "remaining_time": "0:14:56", "throughput": 1911.89, "total_tokens": 344128}
{"current_steps": 240, "total_steps": 1405, "loss": 0.0612, "lr": 4.926206253988001e-06, "epoch": 0.8540925266903915, "percentage": 17.08, "elapsed_time": "0:03:00", "remaining_time": "0:14:35", "throughput": 1944.94, "total_tokens": 350912}
{"current_steps": 245, "total_steps": 1405, "loss": 0.1364, "lr": 4.91852639453068e-06, "epoch": 0.8718861209964412, "percentage": 17.44, "elapsed_time": "0:03:00", "remaining_time": "0:14:16", "throughput": 1979.49, "total_tokens": 358016}
{"current_steps": 250, "total_steps": 1405, "loss": 0.0751, "lr": 4.910473034865033e-06, "epoch": 0.8896797153024911, "percentage": 17.79, "elapsed_time": "0:03:01", "remaining_time": "0:13:57", "throughput": 2011.88, "total_tokens": 364736}
{"current_steps": 255, "total_steps": 1405, "loss": 0.1051, "lr": 4.902047418695293e-06, "epoch": 0.9074733096085409, "percentage": 18.15, "elapsed_time": "0:03:01", "remaining_time": "0:13:39", "throughput": 2045.11, "total_tokens": 371648}
{"current_steps": 260, "total_steps": 1405, "loss": 0.0675, "lr": 4.893250847214369e-06, "epoch": 0.9252669039145908, "percentage": 18.51, "elapsed_time": "0:03:02", "remaining_time": "0:13:22", "throughput": 2081.5, "total_tokens": 379200}
{"current_steps": 265, "total_steps": 1405, "loss": 0.1611, "lr": 4.884084678902898e-06, "epoch": 0.9430604982206405, "percentage": 18.86, "elapsed_time": "0:03:02", "remaining_time": "0:13:05", "throughput": 2120.01, "total_tokens": 387200}
{"current_steps": 270, "total_steps": 1405, "loss": 0.1232, "lr": 4.874550329319457e-06, "epoch": 0.9608540925266904, "percentage": 19.22, "elapsed_time": "0:03:03", "remaining_time": "0:12:49", "throughput": 2158.55, "total_tokens": 395264}
{"current_steps": 275, "total_steps": 1405, "loss": 0.135, "lr": 4.864649270881944e-06, "epoch": 0.9786476868327402, "percentage": 19.57, "elapsed_time": "0:03:03", "remaining_time": "0:12:34", "throughput": 2191.09, "total_tokens": 402176}
{"current_steps": 280, "total_steps": 1405, "loss": 0.0765, "lr": 4.854383032640196e-06, "epoch": 0.99644128113879, "percentage": 19.93, "elapsed_time": "0:03:04", "remaining_time": "0:12:19", "throughput": 2228.07, "total_tokens": 409984}
{"current_steps": 284, "total_steps": 1405, "eval_loss": 0.11889845132827759, "epoch": 1.01067615658363, "percentage": 20.21, "elapsed_time": "0:03:05", "remaining_time": "0:12:10", "throughput": 2237.6, "total_tokens": 414040}
{"current_steps": 285, "total_steps": 1405, "loss": 0.0754, "lr": 4.843753200039851e-06, "epoch": 1.0142348754448398, "percentage": 20.28, "elapsed_time": "0:04:34", "remaining_time": "0:17:58", "throughput": 1513.6, "total_tokens": 415256}
{"current_steps": 290, "total_steps": 1405, "loss": 0.067, "lr": 4.832761414677502e-06, "epoch": 1.0320284697508897, "percentage": 20.64, "elapsed_time": "0:04:34", "remaining_time": "0:17:36", "throughput": 1538.59, "total_tokens": 422808}
{"current_steps": 295, "total_steps": 1405, "loss": 0.0342, "lr": 4.821409374047184e-06, "epoch": 1.0498220640569396, "percentage": 21.0, "elapsed_time": "0:04:35", "remaining_time": "0:17:15", "throughput": 1562.61, "total_tokens": 430104}
{"current_steps": 300, "total_steps": 1405, "loss": 0.0683, "lr": 4.809698831278217e-06, "epoch": 1.0676156583629894, "percentage": 21.35, "elapsed_time": "0:04:35", "remaining_time": "0:16:55", "throughput": 1584.31, "total_tokens": 436760}
{"current_steps": 305, "total_steps": 1405, "loss": 0.1073, "lr": 4.797631594864475e-06, "epoch": 1.085409252669039, "percentage": 21.71, "elapsed_time": "0:04:36", "remaining_time": "0:16:35", "throughput": 1611.32, "total_tokens": 444952}
{"current_steps": 310, "total_steps": 1405, "loss": 0.1453, "lr": 4.785209528385087e-06, "epoch": 1.103202846975089, "percentage": 22.06, "elapsed_time": "0:04:36", "remaining_time": "0:16:17", "throughput": 1636.86, "total_tokens": 452760}
{"current_steps": 315, "total_steps": 1405, "loss": 0.2851, "lr": 4.7724345502166435e-06, "epoch": 1.1209964412811388, "percentage": 22.42, "elapsed_time": "0:04:37", "remaining_time": "0:15:58", "throughput": 1654.76, "total_tokens": 458392}
{"current_steps": 320, "total_steps": 1405, "loss": 0.0427, "lr": 4.759308633236934e-06, "epoch": 1.1387900355871885, "percentage": 22.78, "elapsed_time": "0:04:37", "remaining_time": "0:15:40", "throughput": 1676.42, "total_tokens": 465112}
{"current_steps": 325, "total_steps": 1405, "loss": 0.0369, "lr": 4.74583380452027e-06, "epoch": 1.1565836298932384, "percentage": 23.13, "elapsed_time": "0:04:37", "remaining_time": "0:15:23", "throughput": 1699.35, "total_tokens": 472216}
{"current_steps": 330, "total_steps": 1405, "loss": 0.1237, "lr": 4.7320121450244395e-06, "epoch": 1.1743772241992882, "percentage": 23.49, "elapsed_time": "0:04:38", "remaining_time": "0:15:06", "throughput": 1723.08, "total_tokens": 479576}
{"current_steps": 335, "total_steps": 1405, "loss": 0.0822, "lr": 4.717845789269333e-06, "epoch": 1.1921708185053381, "percentage": 23.84, "elapsed_time": "0:04:38", "remaining_time": "0:14:50", "throughput": 1745.4, "total_tokens": 486552}
{"current_steps": 340, "total_steps": 1405, "loss": 0.0835, "lr": 4.703336925007311e-06, "epoch": 1.209964412811388, "percentage": 24.2, "elapsed_time": "0:04:39", "remaining_time": "0:14:34", "throughput": 1771.37, "total_tokens": 494616}
{"current_steps": 345, "total_steps": 1405, "loss": 0.0413, "lr": 4.68848779288534e-06, "epoch": 1.2277580071174377, "percentage": 24.56, "elapsed_time": "0:04:39", "remaining_time": "0:14:19", "throughput": 1792.87, "total_tokens": 501400}
{"current_steps": 350, "total_steps": 1405, "loss": 0.0284, "lr": 4.673300686098957e-06, "epoch": 1.2455516014234875, "percentage": 24.91, "elapsed_time": "0:04:40", "remaining_time": "0:14:04", "throughput": 1816.7, "total_tokens": 508888}
{"current_steps": 355, "total_steps": 1405, "loss": 0.1848, "lr": 4.657777950038133e-06, "epoch": 1.2633451957295374, "percentage": 25.27, "elapsed_time": "0:04:40", "remaining_time": "0:13:49", "throughput": 1844.83, "total_tokens": 517656}
{"current_steps": 355, "total_steps": 1405, "eval_loss": 0.21280953288078308, "epoch": 1.2633451957295374, "percentage": 25.27, "elapsed_time": "0:04:41", "remaining_time": "0:13:51", "throughput": 1840.79, "total_tokens": 517656}
{"current_steps": 360, "total_steps": 1405, "loss": 0.0684, "lr": 4.641921981925064e-06, "epoch": 1.281138790035587, "percentage": 25.62, "elapsed_time": "0:05:17", "remaining_time": "0:15:21", "throughput": 1658.11, "total_tokens": 526232}
{"current_steps": 365, "total_steps": 1405, "loss": 0.0556, "lr": 4.625735230443959e-06, "epoch": 1.298932384341637, "percentage": 25.98, "elapsed_time": "0:05:17", "remaining_time": "0:15:05", "throughput": 1678.36, "total_tokens": 533400}
{"current_steps": 370, "total_steps": 1405, "loss": 0.1059, "lr": 4.609220195362886e-06, "epoch": 1.3167259786476868, "percentage": 26.33, "elapsed_time": "0:05:18", "remaining_time": "0:14:50", "throughput": 1703.34, "total_tokens": 542168}
{"current_steps": 375, "total_steps": 1405, "loss": 0.0525, "lr": 4.592379427147722e-06, "epoch": 1.3345195729537367, "percentage": 26.69, "elapsed_time": "0:05:18", "remaining_time": "0:14:35", "throughput": 1725.39, "total_tokens": 549976}
{"current_steps": 380, "total_steps": 1405, "loss": 0.118, "lr": 4.575215526568278e-06, "epoch": 1.3523131672597866, "percentage": 27.05, "elapsed_time": "0:05:19", "remaining_time": "0:14:20", "throughput": 1745.07, "total_tokens": 557016}
{"current_steps": 385, "total_steps": 1405, "loss": 0.0831, "lr": 4.557731144296659e-06, "epoch": 1.3701067615658362, "percentage": 27.4, "elapsed_time": "0:05:19", "remaining_time": "0:14:06", "throughput": 1766.02, "total_tokens": 564504}
{"current_steps": 390, "total_steps": 1405, "loss": 0.0283, "lr": 4.539928980497903e-06, "epoch": 1.387900355871886, "percentage": 27.76, "elapsed_time": "0:05:20", "remaining_time": "0:13:53", "throughput": 1786.54, "total_tokens": 571864}
{"current_steps": 395, "total_steps": 1405, "loss": 0.0483, "lr": 4.521811784412996e-06, "epoch": 1.405693950177936, "percentage": 28.11, "elapsed_time": "0:05:20", "remaining_time": "0:13:39", "throughput": 1804.73, "total_tokens": 578456}
{"current_steps": 400, "total_steps": 1405, "loss": 0.1155, "lr": 4.503382353934295e-06, "epoch": 1.4234875444839858, "percentage": 28.47, "elapsed_time": "0:05:20", "remaining_time": "0:13:26", "throughput": 1821.54, "total_tokens": 584600}
{"current_steps": 405, "total_steps": 1405, "loss": 0.0086, "lr": 4.484643535173438e-06, "epoch": 1.4412811387900355, "percentage": 28.83, "elapsed_time": "0:05:21", "remaining_time": "0:13:13", "throughput": 1839.46, "total_tokens": 591128}
{"current_steps": 410, "total_steps": 1405, "loss": 0.0969, "lr": 4.465598222021818e-06, "epoch": 1.4590747330960854, "percentage": 29.18, "elapsed_time": "0:05:21", "remaining_time": "0:13:00", "throughput": 1859.96, "total_tokens": 598552}
{"current_steps": 415, "total_steps": 1405, "loss": 0.0715, "lr": 4.446249355703661e-06, "epoch": 1.4768683274021353, "percentage": 29.54, "elapsed_time": "0:05:22", "remaining_time": "0:12:48", "throughput": 1884.37, "total_tokens": 607320}
{"current_steps": 420, "total_steps": 1405, "loss": 0.094, "lr": 4.426599924321815e-06, "epoch": 1.4946619217081851, "percentage": 29.89, "elapsed_time": "0:05:22", "remaining_time": "0:12:36", "throughput": 1904.77, "total_tokens": 614744}
{"current_steps": 425, "total_steps": 1405, "loss": 0.0306, "lr": 4.406652962396278e-06, "epoch": 1.512455516014235, "percentage": 30.25, "elapsed_time": "0:05:23", "remaining_time": "0:12:25", "throughput": 1926.98, "total_tokens": 622808}
{"current_steps": 426, "total_steps": 1405, "eval_loss": 0.17913997173309326, "epoch": 1.5160142348754448, "percentage": 30.32, "elapsed_time": "0:05:23", "remaining_time": "0:12:24", "throughput": 1927.68, "total_tokens": 624344}
{"current_steps": 430, "total_steps": 1405, "loss": 0.103, "lr": 4.386411550395576e-06, "epoch": 1.5302491103202847, "percentage": 30.6, "elapsed_time": "0:05:57", "remaining_time": "0:13:30", "throughput": 1764.39, "total_tokens": 630488}
{"current_steps": 435, "total_steps": 1405, "loss": 0.0775, "lr": 4.365878814261032e-06, "epoch": 1.5480427046263345, "percentage": 30.96, "elapsed_time": "0:05:57", "remaining_time": "0:13:17", "throughput": 1784.31, "total_tokens": 638424}
{"current_steps": 440, "total_steps": 1405, "loss": 0.0767, "lr": 4.34505792492402e-06, "epoch": 1.5658362989323842, "percentage": 31.32, "elapsed_time": "0:05:58", "remaining_time": "0:13:05", "throughput": 1801.12, "total_tokens": 645208}
{"current_steps": 445, "total_steps": 1405, "loss": 0.013, "lr": 4.3239520978162685e-06, "epoch": 1.583629893238434, "percentage": 31.67, "elapsed_time": "0:05:58", "remaining_time": "0:12:53", "throughput": 1820.61, "total_tokens": 653016}
{"current_steps": 450, "total_steps": 1405, "loss": 0.0129, "lr": 4.302564592373293e-06, "epoch": 1.601423487544484, "percentage": 32.03, "elapsed_time": "0:05:59", "remaining_time": "0:12:42", "throughput": 1837.82, "total_tokens": 659992}
{"current_steps": 455, "total_steps": 1405, "loss": 0.1234, "lr": 4.280898711531026e-06, "epoch": 1.6192170818505338, "percentage": 32.38, "elapsed_time": "0:05:59", "remaining_time": "0:12:30", "throughput": 1855.64, "total_tokens": 667224}
{"current_steps": 460, "total_steps": 1405, "loss": 0.1334, "lr": 4.258957801215743e-06, "epoch": 1.6370106761565837, "percentage": 32.74, "elapsed_time": "0:06:00", "remaining_time": "0:12:19", "throughput": 1875.29, "total_tokens": 675160}
{"current_steps": 465, "total_steps": 1405, "loss": 0.1297, "lr": 4.236745249827336e-06, "epoch": 1.6548042704626336, "percentage": 33.1, "elapsed_time": "0:06:00", "remaining_time": "0:12:08", "throughput": 1896.07, "total_tokens": 683544}
{"current_steps": 470, "total_steps": 1405, "loss": 0.0334, "lr": 4.2142644877160334e-06, "epoch": 1.6725978647686834, "percentage": 33.45, "elapsed_time": "0:06:00", "remaining_time": "0:11:57", "throughput": 1910.07, "total_tokens": 689368}
{"current_steps": 475, "total_steps": 1405, "loss": 0.0779, "lr": 4.191518986652642e-06, "epoch": 1.690391459074733, "percentage": 33.81, "elapsed_time": "0:06:01", "remaining_time": "0:11:47", "throughput": 1925.73, "total_tokens": 695832}
{"current_steps": 480, "total_steps": 1405, "loss": 0.0085, "lr": 4.168512259292391e-06, "epoch": 1.708185053380783, "percentage": 34.16, "elapsed_time": "0:06:01", "remaining_time": "0:11:37", "throughput": 1943.52, "total_tokens": 703128}
{"current_steps": 485, "total_steps": 1405, "loss": 0.0617, "lr": 4.14524785863246e-06, "epoch": 1.7259786476868326, "percentage": 34.52, "elapsed_time": "0:06:02", "remaining_time": "0:11:27", "throughput": 1958.92, "total_tokens": 709528}
{"current_steps": 490, "total_steps": 1405, "loss": 0.0537, "lr": 4.121729377463285e-06, "epoch": 1.7437722419928825, "percentage": 34.88, "elapsed_time": "0:06:02", "remaining_time": "0:11:17", "throughput": 1975.31, "total_tokens": 716312}
{"current_steps": 495, "total_steps": 1405, "loss": 0.1029, "lr": 4.0979604478137045e-06, "epoch": 1.7615658362989324, "percentage": 35.23, "elapsed_time": "0:06:03", "remaining_time": "0:11:07", "throughput": 1990.84, "total_tokens": 722776}
{"current_steps": 497, "total_steps": 1405, "eval_loss": 0.13597266376018524, "epoch": 1.7686832740213523, "percentage": 35.37, "elapsed_time": "0:06:03", "remaining_time": "0:11:04", "throughput": 1994.61, "total_tokens": 725656}
{"current_steps": 500, "total_steps": 1405, "loss": 0.1142, "lr": 4.0739447403900605e-06, "epoch": 1.7793594306049823, "percentage": 35.59, "elapsed_time": "0:06:38", "remaining_time": "0:12:01", "throughput": 1832.08, "total_tokens": 729944}
{"current_steps": 505, "total_steps": 1405, "loss": 0.0837, "lr": 4.0496859640093215e-06, "epoch": 1.7971530249110321, "percentage": 35.94, "elapsed_time": "0:06:38", "remaining_time": "0:11:50", "throughput": 1848.01, "total_tokens": 737112}
{"current_steps": 510, "total_steps": 1405, "loss": 0.0124, "lr": 4.025187865026311e-06, "epoch": 1.814946619217082, "percentage": 36.3, "elapsed_time": "0:06:39", "remaining_time": "0:11:40", "throughput": 1864.23, "total_tokens": 744408}
{"current_steps": 515, "total_steps": 1405, "loss": 0.059, "lr": 4.0004542267551585e-06, "epoch": 1.8327402135231317, "percentage": 36.65, "elapsed_time": "0:06:39", "remaining_time": "0:11:30", "throughput": 1877.49, "total_tokens": 750488}
{"current_steps": 520, "total_steps": 1405, "loss": 0.0662, "lr": 3.975488868885022e-06, "epoch": 1.8505338078291815, "percentage": 37.01, "elapsed_time": "0:06:40", "remaining_time": "0:11:21", "throughput": 1893.01, "total_tokens": 757528}
{"current_steps": 525, "total_steps": 1405, "loss": 0.0299, "lr": 3.950295646890202e-06, "epoch": 1.8683274021352312, "percentage": 37.37, "elapsed_time": "0:06:40", "remaining_time": "0:11:11", "throughput": 1906.54, "total_tokens": 763736}
{"current_steps": 530, "total_steps": 1405, "loss": 0.0666, "lr": 3.924878451434736e-06, "epoch": 1.886120996441281, "percentage": 37.72, "elapsed_time": "0:06:41", "remaining_time": "0:11:02", "throughput": 1924.6, "total_tokens": 771864}
{"current_steps": 535, "total_steps": 1405, "loss": 0.072, "lr": 3.899241207771546e-06, "epoch": 1.903914590747331, "percentage": 38.08, "elapsed_time": "0:06:41", "remaining_time": "0:10:52", "throughput": 1939.57, "total_tokens": 778712}
{"current_steps": 540, "total_steps": 1405, "loss": 0.0475, "lr": 3.873387875136252e-06, "epoch": 1.9217081850533808, "percentage": 38.43, "elapsed_time": "0:06:41", "remaining_time": "0:10:43", "throughput": 1951.5, "total_tokens": 784280}
{"current_steps": 545, "total_steps": 1405, "loss": 0.1443, "lr": 3.847322446135736e-06, "epoch": 1.9395017793594307, "percentage": 38.79, "elapsed_time": "0:06:42", "remaining_time": "0:10:34", "throughput": 1969.14, "total_tokens": 792280}
{"current_steps": 550, "total_steps": 1405, "loss": 0.1501, "lr": 3.821048946131549e-06, "epoch": 1.9572953736654806, "percentage": 39.15, "elapsed_time": "0:06:42", "remaining_time": "0:10:26", "throughput": 1982.52, "total_tokens": 798488}
{"current_steps": 555, "total_steps": 1405, "loss": 0.0502, "lr": 3.794571432618267e-06, "epoch": 1.9750889679715302, "percentage": 39.5, "elapsed_time": "0:06:43", "remaining_time": "0:10:17", "throughput": 1999.18, "total_tokens": 806104}
{"current_steps": 560, "total_steps": 1405, "loss": 0.0142, "lr": 3.767893994596876e-06, "epoch": 1.99288256227758, "percentage": 39.86, "elapsed_time": "0:06:43", "remaining_time": "0:10:09", "throughput": 2014.91, "total_tokens": 813336}
{"current_steps": 565, "total_steps": 1405, "loss": 0.1868, "lr": 3.7410207519432972e-06, "epoch": 2.0106761565836297, "percentage": 40.21, "elapsed_time": "0:06:44", "remaining_time": "0:10:00", "throughput": 2022.87, "total_tokens": 817576}
{"current_steps": 568, "total_steps": 1405, "eval_loss": 0.16060441732406616, "epoch": 2.02135231316726, "percentage": 40.43, "elapsed_time": "0:06:45", "remaining_time": "0:09:56", "throughput": 2028.08, "total_tokens": 821416}
{"current_steps": 570, "total_steps": 1405, "loss": 0.0138, "lr": 3.713955854772144e-06, "epoch": 2.0284697508896796, "percentage": 40.57, "elapsed_time": "0:07:32", "remaining_time": "0:11:02", "throughput": 1821.36, "total_tokens": 823848}
{"current_steps": 575, "total_steps": 1405, "loss": 0.1069, "lr": 3.686703482795802e-06, "epoch": 2.0462633451957295, "percentage": 40.93, "elapsed_time": "0:07:32", "remaining_time": "0:10:53", "throughput": 1837.99, "total_tokens": 832232}
{"current_steps": 580, "total_steps": 1405, "loss": 0.042, "lr": 3.6592678446789516e-06, "epoch": 2.0640569395017794, "percentage": 41.28, "elapsed_time": "0:07:33", "remaining_time": "0:10:44", "throughput": 1854.17, "total_tokens": 840424}
{"current_steps": 585, "total_steps": 1405, "loss": 0.0325, "lr": 3.631653177388605e-06, "epoch": 2.0818505338078293, "percentage": 41.64, "elapsed_time": "0:07:33", "remaining_time": "0:10:35", "throughput": 1866.56, "total_tokens": 846824}
{"current_steps": 590, "total_steps": 1405, "loss": 0.003, "lr": 3.6038637455397802e-06, "epoch": 2.099644128113879, "percentage": 41.99, "elapsed_time": "0:07:34", "remaining_time": "0:10:27", "throughput": 1879.75, "total_tokens": 853608}
{"current_steps": 595, "total_steps": 1405, "loss": 0.1002, "lr": 3.575903840736906e-06, "epoch": 2.117437722419929, "percentage": 42.35, "elapsed_time": "0:07:34", "remaining_time": "0:10:18", "throughput": 1894.1, "total_tokens": 860968}
{"current_steps": 600, "total_steps": 1405, "loss": 0.0251, "lr": 3.547777780911055e-06, "epoch": 2.135231316725979, "percentage": 42.7, "elapsed_time": "0:07:35", "remaining_time": "0:10:10", "throughput": 1909.63, "total_tokens": 868904}
{"current_steps": 605, "total_steps": 1405, "loss": 0.0005, "lr": 3.519489909653113e-06, "epoch": 2.1530249110320283, "percentage": 43.06, "elapsed_time": "0:07:35", "remaining_time": "0:10:02", "throughput": 1923.52, "total_tokens": 876072}
{"current_steps": 610, "total_steps": 1405, "loss": 0.0155, "lr": 3.4910445955429856e-06, "epoch": 2.170818505338078, "percentage": 43.42, "elapsed_time": "0:07:35", "remaining_time": "0:09:54", "throughput": 1938.44, "total_tokens": 883752}
{"current_steps": 615, "total_steps": 1405, "loss": 0.0004, "lr": 3.4624462314749447e-06, "epoch": 2.188612099644128, "percentage": 43.77, "elapsed_time": "0:07:36", "remaining_time": "0:09:46", "throughput": 1953.08, "total_tokens": 891304}
{"current_steps": 620, "total_steps": 1405, "loss": 0.0163, "lr": 3.433699233979222e-06, "epoch": 2.206405693950178, "percentage": 44.13, "elapsed_time": "0:07:36", "remaining_time": "0:09:38", "throughput": 1968.35, "total_tokens": 899176}
{"current_steps": 625, "total_steps": 1405, "loss": 0.0001, "lr": 3.4048080425399506e-06, "epoch": 2.224199288256228, "percentage": 44.48, "elapsed_time": "0:07:37", "remaining_time": "0:09:30", "throughput": 1984.65, "total_tokens": 907560}
{"current_steps": 630, "total_steps": 1405, "loss": 0.0027, "lr": 3.375777118909561e-06, "epoch": 2.2419928825622777, "percentage": 44.84, "elapsed_time": "0:07:37", "remaining_time": "0:09:23", "throughput": 1999.46, "total_tokens": 915240}
{"current_steps": 635, "total_steps": 1405, "loss": 0.0259, "lr": 3.346610946419743e-06, "epoch": 2.2597864768683276, "percentage": 45.2, "elapsed_time": "0:07:38", "remaining_time": "0:09:15", "throughput": 2011.06, "total_tokens": 921384}
{"current_steps": 639, "total_steps": 1405, "eval_loss": 0.25424328446388245, "epoch": 2.2740213523131674, "percentage": 45.48, "elapsed_time": "0:07:39", "remaining_time": "0:09:10", "throughput": 2018.66, "total_tokens": 926760}
{"current_steps": 640, "total_steps": 1405, "loss": 0.0233, "lr": 3.3173140292890673e-06, "epoch": 2.277580071174377, "percentage": 45.55, "elapsed_time": "0:08:47", "remaining_time": "0:10:30", "throughput": 1758.9, "total_tokens": 927528}
{"current_steps": 645, "total_steps": 1405, "loss": 0.0266, "lr": 3.2878908919273867e-06, "epoch": 2.295373665480427, "percentage": 45.91, "elapsed_time": "0:08:47", "remaining_time": "0:10:21", "throughput": 1770.76, "total_tokens": 934568}
{"current_steps": 650, "total_steps": 1405, "loss": 0.0006, "lr": 3.2583460782371217e-06, "epoch": 2.3131672597864767, "percentage": 46.26, "elapsed_time": "0:08:48", "remaining_time": "0:10:13", "throughput": 1783.78, "total_tokens": 942248}
{"current_steps": 655, "total_steps": 1405, "loss": 0.0003, "lr": 3.228684150911527e-06, "epoch": 2.3309608540925266, "percentage": 46.62, "elapsed_time": "0:08:48", "remaining_time": "0:10:05", "throughput": 1795.27, "total_tokens": 949096}
{"current_steps": 660, "total_steps": 1405, "loss": 0.0082, "lr": 3.1989096907300634e-06, "epoch": 2.3487544483985765, "percentage": 46.98, "elapsed_time": "0:08:49", "remaining_time": "0:09:57", "throughput": 1806.39, "total_tokens": 955752}
{"current_steps": 665, "total_steps": 1405, "loss": 0.0486, "lr": 3.1690272958509772e-06, "epoch": 2.3665480427046264, "percentage": 47.33, "elapsed_time": "0:08:49", "remaining_time": "0:09:49", "throughput": 1818.88, "total_tokens": 963176}
{"current_steps": 670, "total_steps": 1405, "loss": 0.0001, "lr": 3.139041581101187e-06, "epoch": 2.3843416370106763, "percentage": 47.69, "elapsed_time": "0:08:49", "remaining_time": "0:09:41", "throughput": 1827.08, "total_tokens": 968232}
{"current_steps": 675, "total_steps": 1405, "loss": 0.0222, "lr": 3.108957177263608e-06, "epoch": 2.402135231316726, "percentage": 48.04, "elapsed_time": "0:08:50", "remaining_time": "0:09:33", "throughput": 1841.14, "total_tokens": 976552}
{"current_steps": 680, "total_steps": 1405, "loss": 0.0075, "lr": 3.078778730362003e-06, "epoch": 2.419928825622776, "percentage": 48.4, "elapsed_time": "0:08:50", "remaining_time": "0:09:25", "throughput": 1853.1, "total_tokens": 983720}
{"current_steps": 685, "total_steps": 1405, "loss": 0.0003, "lr": 3.0485109009434844e-06, "epoch": 2.4377224199288254, "percentage": 48.75, "elapsed_time": "0:08:51", "remaining_time": "0:09:18", "throughput": 1867.0, "total_tokens": 991976}
{"current_steps": 690, "total_steps": 1405, "loss": 0.061, "lr": 3.018158363358773e-06, "epoch": 2.4555160142348753, "percentage": 49.11, "elapsed_time": "0:08:51", "remaining_time": "0:09:10", "throughput": 1877.23, "total_tokens": 998184}
{"current_steps": 695, "total_steps": 1405, "loss": 0.0, "lr": 2.9877258050403214e-06, "epoch": 2.473309608540925, "percentage": 49.47, "elapsed_time": "0:08:52", "remaining_time": "0:09:03", "throughput": 1889.72, "total_tokens": 1005672}
{"current_steps": 700, "total_steps": 1405, "loss": 0.0152, "lr": 2.9572179257784215e-06, "epoch": 2.491103202846975, "percentage": 49.82, "elapsed_time": "0:08:52", "remaining_time": "0:08:56", "throughput": 1902.07, "total_tokens": 1013096}
{"current_steps": 705, "total_steps": 1405, "loss": 0.0006, "lr": 2.9266394369954056e-06, "epoch": 2.508896797153025, "percentage": 50.18, "elapsed_time": "0:08:53", "remaining_time": "0:08:49", "throughput": 1912.23, "total_tokens": 1019304}
{"current_steps": 710, "total_steps": 1405, "loss": 0.029, "lr": 2.8959950610180376e-06, "epoch": 2.526690391459075, "percentage": 50.53, "elapsed_time": "0:08:53", "remaining_time": "0:08:42", "throughput": 1922.02, "total_tokens": 1025320}
{"current_steps": 710, "total_steps": 1405, "eval_loss": 0.23608553409576416, "epoch": 2.526690391459075, "percentage": 50.53, "elapsed_time": "0:08:54", "remaining_time": "0:08:42", "throughput": 1919.77, "total_tokens": 1025320}
{"current_steps": 715, "total_steps": 1405, "loss": 0.0, "lr": 2.865289530348243e-06, "epoch": 2.5444839857651247, "percentage": 50.89, "elapsed_time": "0:09:28", "remaining_time": "0:09:08", "throughput": 1816.38, "total_tokens": 1032552}
{"current_steps": 720, "total_steps": 1405, "loss": 0.0, "lr": 2.8345275869322432e-06, "epoch": 2.562277580071174, "percentage": 51.25, "elapsed_time": "0:09:28", "remaining_time": "0:09:01", "throughput": 1827.88, "total_tokens": 1039912}
{"current_steps": 725, "total_steps": 1405, "loss": 0.0092, "lr": 2.8037139814282494e-06, "epoch": 2.580071174377224, "percentage": 51.6, "elapsed_time": "0:09:29", "remaining_time": "0:08:54", "throughput": 1839.26, "total_tokens": 1047208}
{"current_steps": 730, "total_steps": 1405, "loss": 0.0001, "lr": 2.7728534724728027e-06, "epoch": 2.597864768683274, "percentage": 51.96, "elapsed_time": "0:09:29", "remaining_time": "0:08:46", "throughput": 1849.66, "total_tokens": 1053928}
{"current_steps": 735, "total_steps": 1405, "loss": 0.0806, "lr": 2.741950825945881e-06, "epoch": 2.6156583629893237, "percentage": 52.31, "elapsed_time": "0:09:30", "remaining_time": "0:08:39", "throughput": 1861.64, "total_tokens": 1061608}
{"current_steps": 740, "total_steps": 1405, "loss": 0.0747, "lr": 2.7110108142348962e-06, "epoch": 2.6334519572953736, "percentage": 52.67, "elapsed_time": "0:09:30", "remaining_time": "0:08:32", "throughput": 1870.73, "total_tokens": 1067560}
{"current_steps": 745, "total_steps": 1405, "loss": 0.0001, "lr": 2.6800382154976734e-06, "epoch": 2.6512455516014235, "percentage": 53.02, "elapsed_time": "0:09:31", "remaining_time": "0:08:25", "throughput": 1880.86, "total_tokens": 1074152}
{"current_steps": 750, "total_steps": 1405, "loss": 0.0005, "lr": 2.64903781292455e-06, "epoch": 2.6690391459074734, "percentage": 53.38, "elapsed_time": "0:09:31", "remaining_time": "0:08:19", "throughput": 1894.52, "total_tokens": 1082856}
{"current_steps": 755, "total_steps": 1405, "loss": 0.0003, "lr": 2.6180143939996926e-06, "epoch": 2.6868327402135233, "percentage": 53.74, "elapsed_time": "0:09:32", "remaining_time": "0:08:12", "throughput": 1904.74, "total_tokens": 1089512}
{"current_steps": 760, "total_steps": 1405, "loss": 0.0433, "lr": 2.5869727497617495e-06, "epoch": 2.704626334519573, "percentage": 54.09, "elapsed_time": "0:09:32", "remaining_time": "0:08:05", "throughput": 1915.04, "total_tokens": 1096232}
{"current_steps": 765, "total_steps": 1405, "loss": 0.0004, "lr": 2.55591767406396e-06, "epoch": 2.722419928825623, "percentage": 54.45, "elapsed_time": "0:09:32", "remaining_time": "0:07:59", "throughput": 1927.34, "total_tokens": 1104168}
{"current_steps": 770, "total_steps": 1405, "loss": 0.1194, "lr": 2.524853962833825e-06, "epoch": 2.7402135231316724, "percentage": 54.8, "elapsed_time": "0:09:33", "remaining_time": "0:07:52", "throughput": 1939.85, "total_tokens": 1112232}
{"current_steps": 775, "total_steps": 1405, "loss": 0.0031, "lr": 2.4937864133324514e-06, "epoch": 2.7580071174377223, "percentage": 55.16, "elapsed_time": "0:09:33", "remaining_time": "0:07:46", "throughput": 1950.21, "total_tokens": 1119016}
{"current_steps": 780, "total_steps": 1405, "loss": 0.0005, "lr": 2.462719823413707e-06, "epoch": 2.775800711743772, "percentage": 55.52, "elapsed_time": "0:09:34", "remaining_time": "0:07:40", "throughput": 1962.04, "total_tokens": 1126696}
{"current_steps": 781, "total_steps": 1405, "eval_loss": 0.23524385690689087, "epoch": 2.7793594306049823, "percentage": 55.59, "elapsed_time": "0:09:35", "remaining_time": "0:07:39", "throughput": 1960.39, "total_tokens": 1128104}
{"current_steps": 785, "total_steps": 1405, "loss": 0.0423, "lr": 2.4316589907832654e-06, "epoch": 2.793594306049822, "percentage": 55.87, "elapsed_time": "0:10:24", "remaining_time": "0:08:13", "throughput": 1816.76, "total_tokens": 1134184}
{"current_steps": 790, "total_steps": 1405, "loss": 0.001, "lr": 2.4006087122576867e-06, "epoch": 2.811387900355872, "percentage": 56.23, "elapsed_time": "0:10:24", "remaining_time": "0:08:06", "throughput": 1825.49, "total_tokens": 1140392}
{"current_steps": 795, "total_steps": 1405, "loss": 0.031, "lr": 2.3695737830236263e-06, "epoch": 2.829181494661922, "percentage": 56.58, "elapsed_time": "0:10:25", "remaining_time": "0:07:59", "throughput": 1836.84, "total_tokens": 1148328}
{"current_steps": 800, "total_steps": 1405, "loss": 0.0007, "lr": 2.3385589958973073e-06, "epoch": 2.8469750889679717, "percentage": 56.94, "elapsed_time": "0:10:25", "remaining_time": "0:07:53", "throughput": 1844.75, "total_tokens": 1154024}
{"current_steps": 805, "total_steps": 1405, "loss": 0.0003, "lr": 2.3075691405843435e-06, "epoch": 2.864768683274021, "percentage": 57.3, "elapsed_time": "0:10:26", "remaining_time": "0:07:46", "throughput": 1854.32, "total_tokens": 1160808}
{"current_steps": 810, "total_steps": 1405, "loss": 0.0299, "lr": 2.2766090029400573e-06, "epoch": 2.882562277580071, "percentage": 57.65, "elapsed_time": "0:10:26", "remaining_time": "0:07:40", "throughput": 1864.36, "total_tokens": 1167912}
{"current_steps": 815, "total_steps": 1405, "loss": 0.0001, "lr": 2.2456833642303825e-06, "epoch": 2.900355871886121, "percentage": 58.01, "elapsed_time": "0:10:26", "remaining_time": "0:07:33", "throughput": 1873.7, "total_tokens": 1174568}
{"current_steps": 820, "total_steps": 1405, "loss": 0.0001, "lr": 2.214797000393479e-06, "epoch": 2.9181494661921707, "percentage": 58.36, "elapsed_time": "0:10:27", "remaining_time": "0:07:27", "throughput": 1883.4, "total_tokens": 1181480}
{"current_steps": 825, "total_steps": 1405, "loss": 0.0251, "lr": 2.183954681302173e-06, "epoch": 2.9359430604982206, "percentage": 58.72, "elapsed_time": "0:10:27", "remaining_time": "0:07:21", "throughput": 1895.44, "total_tokens": 1189928}
{"current_steps": 830, "total_steps": 1405, "loss": 0.0001, "lr": 2.15316117002733e-06, "epoch": 2.9537366548042705, "percentage": 59.07, "elapsed_time": "0:10:28", "remaining_time": "0:07:15", "throughput": 1906.1, "total_tokens": 1197480}
{"current_steps": 835, "total_steps": 1405, "loss": 0.039, "lr": 2.122421222102278e-06, "epoch": 2.9715302491103204, "percentage": 59.43, "elapsed_time": "0:10:28", "remaining_time": "0:07:09", "throughput": 1916.05, "total_tokens": 1204584}
{"current_steps": 840, "total_steps": 1405, "loss": 0.0311, "lr": 2.0917395847884e-06, "epoch": 2.9893238434163703, "percentage": 59.79, "elapsed_time": "0:10:29", "remaining_time": "0:07:03", "throughput": 1927.35, "total_tokens": 1212584}
{"current_steps": 845, "total_steps": 1405, "loss": 0.0104, "lr": 2.061120996341996e-06, "epoch": 3.00711743772242, "percentage": 60.14, "elapsed_time": "0:10:29", "remaining_time": "0:06:57", "throughput": 1934.0, "total_tokens": 1217856}
{"current_steps": 850, "total_steps": 1405, "loss": 0.0001, "lr": 2.030570185282544e-06, "epoch": 3.0249110320284696, "percentage": 60.5, "elapsed_time": "0:10:30", "remaining_time": "0:06:51", "throughput": 1946.42, "total_tokens": 1226624}
{"current_steps": 852, "total_steps": 1405, "eval_loss": 0.25802624225616455, "epoch": 3.0320284697508897, "percentage": 60.64, "elapsed_time": "0:10:30", "remaining_time": "0:06:49", "throughput": 1948.51, "total_tokens": 1229440}
{"current_steps": 855, "total_steps": 1405, "loss": 0.0, "lr": 2.0000918696624587e-06, "epoch": 3.0427046263345194, "percentage": 60.85, "elapsed_time": "0:11:23", "remaining_time": "0:07:19", "throughput": 1805.25, "total_tokens": 1233152}
{"current_steps": 860, "total_steps": 1405, "loss": 0.0, "lr": 1.9696907563384687e-06, "epoch": 3.0604982206405693, "percentage": 61.21, "elapsed_time": "0:11:23", "remaining_time": "0:07:13", "throughput": 1814.3, "total_tokens": 1240128}
{"current_steps": 865, "total_steps": 1405, "loss": 0.0, "lr": 1.9393715402447228e-06, "epoch": 3.078291814946619, "percentage": 61.57, "elapsed_time": "0:11:23", "remaining_time": "0:07:07", "throughput": 1824.66, "total_tokens": 1248064}
{"current_steps": 870, "total_steps": 1405, "loss": 0.0, "lr": 1.9091389036677384e-06, "epoch": 3.096085409252669, "percentage": 61.92, "elapsed_time": "0:11:24", "remaining_time": "0:07:00", "throughput": 1833.96, "total_tokens": 1255232}
{"current_steps": 875, "total_steps": 1405, "loss": 0.0486, "lr": 1.878997515523299e-06, "epoch": 3.113879003558719, "percentage": 62.28, "elapsed_time": "0:11:24", "remaining_time": "0:06:54", "throughput": 1843.07, "total_tokens": 1262272}
{"current_steps": 880, "total_steps": 1405, "loss": 0.0001, "lr": 1.8489520306354243e-06, "epoch": 3.131672597864769, "percentage": 62.63, "elapsed_time": "0:11:25", "remaining_time": "0:06:48", "throughput": 1852.6, "total_tokens": 1269632}
{"current_steps": 885, "total_steps": 1405, "loss": 0.0001, "lr": 1.8190070890175082e-06, "epoch": 3.1494661921708187, "percentage": 62.99, "elapsed_time": "0:11:25", "remaining_time": "0:06:42", "throughput": 1862.56, "total_tokens": 1277312}
{"current_steps": 890, "total_steps": 1405, "loss": 0.0502, "lr": 1.7891673151557493e-06, "epoch": 3.167259786476868, "percentage": 63.35, "elapsed_time": "0:11:26", "remaining_time": "0:06:37", "throughput": 1871.27, "total_tokens": 1284096}
{"current_steps": 895, "total_steps": 1405, "loss": 0.0001, "lr": 1.7594373172949786e-06, "epoch": 3.185053380782918, "percentage": 63.7, "elapsed_time": "0:11:26", "remaining_time": "0:06:31", "throughput": 1881.04, "total_tokens": 1291648}
{"current_steps": 900, "total_steps": 1405, "loss": 0.0001, "lr": 1.7298216867269906e-06, "epoch": 3.202846975088968, "percentage": 64.06, "elapsed_time": "0:11:27", "remaining_time": "0:06:25", "throughput": 1891.51, "total_tokens": 1299712}
{"current_steps": 905, "total_steps": 1405, "loss": 0.0001, "lr": 1.7003249970815028e-06, "epoch": 3.2206405693950177, "percentage": 64.41, "elapsed_time": "0:11:27", "remaining_time": "0:06:19", "throughput": 1899.73, "total_tokens": 1306176}
{"current_steps": 910, "total_steps": 1405, "loss": 0.0001, "lr": 1.6709518036198307e-06, "epoch": 3.2384341637010676, "percentage": 64.77, "elapsed_time": "0:11:28", "remaining_time": "0:06:14", "throughput": 1909.99, "total_tokens": 1314112}
{"current_steps": 915, "total_steps": 1405, "loss": 0.0251, "lr": 1.6417066425314088e-06, "epoch": 3.2562277580071175, "percentage": 65.12, "elapsed_time": "0:11:28", "remaining_time": "0:06:08", "throughput": 1918.91, "total_tokens": 1321088}
{"current_steps": 920, "total_steps": 1405, "loss": 0.0001, "lr": 1.612594030233252e-06, "epoch": 3.2740213523131674, "percentage": 65.48, "elapsed_time": "0:11:28", "remaining_time": "0:06:03", "throughput": 1928.42, "total_tokens": 1328512}
{"current_steps": 923, "total_steps": 1405, "eval_loss": 0.22950421273708344, "epoch": 3.284697508896797, "percentage": 65.69, "elapsed_time": "0:11:29", "remaining_time": "0:06:00", "throughput": 1931.9, "total_tokens": 1332544}
{"current_steps": 925, "total_steps": 1405, "loss": 0.0005, "lr": 1.5836184626724722e-06, "epoch": 3.2918149466192173, "percentage": 65.84, "elapsed_time": "0:12:15", "remaining_time": "0:06:21", "throughput": 1815.47, "total_tokens": 1336128}
{"current_steps": 930, "total_steps": 1405, "loss": 0.0, "lr": 1.5547844146319547e-06, "epoch": 3.309608540925267, "percentage": 66.19, "elapsed_time": "0:12:16", "remaining_time": "0:06:16", "throughput": 1824.44, "total_tokens": 1343552}
{"current_steps": 935, "total_steps": 1405, "loss": 0.0383, "lr": 1.5260963390393075e-06, "epoch": 3.3274021352313166, "percentage": 66.55, "elapsed_time": "0:12:16", "remaining_time": "0:06:10", "throughput": 1834.14, "total_tokens": 1351552}
{"current_steps": 940, "total_steps": 1405, "loss": 0.0002, "lr": 1.4975586662791783e-06, "epoch": 3.3451957295373664, "percentage": 66.9, "elapsed_time": "0:12:17", "remaining_time": "0:06:04", "throughput": 1842.17, "total_tokens": 1358272}
{"current_steps": 945, "total_steps": 1405, "loss": 0.0001, "lr": 1.4691758035090603e-06, "epoch": 3.3629893238434163, "percentage": 67.26, "elapsed_time": "0:12:17", "remaining_time": "0:05:59", "throughput": 1852.51, "total_tokens": 1366784}
{"current_steps": 950, "total_steps": 1405, "loss": 0.0001, "lr": 1.4409521339786809e-06, "epoch": 3.380782918149466, "percentage": 67.62, "elapsed_time": "0:12:18", "remaining_time": "0:05:53", "throughput": 1860.29, "total_tokens": 1373312}
{"current_steps": 955, "total_steps": 1405, "loss": 0.0001, "lr": 1.41289201635308e-06, "epoch": 3.398576512455516, "percentage": 67.97, "elapsed_time": "0:12:18", "remaining_time": "0:05:48", "throughput": 1869.21, "total_tokens": 1380736}
{"current_steps": 960, "total_steps": 1405, "loss": 0.0001, "lr": 1.3849997840394943e-06, "epoch": 3.416370106761566, "percentage": 68.33, "elapsed_time": "0:12:19", "remaining_time": "0:05:42", "throughput": 1878.61, "total_tokens": 1388544}
{"current_steps": 965, "total_steps": 1405, "loss": 0.0001, "lr": 1.3572797445181346e-06, "epoch": 3.434163701067616, "percentage": 68.68, "elapsed_time": "0:12:19", "remaining_time": "0:05:37", "throughput": 1887.76, "total_tokens": 1396160}
{"current_steps": 970, "total_steps": 1405, "loss": 0.0, "lr": 1.3297361786769654e-06, "epoch": 3.4519572953736652, "percentage": 69.04, "elapsed_time": "0:12:20", "remaining_time": "0:05:31", "throughput": 1897.31, "total_tokens": 1404096}
{"current_steps": 975, "total_steps": 1405, "loss": 0.0004, "lr": 1.302373340150598e-06, "epoch": 3.469750889679715, "percentage": 69.4, "elapsed_time": "0:12:20", "remaining_time": "0:05:26", "throughput": 1905.54, "total_tokens": 1411008}
{"current_steps": 980, "total_steps": 1405, "loss": 0.0001, "lr": 1.2751954546633872e-06, "epoch": 3.487544483985765, "percentage": 69.75, "elapsed_time": "0:12:20", "remaining_time": "0:05:21", "throughput": 1914.97, "total_tokens": 1418880}
{"current_steps": 985, "total_steps": 1405, "loss": 0.0, "lr": 1.2482067193768419e-06, "epoch": 3.505338078291815, "percentage": 70.11, "elapsed_time": "0:12:21", "remaining_time": "0:05:16", "throughput": 1923.5, "total_tokens": 1426048}
{"current_steps": 990, "total_steps": 1405, "loss": 0.0001, "lr": 1.2214113022414448e-06, "epoch": 3.5231316725978647, "percentage": 70.46, "elapsed_time": "0:12:21", "remaining_time": "0:05:10", "throughput": 1930.54, "total_tokens": 1432064}
{"current_steps": 994, "total_steps": 1405, "eval_loss": 0.24046999216079712, "epoch": 3.5373665480427046, "percentage": 70.75, "elapsed_time": "0:12:22", "remaining_time": "0:05:07", "throughput": 1936.33, "total_tokens": 1438336}
{"current_steps": 995, "total_steps": 1405, "loss": 0.0, "lr": 1.1948133413529817e-06, "epoch": 3.5409252669039146, "percentage": 70.82, "elapsed_time": "0:13:12", "remaining_time": "0:05:26", "throughput": 1815.74, "total_tokens": 1439808}
{"current_steps": 1000, "total_steps": 1405, "loss": 0.0001, "lr": 1.168416944313486e-06, "epoch": 3.5587188612099645, "percentage": 71.17, "elapsed_time": "0:13:13", "remaining_time": "0:05:21", "throughput": 1824.55, "total_tokens": 1447616}
{"current_steps": 1005, "total_steps": 1405, "loss": 0.0, "lr": 1.1422261875968845e-06, "epoch": 3.5765124555160144, "percentage": 71.53, "elapsed_time": "0:13:13", "remaining_time": "0:05:15", "throughput": 1831.87, "total_tokens": 1454208}
{"current_steps": 1010, "total_steps": 1405, "loss": 0.0, "lr": 1.1162451159194615e-06, "epoch": 3.5943060498220643, "percentage": 71.89, "elapsed_time": "0:13:14", "remaining_time": "0:05:10", "throughput": 1842.19, "total_tokens": 1463296}
{"current_steps": 1015, "total_steps": 1405, "loss": 0.0009, "lr": 1.0904777416152166e-06, "epoch": 3.612099644128114, "percentage": 72.24, "elapsed_time": "0:13:14", "remaining_time": "0:05:05", "throughput": 1849.56, "total_tokens": 1469952}
{"current_steps": 1020, "total_steps": 1405, "loss": 0.0, "lr": 1.0649280440162326e-06, "epoch": 3.6298932384341636, "percentage": 72.6, "elapsed_time": "0:13:15", "remaining_time": "0:05:00", "throughput": 1857.63, "total_tokens": 1477184}
{"current_steps": 1025, "total_steps": 1405, "loss": 0.0, "lr": 1.0395999688381313e-06, "epoch": 3.6476868327402134, "percentage": 72.95, "elapsed_time": "0:13:15", "remaining_time": "0:04:54", "throughput": 1865.38, "total_tokens": 1484160}
{"current_steps": 1030, "total_steps": 1405, "loss": 0.0001, "lr": 1.0144974275707243e-06, "epoch": 3.6654804270462633, "percentage": 73.31, "elapsed_time": "0:13:16", "remaining_time": "0:04:49", "throughput": 1873.19, "total_tokens": 1491200}
{"current_steps": 1035, "total_steps": 1405, "loss": 0.0, "lr": 9.896242968739538e-07, "epoch": 3.683274021352313, "percentage": 73.67, "elapsed_time": "0:13:16", "remaining_time": "0:04:44", "throughput": 1881.15, "total_tokens": 1498368}
{"current_steps": 1040, "total_steps": 1405, "loss": 0.0, "lr": 9.649844179792082e-07, "epoch": 3.701067615658363, "percentage": 74.02, "elapsed_time": "0:13:16", "remaining_time": "0:04:39", "throughput": 1889.65, "total_tokens": 1505984}
{"current_steps": 1045, "total_steps": 1405, "loss": 0.0, "lr": 9.405815960961054e-07, "epoch": 3.718861209964413, "percentage": 74.38, "elapsed_time": "0:13:17", "remaining_time": "0:04:34", "throughput": 1895.83, "total_tokens": 1511680}
{"current_steps": 1050, "total_steps": 1405, "loss": 0.0, "lr": 9.164195998248471e-07, "epoch": 3.7366548042704624, "percentage": 74.73, "elapsed_time": "0:13:17", "remaining_time": "0:04:29", "throughput": 1902.62, "total_tokens": 1517888}
{"current_steps": 1055, "total_steps": 1405, "loss": 0.0109, "lr": 8.925021605742212e-07, "epoch": 3.7544483985765122, "percentage": 75.09, "elapsed_time": "0:13:18", "remaining_time": "0:04:24", "throughput": 1911.15, "total_tokens": 1525568}
{"current_steps": 1060, "total_steps": 1405, "loss": 0.0, "lr": 8.68832971985347e-07, "epoch": 3.772241992882562, "percentage": 75.44, "elapsed_time": "0:13:18", "remaining_time": "0:04:19", "throughput": 1918.77, "total_tokens": 1532480}
{"current_steps": 1065, "total_steps": 1405, "loss": 0.0, "lr": 8.454156893612592e-07, "epoch": 3.790035587188612, "percentage": 75.8, "elapsed_time": "0:13:19", "remaining_time": "0:04:15", "throughput": 1925.99, "total_tokens": 1539072}
{"current_steps": 1065, "total_steps": 1405, "eval_loss": 0.2512344419956207, "epoch": 3.790035587188612, "percentage": 75.8, "elapsed_time": "0:13:19", "remaining_time": "0:04:15", "throughput": 1924.38, "total_tokens": 1539072}
{"current_steps": 1070, "total_steps": 1405, "loss": 0.0, "lr": 8.222539291024079e-07, "epoch": 3.807829181494662, "percentage": 76.16, "elapsed_time": "0:13:56", "remaining_time": "0:04:21", "throughput": 1850.89, "total_tokens": 1547584}
{"current_steps": 1075, "total_steps": 1405, "loss": 0.0, "lr": 7.993512681481638e-07, "epoch": 3.8256227758007118, "percentage": 76.51, "elapsed_time": "0:13:56", "remaining_time": "0:04:16", "throughput": 1857.97, "total_tokens": 1554304}
{"current_steps": 1080, "total_steps": 1405, "loss": 0.0, "lr": 7.767112434244254e-07, "epoch": 3.8434163701067616, "percentage": 76.87, "elapsed_time": "0:13:56", "remaining_time": "0:04:11", "throughput": 1864.9, "total_tokens": 1560896}
{"current_steps": 1085, "total_steps": 1405, "loss": 0.0, "lr": 7.543373512973947e-07, "epoch": 3.8612099644128115, "percentage": 77.22, "elapsed_time": "0:13:57", "remaining_time": "0:04:06", "throughput": 1872.11, "total_tokens": 1567744}
{"current_steps": 1090, "total_steps": 1405, "loss": 0.032, "lr": 7.322330470336314e-07, "epoch": 3.8790035587188614, "percentage": 77.58, "elapsed_time": "0:13:57", "remaining_time": "0:04:02", "throughput": 1879.1, "total_tokens": 1574400}
{"current_steps": 1095, "total_steps": 1405, "loss": 0.0187, "lr": 7.104017442664393e-07, "epoch": 3.8967971530249113, "percentage": 77.94, "elapsed_time": "0:13:58", "remaining_time": "0:03:57", "throughput": 1886.59, "total_tokens": 1581504}
{"current_steps": 1100, "total_steps": 1405, "loss": 0.0, "lr": 6.88846814468691e-07, "epoch": 3.914590747330961, "percentage": 78.29, "elapsed_time": "0:13:58", "remaining_time": "0:03:52", "throughput": 1895.09, "total_tokens": 1589504}
{"current_steps": 1105, "total_steps": 1405, "loss": 0.0369, "lr": 6.67571586432163e-07, "epoch": 3.9323843416370106, "percentage": 78.65, "elapsed_time": "0:13:59", "remaining_time": "0:03:47", "throughput": 1903.8, "total_tokens": 1597696}
{"current_steps": 1110, "total_steps": 1405, "loss": 0.0, "lr": 6.465793457534553e-07, "epoch": 3.9501779359430604, "percentage": 79.0, "elapsed_time": "0:13:59", "remaining_time": "0:03:43", "throughput": 1911.77, "total_tokens": 1605248}
{"current_steps": 1115, "total_steps": 1405, "loss": 0.0002, "lr": 6.258733343265933e-07, "epoch": 3.9679715302491103, "percentage": 79.36, "elapsed_time": "0:14:00", "remaining_time": "0:03:38", "throughput": 1921.04, "total_tokens": 1613952}
{"current_steps": 1120, "total_steps": 1405, "loss": 0.0, "lr": 6.054567498423683e-07, "epoch": 3.98576512455516, "percentage": 79.72, "elapsed_time": "0:14:00", "remaining_time": "0:03:33", "throughput": 1927.54, "total_tokens": 1620224}
{"current_steps": 1125, "total_steps": 1405, "loss": 0.0, "lr": 5.853327452945115e-07, "epoch": 4.00355871886121, "percentage": 80.07, "elapsed_time": "0:14:01", "remaining_time": "0:03:29", "throughput": 1932.97, "total_tokens": 1625800}
{"current_steps": 1130, "total_steps": 1405, "loss": 0.0, "lr": 5.655044284927658e-07, "epoch": 4.0213523131672595, "percentage": 80.43, "elapsed_time": "0:14:01", "remaining_time": "0:03:24", "throughput": 1940.9, "total_tokens": 1633352}
{"current_steps": 1135, "total_steps": 1405, "loss": 0.0, "lr": 5.459748615829355e-07, "epoch": 4.039145907473309, "percentage": 80.78, "elapsed_time": "0:14:01", "remaining_time": "0:03:20", "throughput": 1948.76, "total_tokens": 1640840}
{"current_steps": 1136, "total_steps": 1405, "eval_loss": 0.2551669180393219, "epoch": 4.04270462633452, "percentage": 80.85, "elapsed_time": "0:14:03", "remaining_time": "0:03:19", "throughput": 1946.88, "total_tokens": 1642696}
{"current_steps": 1140, "total_steps": 1405, "loss": 0.0, "lr": 5.267470605739953e-07, "epoch": 4.056939501779359, "percentage": 81.14, "elapsed_time": "0:14:44", "remaining_time": "0:03:25", "throughput": 1862.79, "total_tokens": 1648520}
{"current_steps": 1145, "total_steps": 1405, "loss": 0.0, "lr": 5.078239948723154e-07, "epoch": 4.074733096085409, "percentage": 81.49, "elapsed_time": "0:14:45", "remaining_time": "0:03:21", "throughput": 1870.02, "total_tokens": 1655752}
{"current_steps": 1150, "total_steps": 1405, "loss": 0.0, "lr": 4.892085868230881e-07, "epoch": 4.092526690391459, "percentage": 81.85, "elapsed_time": "0:14:45", "remaining_time": "0:03:16", "throughput": 1877.18, "total_tokens": 1662920}
{"current_steps": 1155, "total_steps": 1405, "loss": 0.0, "lr": 4.7090371125902175e-07, "epoch": 4.110320284697509, "percentage": 82.21, "elapsed_time": "0:14:46", "remaining_time": "0:03:11", "throughput": 1884.13, "total_tokens": 1669896}
{"current_steps": 1160, "total_steps": 1405, "loss": 0.0, "lr": 4.529121950563717e-07, "epoch": 4.128113879003559, "percentage": 82.56, "elapsed_time": "0:14:46", "remaining_time": "0:03:07", "throughput": 1889.49, "total_tokens": 1675400}
{"current_steps": 1165, "total_steps": 1405, "loss": 0.0, "lr": 4.352368166983753e-07, "epoch": 4.145907473309609, "percentage": 82.92, "elapsed_time": "0:14:47", "remaining_time": "0:03:02", "throughput": 1897.03, "total_tokens": 1682952}
{"current_steps": 1170, "total_steps": 1405, "loss": 0.0, "lr": 4.178803058461664e-07, "epoch": 4.1637010676156585, "percentage": 83.27, "elapsed_time": "0:14:47", "remaining_time": "0:02:58", "throughput": 1904.31, "total_tokens": 1690248}
{"current_steps": 1175, "total_steps": 1405, "loss": 0.0, "lr": 4.0084534291722375e-07, "epoch": 4.181494661921708, "percentage": 83.63, "elapsed_time": "0:14:48", "remaining_time": "0:02:53", "throughput": 1910.82, "total_tokens": 1696840}
{"current_steps": 1180, "total_steps": 1405, "loss": 0.0, "lr": 3.8413455867142513e-07, "epoch": 4.199288256227758, "percentage": 83.99, "elapsed_time": "0:14:48", "remaining_time": "0:02:49", "throughput": 1917.52, "total_tokens": 1703624}
{"current_steps": 1185, "total_steps": 1405, "loss": 0.0, "lr": 3.6775053380477296e-07, "epoch": 4.217081850533808, "percentage": 84.34, "elapsed_time": "0:14:48", "remaining_time": "0:02:45", "throughput": 1923.81, "total_tokens": 1710024}
{"current_steps": 1190, "total_steps": 1405, "loss": 0.0, "lr": 3.516957985508476e-07, "epoch": 4.234875444839858, "percentage": 84.7, "elapsed_time": "0:14:49", "remaining_time": "0:02:40", "throughput": 1931.54, "total_tokens": 1717768}
{"current_steps": 1195, "total_steps": 1405, "loss": 0.0, "lr": 3.3597283229005877e-07, "epoch": 4.252669039145908, "percentage": 85.05, "elapsed_time": "0:14:49", "remaining_time": "0:02:36", "throughput": 1941.1, "total_tokens": 1727240}
{"current_steps": 1200, "total_steps": 1405, "loss": 0.0, "lr": 3.2058406316674563e-07, "epoch": 4.270462633451958, "percentage": 85.41, "elapsed_time": "0:14:50", "remaining_time": "0:02:32", "throughput": 1948.19, "total_tokens": 1734408}
{"current_steps": 1205, "total_steps": 1405, "loss": 0.0, "lr": 3.055318677141916e-07, "epoch": 4.288256227758007, "percentage": 85.77, "elapsed_time": "0:14:50", "remaining_time": "0:02:27", "throughput": 1954.59, "total_tokens": 1740936}
{"current_steps": 1207, "total_steps": 1405, "eval_loss": 0.257210373878479, "epoch": 4.295373665480427, "percentage": 85.91, "elapsed_time": "0:14:51", "remaining_time": "0:02:26", "throughput": 1955.93, "total_tokens": 1743624}
{"current_steps": 1210, "total_steps": 1405, "loss": 0.0, "lr": 2.9081857048761014e-07, "epoch": 4.306049822064057, "percentage": 86.12, "elapsed_time": "0:15:22", "remaining_time": "0:02:28", "throughput": 1895.27, "total_tokens": 1747784}
{"current_steps": 1215, "total_steps": 1405, "loss": 0.0, "lr": 2.764464437051537e-07, "epoch": 4.3238434163701065, "percentage": 86.48, "elapsed_time": "0:15:22", "remaining_time": "0:02:24", "throughput": 1902.07, "total_tokens": 1754888}
{"current_steps": 1220, "total_steps": 1405, "loss": 0.0, "lr": 2.624177068970124e-07, "epoch": 4.341637010676156, "percentage": 86.83, "elapsed_time": "0:15:23", "remaining_time": "0:02:19", "throughput": 1909.52, "total_tokens": 1762632}
{"current_steps": 1225, "total_steps": 1405, "loss": 0.0, "lr": 2.4873452656264316e-07, "epoch": 4.359430604982206, "percentage": 87.19, "elapsed_time": "0:15:23", "remaining_time": "0:02:15", "throughput": 1916.5, "total_tokens": 1769928}
{"current_steps": 1230, "total_steps": 1405, "loss": 0.0, "lr": 2.3539901583619186e-07, "epoch": 4.377224199288256, "percentage": 87.54, "elapsed_time": "0:15:23", "remaining_time": "0:02:11", "throughput": 1923.74, "total_tokens": 1777480}
{"current_steps": 1235, "total_steps": 1405, "loss": 0.0, "lr": 2.2241323416015452e-07, "epoch": 4.395017793594306, "percentage": 87.9, "elapsed_time": "0:15:24", "remaining_time": "0:02:07", "throughput": 1930.78, "total_tokens": 1784840}
{"current_steps": 1240, "total_steps": 1405, "loss": 0.0, "lr": 2.0977918696733103e-07, "epoch": 4.412811387900356, "percentage": 88.26, "elapsed_time": "0:15:24", "remaining_time": "0:02:03", "throughput": 1938.2, "total_tokens": 1792584}
{"current_steps": 1245, "total_steps": 1405, "loss": 0.0, "lr": 1.9749882537112297e-07, "epoch": 4.430604982206406, "percentage": 88.61, "elapsed_time": "0:15:25", "remaining_time": "0:01:58", "throughput": 1946.27, "total_tokens": 1800968}
{"current_steps": 1250, "total_steps": 1405, "loss": 0.0, "lr": 1.8557404586421413e-07, "epoch": 4.448398576512456, "percentage": 88.97, "elapsed_time": "0:15:25", "remaining_time": "0:01:54", "throughput": 1953.42, "total_tokens": 1808456}
{"current_steps": 1255, "total_steps": 1405, "loss": 0.0, "lr": 1.7400669002569233e-07, "epoch": 4.4661921708185055, "percentage": 89.32, "elapsed_time": "0:15:26", "remaining_time": "0:01:50", "throughput": 1960.75, "total_tokens": 1816136}
{"current_steps": 1260, "total_steps": 1405, "loss": 0.0, "lr": 1.62798544236647e-07, "epoch": 4.483985765124555, "percentage": 89.68, "elapsed_time": "0:15:26", "remaining_time": "0:01:46", "throughput": 1968.4, "total_tokens": 1824136}
{"current_steps": 1265, "total_steps": 1405, "loss": 0.0, "lr": 1.5195133940429345e-07, "epoch": 4.501779359430605, "percentage": 90.04, "elapsed_time": "0:15:27", "remaining_time": "0:01:42", "throughput": 1975.18, "total_tokens": 1831304}
{"current_steps": 1270, "total_steps": 1405, "loss": 0.0, "lr": 1.4146675069466403e-07, "epoch": 4.519572953736655, "percentage": 90.39, "elapsed_time": "0:15:27", "remaining_time": "0:01:38", "throughput": 1980.99, "total_tokens": 1837512}
{"current_steps": 1275, "total_steps": 1405, "loss": 0.0, "lr": 1.313463972739068e-07, "epoch": 4.537366548042705, "percentage": 90.75, "elapsed_time": "0:15:28", "remaining_time": "0:01:34", "throughput": 1987.37, "total_tokens": 1844296}
{"current_steps": 1278, "total_steps": 1405, "eval_loss": 0.259037584066391, "epoch": 4.548042704626335, "percentage": 90.96, "elapsed_time": "0:15:28", "remaining_time": "0:01:32", "throughput": 1991.0, "total_tokens": 1849416}
{"current_steps": 1280, "total_steps": 1405, "loss": 0.0, "lr": 1.215918420582343e-07, "epoch": 4.555160142348754, "percentage": 91.1, "elapsed_time": "0:16:13", "remaining_time": "0:01:35", "throughput": 1902.89, "total_tokens": 1851720}
{"current_steps": 1285, "total_steps": 1405, "loss": 0.0, "lr": 1.1220459147255642e-07, "epoch": 4.572953736654805, "percentage": 91.46, "elapsed_time": "0:16:13", "remaining_time": "0:01:30", "throughput": 1908.65, "total_tokens": 1858120}
{"current_steps": 1290, "total_steps": 1405, "loss": 0.0, "lr": 1.0318609521783818e-07, "epoch": 4.590747330960854, "percentage": 91.81, "elapsed_time": "0:16:13", "remaining_time": "0:01:26", "throughput": 1915.77, "total_tokens": 1865928}
{"current_steps": 1295, "total_steps": 1405, "loss": 0.0, "lr": 9.453774604721937e-08, "epoch": 4.608540925266904, "percentage": 92.17, "elapsed_time": "0:16:14", "remaining_time": "0:01:22", "throughput": 1922.95, "total_tokens": 1873800}
{"current_steps": 1300, "total_steps": 1405, "loss": 0.0, "lr": 8.62608795509276e-08, "epoch": 4.6263345195729535, "percentage": 92.53, "elapsed_time": "0:16:14", "remaining_time": "0:01:18", "throughput": 1930.24, "total_tokens": 1881800}
{"current_steps": 1305, "total_steps": 1405, "loss": 0.0, "lr": 7.835677395001795e-08, "epoch": 4.644128113879003, "percentage": 92.88, "elapsed_time": "0:16:15", "remaining_time": "0:01:14", "throughput": 1936.41, "total_tokens": 1888648}
{"current_steps": 1310, "total_steps": 1405, "loss": 0.0, "lr": 7.082664989897486e-08, "epoch": 4.661921708185053, "percentage": 93.24, "elapsed_time": "0:16:15", "remaining_time": "0:01:10", "throughput": 1942.5, "total_tokens": 1895432}
{"current_steps": 1315, "total_steps": 1405, "loss": 0.0, "lr": 6.367167029720234e-08, "epoch": 4.679715302491103, "percentage": 93.59, "elapsed_time": "0:16:16", "remaining_time": "0:01:06", "throughput": 1948.78, "total_tokens": 1902408}
{"current_steps": 1320, "total_steps": 1405, "loss": 0.0277, "lr": 5.68929401094323e-08, "epoch": 4.697508896797153, "percentage": 93.95, "elapsed_time": "0:16:16", "remaining_time": "0:01:02", "throughput": 1955.98, "total_tokens": 1910344}
{"current_steps": 1325, "total_steps": 1405, "loss": 0.0, "lr": 5.049150619508503e-08, "epoch": 4.715302491103203, "percentage": 94.31, "elapsed_time": "0:16:17", "remaining_time": "0:00:58", "throughput": 1963.37, "total_tokens": 1918472}
{"current_steps": 1330, "total_steps": 1405, "loss": 0.0, "lr": 4.446835714659647e-08, "epoch": 4.733096085409253, "percentage": 94.66, "elapsed_time": "0:16:17", "remaining_time": "0:00:55", "throughput": 1968.94, "total_tokens": 1924744}
{"current_steps": 1335, "total_steps": 1405, "loss": 0.0, "lr": 3.882442313674878e-08, "epoch": 4.750889679715303, "percentage": 95.02, "elapsed_time": "0:16:18", "remaining_time": "0:00:51", "throughput": 1976.31, "total_tokens": 1932872}
{"current_steps": 1340, "total_steps": 1405, "loss": 0.0, "lr": 3.3560575775019866e-08, "epoch": 4.7686832740213525, "percentage": 95.37, "elapsed_time": "0:16:18", "remaining_time": "0:00:47", "throughput": 1982.75, "total_tokens": 1940040}
{"current_steps": 1345, "total_steps": 1405, "loss": 0.0, "lr": 2.8677627972978905e-08, "epoch": 4.786476868327402, "percentage": 95.73, "elapsed_time": "0:16:18", "remaining_time": "0:00:43", "throughput": 1990.83, "total_tokens": 1948936}
{"current_steps": 1349, "total_steps": 1405, "eval_loss": 0.2602100372314453, "epoch": 4.800711743772242, "percentage": 96.01, "elapsed_time": "0:16:19", "remaining_time": "0:00:40", "throughput": 1994.66, "total_tokens": 1954568}
{"current_steps": 1350, "total_steps": 1405, "loss": 0.0, "lr": 2.4176333818745347e-08, "epoch": 4.804270462633452, "percentage": 96.09, "elapsed_time": "0:17:22", "remaining_time": "0:00:42", "throughput": 1875.69, "total_tokens": 1955912}
{"current_steps": 1355, "total_steps": 1405, "loss": 0.0, "lr": 2.0057388460533733e-08, "epoch": 4.822064056939502, "percentage": 96.44, "elapsed_time": "0:17:23", "remaining_time": "0:00:38", "throughput": 1881.48, "total_tokens": 1962760}
{"current_steps": 1360, "total_steps": 1405, "loss": 0.0, "lr": 1.6321427999298754e-08, "epoch": 4.839857651245552, "percentage": 96.8, "elapsed_time": "0:17:23", "remaining_time": "0:00:34", "throughput": 1886.85, "total_tokens": 1969160}
{"current_steps": 1365, "total_steps": 1405, "loss": 0.0, "lr": 1.2969029390501597e-08, "epoch": 4.857651245551601, "percentage": 97.15, "elapsed_time": "0:17:24", "remaining_time": "0:00:30", "throughput": 1892.4, "total_tokens": 1975752}
{"current_steps": 1370, "total_steps": 1405, "loss": 0.0, "lr": 1.000071035500816e-08, "epoch": 4.875444839857651, "percentage": 97.51, "elapsed_time": "0:17:24", "remaining_time": "0:00:26", "throughput": 1898.75, "total_tokens": 1983240}
{"current_steps": 1375, "total_steps": 1405, "loss": 0.0, "lr": 7.416929299135511e-09, "epoch": 4.893238434163701, "percentage": 97.86, "elapsed_time": "0:17:24", "remaining_time": "0:00:22", "throughput": 1905.16, "total_tokens": 1990792}
{"current_steps": 1380, "total_steps": 1405, "loss": 0.0, "lr": 5.218085243859639e-09, "epoch": 4.911032028469751, "percentage": 98.22, "elapsed_time": "0:17:25", "remaining_time": "0:00:18", "throughput": 1911.91, "total_tokens": 1998728}
{"current_steps": 1385, "total_steps": 1405, "loss": 0.0, "lr": 3.4045177631936154e-09, "epoch": 4.9288256227758005, "percentage": 98.58, "elapsed_time": "0:17:25", "remaining_time": "0:00:15", "throughput": 1918.89, "total_tokens": 2006920}
{"current_steps": 1390, "total_steps": 1405, "loss": 0.0, "lr": 1.976506931745392e-09, "epoch": 4.94661921708185, "percentage": 98.93, "elapsed_time": "0:17:26", "remaining_time": "0:00:11", "throughput": 1924.06, "total_tokens": 2013128}
{"current_steps": 1395, "total_steps": 1405, "loss": 0.0, "lr": 9.3427328146517e-10, "epoch": 4.9644128113879, "percentage": 99.29, "elapsed_time": "0:17:26", "remaining_time": "0:00:07", "throughput": 1931.37, "total_tokens": 2021704}
{"current_steps": 1400, "total_steps": 1405, "loss": 0.0, "lr": 2.7797776758903274e-10, "epoch": 4.98220640569395, "percentage": 99.64, "elapsed_time": "0:17:27", "remaining_time": "0:00:03", "throughput": 1937.41, "total_tokens": 2028872}
{"current_steps": 1405, "total_steps": 1405, "loss": 0.0, "lr": 7.72174378022017e-12, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:17:27", "remaining_time": "0:00:00", "throughput": 1942.54, "total_tokens": 2035272}
{"current_steps": 1405, "total_steps": 1405, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:18:04", "remaining_time": "0:00:00", "throughput": 1877.37, "total_tokens": 2035272}
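The records above are JSON Lines, one object per logging step: training records carry `loss` and `lr`, while periodic evaluation records carry `eval_loss` instead. A minimal sketch for separating the two series from such a log (the helper name `split_loss_series` is illustrative, not part of the training code):

```python
import json

def split_loss_series(lines):
    """Split trainer-log JSONL records into (step, loss) and (step, eval_loss) series."""
    train, evals = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if "loss" in rec:
            train.append((rec["current_steps"], rec["loss"]))
        elif "eval_loss" in rec:
            evals.append((rec["current_steps"], rec["eval_loss"]))
    return train, evals

# Two records in the same shape as the log above:
sample = [
    '{"current_steps": 975, "total_steps": 1405, "loss": 0.0004}',
    '{"current_steps": 994, "total_steps": 1405, "eval_loss": 0.2405}',
]
train, evals = split_loss_series(sample)
print(train)  # [(975, 0.0004)]
print(evals)  # [(994, 0.2405)]
```

The resulting pairs can be fed directly to a plotting library to reproduce curves like `training_loss.png` and `training_eval_loss.png` below.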
2463
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:70ad1bfbb630f5f8a43a169d4a5d88405c2274dc7d3d7800201dd83bc958921c
size 6289
BIN
training_eval_loss.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 47 KiB |
BIN
training_loss.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 46 KiB |