Initialize the project; model provided by the ModelHub XC community
Model: rbelanec/train_rte_42_1774791065 Source: Original Platform
.gitattributes (vendored, Normal file, 36 lines)
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md (Normal file, 81 lines)
@@ -0,0 +1,81 @@
---
library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- peft-factory
- full
- llama-factory
- generated_from_trainer
model-index:
- name: train_rte_42_1774791065
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# train_rte_42_1774791065

This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the rte dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1407
- Num Input Tokens Seen: 2035272

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.2508 | 0.2527 | 71 | 0.1407 | 105024 |
| 0.1769 | 0.5053 | 142 | 0.1558 | 209536 |
| 0.1924 | 0.7580 | 213 | 0.1600 | 312576 |
| 0.1956 | 1.0107 | 284 | 0.1684 | 414040 |
| 0.1589 | 1.2633 | 355 | 0.1601 | 517656 |
| 0.1947 | 1.5160 | 426 | 0.1815 | 624344 |
| 0.1825 | 1.7687 | 497 | 0.1647 | 725656 |
| 0.1568 | 2.0214 | 568 | 0.1555 | 821416 |
| 0.1597 | 2.2740 | 639 | 0.1567 | 926760 |
| 0.1431 | 2.5267 | 710 | 0.1639 | 1025320 |
| 0.1986 | 2.7794 | 781 | 0.1541 | 1128104 |
| 0.137 | 3.0320 | 852 | 0.1852 | 1229440 |
| 0.1422 | 3.2847 | 923 | 0.1646 | 1332544 |
| 0.0911 | 3.5374 | 994 | 0.1804 | 1438336 |
| 0.1203 | 3.7900 | 1065 | 0.1771 | 1539072 |
| 0.0551 | 4.0427 | 1136 | 0.1983 | 1642696 |
| 0.0577 | 4.2954 | 1207 | 0.3402 | 1743624 |
| 0.0319 | 4.5480 | 1278 | 0.3532 | 1849416 |
| 0.0846 | 4.8007 | 1349 | 0.3423 | 1954568 |

### Framework versions

- Transformers 4.51.3
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
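The card above says the model was fine-tuned on RTE with the `llama3` chat template (see train.yaml later in this commit). As a rough sketch of how an RTE premise/hypothesis pair might be presented to the model at inference time, the snippet below assembles the standard Llama 3 special-token layout by hand; the instruction wording and the `llama3_prompt` helper are hypothetical illustrations, not the exact prompt LLaMA-Factory builds.

```python
# Sketch: format an RTE (textual entailment) example with the Llama 3 chat
# layout. Special tokens follow the published Llama 3 format; the task
# instruction itself is a hypothetical example.
def llama3_prompt(user_message: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

premise = "No weapons of mass destruction were found in Iraq."
hypothesis = "Weapons of mass destruction were found in Iraq."
prompt = llama3_prompt(
    f"Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Answer yes or no."
)
print(prompt)
```

In practice you would let the tokenizer's chat template do this (`tokenizer.apply_chat_template`), but spelling the tokens out shows what `bos_token` and `eos_token` in special_tokens_map.json below refer to.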
all_results.json (Normal file, 13 lines)
@@ -0,0 +1,13 @@
{
  "epoch": 5.0,
  "eval_loss": 0.14072927832603455,
  "eval_runtime": 0.5968,
  "eval_samples_per_second": 417.253,
  "eval_steps_per_second": 53.623,
  "num_input_tokens_seen": 2035272,
  "total_flos": 1.1883702201974784e+16,
  "train_loss": 0.17133487164477065,
  "train_runtime": 699.7603,
  "train_samples_per_second": 16.013,
  "train_steps_per_second": 2.008
}
config.json (Normal file, 39 lines)
@@ -0,0 +1,39 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "float32",
  "transformers_version": "4.51.3",
  "use_cache": false,
  "vocab_size": 128256
}
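The config above fully determines the parameter count. A quick back-of-the-envelope check (my arithmetic, not part of the commit) confirms this is the ~1.24B-parameter Llama 3.2 1B architecture with tied embeddings and grouped-query attention, which also matches the ~4.94 GB float32 `model.safetensors` further down.

```python
# Rough parameter count from the config.json values above.
hidden, inter, layers = 2048, 8192, 16
heads, kv_heads, head_dim = 32, 8, 64
vocab = 128256

embed = vocab * hidden                      # shared with the LM head (tie_word_embeddings)
attn = hidden * heads * head_dim            # q_proj
attn += 2 * hidden * kv_heads * head_dim    # k_proj, v_proj (grouped-query attention)
attn += heads * head_dim * hidden           # o_proj
mlp = 3 * hidden * inter                    # gate_proj, up_proj, down_proj
norms = 2 * hidden                          # two RMSNorms per layer

total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
print(f"{total / 1e9:.2f}B parameters")     # ~1.24B
```

At 4 bytes per float32 parameter this is ~4.94 GB, in line with the safetensors LFS pointer size of 4943274328 bytes.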
eval_results.json (Normal file, 8 lines)
@@ -0,0 +1,8 @@
{
  "epoch": 5.0,
  "eval_loss": 0.14072927832603455,
  "eval_runtime": 0.5968,
  "eval_samples_per_second": 417.253,
  "eval_steps_per_second": 53.623,
  "num_input_tokens_seen": 2035272
}
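A small consistency check (an inference from the numbers above, not data in the commit): runtime times throughput recovers the evaluation set size, which lines up with `val_size: 0.1` in train.yaml below, assuming GLUE RTE's 2,490-example train split.

```python
# Back out the eval set size implied by eval_results.json.
eval_runtime = 0.5968              # seconds
eval_samples_per_second = 417.253
n_eval = round(eval_runtime * eval_samples_per_second)
print(n_eval)                      # 249, i.e. 10% of RTE's 2490 train examples
```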
generation_config.json (Normal file, 12 lines)
@@ -0,0 +1,12 @@
{
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "4.51.3"
}
model.safetensors (Normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f03cff080234d858206e7fd3ce0da58830d2a503759adb77c9db1fd9daf44d96
size 4943274328
special_tokens_map.json (Normal file, 26 lines)
@@ -0,0 +1,26 @@
{
  "additional_special_tokens": [
    {
      "content": "<|eom_id|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    }
  ],
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|eot_id|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|eot_id|>"
}
tokenizer.json (Normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
size 17209920
tokenizer_config.json (Normal file, 2069 lines)
File diff suppressed because it is too large
train.yaml (Normal file, 55 lines)
@@ -0,0 +1,55 @@
seed: 42

### model
model_name_or_path: meta-llama/Llama-3.2-1B-Instruct
trust_remote_code: true
flash_attn: auto
use_cache: false

### method
stage: sft
do_train: true
finetuning_type: full

### dataset
dataset: rte
template: llama3
cutoff_len: 2048
overwrite_cache: true
preprocessing_num_workers: 4
dataloader_num_workers: 4
packing: false

### output
output_dir: saves_bts_preliminary/base/llama-3.2-1b-instruct/train_rte_42_1774791065
logging_steps: 5
save_steps: 0.05
overwrite_output_dir: true
save_only_model: false
plot_loss: true
include_num_input_tokens_seen: true
push_to_hub: true
push_to_hub_organization: rbelanec
load_best_model_at_end: true
save_total_limit: 1

### train
per_device_train_batch_size: 8
learning_rate: 5.0e-5
num_train_epochs: 5
weight_decay: 1.0e-5
lr_scheduler_type: cosine
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
warmup_ratio: 0.1
optim: adamw_torch
report_to:
- wandb
run_name: base_llama-3.2-1b-instruct_train_rte_42_1774791065

### eval
per_device_eval_batch_size: 8
eval_strategy: steps
eval_steps: 0.05
val_size: 0.1
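This config also explains the step count in trainer_log.jsonl below. A sketch of the arithmetic (assuming GLUE RTE's 2,490-example train split, an outside fact not stated in this diff, and a single training device):

```python
import math

rte_train = 2490   # GLUE RTE train split size (assumption from outside this diff)
val_size = 0.1     # held out for evaluation per train.yaml
batch = 8          # per_device_train_batch_size, single device assumed
epochs = 5         # num_train_epochs

train_examples = round(rte_train * (1 - val_size))   # 2241 remain for training
steps_per_epoch = math.ceil(train_examples / batch)  # 281 optimizer steps/epoch
total_steps = steps_per_epoch * epochs
print(total_steps)  # 1405, matching "total_steps" in trainer_log.jsonl
```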
train_results.json (Normal file, 9 lines)
@@ -0,0 +1,9 @@
{
  "epoch": 5.0,
  "num_input_tokens_seen": 2035272,
  "total_flos": 1.1883702201974784e+16,
  "train_loss": 0.17133487164477065,
  "train_runtime": 699.7603,
  "train_samples_per_second": 16.013,
  "train_steps_per_second": 2.008
}
trainer_log.jsonl (Normal file, 301 lines)
@@ -0,0 +1,301 @@
{"current_steps": 5, "total_steps": 1405, "loss": 0.6704, "lr": 1.4184397163120568e-06, "epoch": 0.017793594306049824, "percentage": 0.36, "elapsed_time": "0:00:00", "remaining_time": "0:03:57", "throughput": 9261.64, "total_tokens": 7872}
{"current_steps": 10, "total_steps": 1405, "loss": 0.2548, "lr": 3.1914893617021277e-06, "epoch": 0.03558718861209965, "percentage": 0.71, "elapsed_time": "0:00:01", "remaining_time": "0:03:00", "throughput": 11397.44, "total_tokens": 14784}
{"current_steps": 15, "total_steps": 1405, "loss": 0.9227, "lr": 4.964539007092199e-06, "epoch": 0.05338078291814947, "percentage": 1.07, "elapsed_time": "0:00:01", "remaining_time": "0:02:45", "throughput": 13094.26, "total_tokens": 23424}
{"current_steps": 20, "total_steps": 1405, "loss": 0.1819, "lr": 6.73758865248227e-06, "epoch": 0.0711743772241993, "percentage": 1.42, "elapsed_time": "0:00:02", "remaining_time": "0:02:34", "throughput": 13389.34, "total_tokens": 29824}
{"current_steps": 25, "total_steps": 1405, "loss": 0.228, "lr": 8.510638297872341e-06, "epoch": 0.08896797153024912, "percentage": 1.78, "elapsed_time": "0:00:02", "remaining_time": "0:02:28", "throughput": 14061.51, "total_tokens": 37824}
{"current_steps": 30, "total_steps": 1405, "loss": 0.1572, "lr": 1.0283687943262411e-05, "epoch": 0.10676156583629894, "percentage": 2.14, "elapsed_time": "0:00:03", "remaining_time": "0:02:23", "throughput": 14260.87, "total_tokens": 44608}
{"current_steps": 35, "total_steps": 1405, "loss": 0.1609, "lr": 1.2056737588652483e-05, "epoch": 0.12455516014234876, "percentage": 2.49, "elapsed_time": "0:00:03", "remaining_time": "0:02:19", "throughput": 14545.74, "total_tokens": 51968}
{"current_steps": 40, "total_steps": 1405, "loss": 0.2143, "lr": 1.3829787234042554e-05, "epoch": 0.1423487544483986, "percentage": 2.85, "elapsed_time": "0:00:04", "remaining_time": "0:02:17", "throughput": 14773.94, "total_tokens": 59456}
{"current_steps": 45, "total_steps": 1405, "loss": 0.2034, "lr": 1.5602836879432626e-05, "epoch": 0.1601423487544484, "percentage": 3.2, "elapsed_time": "0:00:04", "remaining_time": "0:02:14", "throughput": 14905.26, "total_tokens": 66496}
{"current_steps": 50, "total_steps": 1405, "loss": 0.2702, "lr": 1.7375886524822697e-05, "epoch": 0.17793594306049823, "percentage": 3.56, "elapsed_time": "0:00:04", "remaining_time": "0:02:12", "throughput": 14984.9, "total_tokens": 73408}
{"current_steps": 55, "total_steps": 1405, "loss": 0.1793, "lr": 1.9148936170212766e-05, "epoch": 0.19572953736654805, "percentage": 3.91, "elapsed_time": "0:00:05", "remaining_time": "0:02:11", "throughput": 15094.9, "total_tokens": 80576}
{"current_steps": 60, "total_steps": 1405, "loss": 0.161, "lr": 2.0921985815602837e-05, "epoch": 0.21352313167259787, "percentage": 4.27, "elapsed_time": "0:00:05", "remaining_time": "0:02:09", "throughput": 15238.44, "total_tokens": 88256}
{"current_steps": 65, "total_steps": 1405, "loss": 0.1808, "lr": 2.269503546099291e-05, "epoch": 0.2313167259786477, "percentage": 4.63, "elapsed_time": "0:00:06", "remaining_time": "0:02:08", "throughput": 15388.52, "total_tokens": 96256}
{"current_steps": 70, "total_steps": 1405, "loss": 0.2508, "lr": 2.446808510638298e-05, "epoch": 0.2491103202846975, "percentage": 4.98, "elapsed_time": "0:00:06", "remaining_time": "0:02:07", "throughput": 15441.44, "total_tokens": 103424}
{"current_steps": 71, "total_steps": 1405, "eval_loss": 0.14072927832603455, "epoch": 0.2526690391459075, "percentage": 5.05, "elapsed_time": "0:00:07", "remaining_time": "0:02:18", "throughput": 14240.56, "total_tokens": 105024}
{"current_steps": 75, "total_steps": 1405, "loss": 0.143, "lr": 2.624113475177305e-05, "epoch": 0.2669039145907473, "percentage": 5.34, "elapsed_time": "0:00:55", "remaining_time": "0:16:19", "throughput": 2001.71, "total_tokens": 110528}
{"current_steps": 80, "total_steps": 1405, "loss": 0.2326, "lr": 2.8014184397163124e-05, "epoch": 0.2846975088967972, "percentage": 5.69, "elapsed_time": "0:00:55", "remaining_time": "0:15:21", "throughput": 2110.33, "total_tokens": 117440}
{"current_steps": 85, "total_steps": 1405, "loss": 0.2053, "lr": 2.9787234042553192e-05, "epoch": 0.302491103202847, "percentage": 6.05, "elapsed_time": "0:00:56", "remaining_time": "0:14:31", "throughput": 2236.67, "total_tokens": 125504}
{"current_steps": 90, "total_steps": 1405, "loss": 0.2409, "lr": 3.156028368794326e-05, "epoch": 0.3202846975088968, "percentage": 6.41, "elapsed_time": "0:00:56", "remaining_time": "0:13:46", "throughput": 2340.64, "total_tokens": 132352}
{"current_steps": 95, "total_steps": 1405, "loss": 0.2063, "lr": 3.3333333333333335e-05, "epoch": 0.33807829181494664, "percentage": 6.76, "elapsed_time": "0:00:56", "remaining_time": "0:13:05", "throughput": 2443.01, "total_tokens": 139200}
{"current_steps": 100, "total_steps": 1405, "loss": 0.244, "lr": 3.5106382978723407e-05, "epoch": 0.35587188612099646, "percentage": 7.12, "elapsed_time": "0:00:57", "remaining_time": "0:12:29", "throughput": 2574.04, "total_tokens": 147904}
{"current_steps": 105, "total_steps": 1405, "loss": 0.183, "lr": 3.687943262411347e-05, "epoch": 0.3736654804270463, "percentage": 7.47, "elapsed_time": "0:00:57", "remaining_time": "0:11:56", "throughput": 2664.85, "total_tokens": 154240}
{"current_steps": 110, "total_steps": 1405, "loss": 0.1615, "lr": 3.865248226950355e-05, "epoch": 0.3914590747330961, "percentage": 7.83, "elapsed_time": "0:00:58", "remaining_time": "0:11:26", "throughput": 2768.47, "total_tokens": 161472}
{"current_steps": 115, "total_steps": 1405, "loss": 0.1703, "lr": 4.0425531914893614e-05, "epoch": 0.4092526690391459, "percentage": 8.19, "elapsed_time": "0:00:58", "remaining_time": "0:10:59", "throughput": 2862.5, "total_tokens": 168192}
{"current_steps": 120, "total_steps": 1405, "loss": 0.246, "lr": 4.219858156028369e-05, "epoch": 0.42704626334519574, "percentage": 8.54, "elapsed_time": "0:00:59", "remaining_time": "0:10:33", "throughput": 2951.15, "total_tokens": 174656}
{"current_steps": 125, "total_steps": 1405, "loss": 0.1665, "lr": 4.3971631205673764e-05, "epoch": 0.44483985765124556, "percentage": 8.9, "elapsed_time": "0:00:59", "remaining_time": "0:10:10", "throughput": 3046.8, "total_tokens": 181632}
{"current_steps": 130, "total_steps": 1405, "loss": 0.1695, "lr": 4.574468085106383e-05, "epoch": 0.4626334519572954, "percentage": 9.25, "elapsed_time": "0:01:00", "remaining_time": "0:09:49", "throughput": 3184.81, "total_tokens": 191488}
{"current_steps": 135, "total_steps": 1405, "loss": 0.1764, "lr": 4.751773049645391e-05, "epoch": 0.4804270462633452, "percentage": 9.61, "elapsed_time": "0:01:00", "remaining_time": "0:09:29", "throughput": 3282.77, "total_tokens": 198848}
{"current_steps": 140, "total_steps": 1405, "loss": 0.1769, "lr": 4.929078014184397e-05, "epoch": 0.498220640569395, "percentage": 9.96, "elapsed_time": "0:01:01", "remaining_time": "0:09:11", "throughput": 3394.7, "total_tokens": 207232}
{"current_steps": 142, "total_steps": 1405, "eval_loss": 0.15581394731998444, "epoch": 0.505338078291815, "percentage": 10.11, "elapsed_time": "0:01:01", "remaining_time": "0:09:09", "throughput": 3393.08, "total_tokens": 209536}
{"current_steps": 145, "total_steps": 1405, "loss": 0.2155, "lr": 4.9999305045921804e-05, "epoch": 0.5160142348754448, "percentage": 10.32, "elapsed_time": "0:01:21", "remaining_time": "0:11:46", "throughput": 2630.39, "total_tokens": 213952}
{"current_steps": 150, "total_steps": 1405, "loss": 0.185, "lr": 4.9995058244251644e-05, "epoch": 0.5338078291814946, "percentage": 10.68, "elapsed_time": "0:01:21", "remaining_time": "0:11:24", "throughput": 2706.78, "total_tokens": 221376}
{"current_steps": 155, "total_steps": 1405, "loss": 0.2471, "lr": 4.998695138156149e-05, "epoch": 0.5516014234875445, "percentage": 11.03, "elapsed_time": "0:01:22", "remaining_time": "0:11:03", "throughput": 2783.85, "total_tokens": 228928}
{"current_steps": 160, "total_steps": 1405, "loss": 0.2061, "lr": 4.997498570981822e-05, "epoch": 0.5693950177935944, "percentage": 11.39, "elapsed_time": "0:01:22", "remaining_time": "0:10:43", "throughput": 2858.73, "total_tokens": 236352}
{"current_steps": 165, "total_steps": 1405, "loss": 0.1488, "lr": 4.995916307691601e-05, "epoch": 0.5871886120996441, "percentage": 11.74, "elapsed_time": "0:01:23", "remaining_time": "0:10:24", "throughput": 2939.86, "total_tokens": 244416}
{"current_steps": 170, "total_steps": 1405, "loss": 0.1625, "lr": 4.993948592639104e-05, "epoch": 0.604982206405694, "percentage": 12.1, "elapsed_time": "0:01:23", "remaining_time": "0:10:07", "throughput": 3008.87, "total_tokens": 251456}
{"current_steps": 175, "total_steps": 1405, "loss": 0.1635, "lr": 4.991595729704405e-05, "epoch": 0.6227758007117438, "percentage": 12.46, "elapsed_time": "0:01:24", "remaining_time": "0:09:50", "throughput": 3081.28, "total_tokens": 258880}
{"current_steps": 180, "total_steps": 1405, "loss": 0.163, "lr": 4.9888580822471086e-05, "epoch": 0.6405693950177936, "percentage": 12.81, "elapsed_time": "0:01:24", "remaining_time": "0:09:34", "throughput": 3140.45, "total_tokens": 265152}
{"current_steps": 185, "total_steps": 1405, "loss": 0.1599, "lr": 4.985736073050237e-05, "epoch": 0.6583629893238434, "percentage": 13.17, "elapsed_time": "0:01:24", "remaining_time": "0:09:19", "throughput": 3211.35, "total_tokens": 272576}
{"current_steps": 190, "total_steps": 1405, "loss": 0.1669, "lr": 4.982230184254933e-05, "epoch": 0.6761565836298933, "percentage": 13.52, "elapsed_time": "0:01:25", "remaining_time": "0:09:05", "throughput": 3278.9, "total_tokens": 279744}
{"current_steps": 195, "total_steps": 1405, "loss": 0.1659, "lr": 4.9783409572860105e-05, "epoch": 0.693950177935943, "percentage": 13.88, "elapsed_time": "0:01:25", "remaining_time": "0:08:52", "throughput": 3353.87, "total_tokens": 287680}
{"current_steps": 200, "total_steps": 1405, "loss": 0.1729, "lr": 4.974068992768331e-05, "epoch": 0.7117437722419929, "percentage": 14.23, "elapsed_time": "0:01:26", "remaining_time": "0:08:39", "throughput": 3417.1, "total_tokens": 294592}
{"current_steps": 205, "total_steps": 1405, "loss": 0.2655, "lr": 4.9694149504340517e-05, "epoch": 0.7295373665480427, "percentage": 14.59, "elapsed_time": "0:01:26", "remaining_time": "0:08:27", "throughput": 3479.09, "total_tokens": 301440}
{"current_steps": 210, "total_steps": 1405, "loss": 0.1924, "lr": 4.964379549020741e-05, "epoch": 0.7473309608540926, "percentage": 14.95, "elapsed_time": "0:01:27", "remaining_time": "0:08:15", "throughput": 3541.9, "total_tokens": 308416}
{"current_steps": 213, "total_steps": 1405, "eval_loss": 0.1600140929222107, "epoch": 0.7580071174377224, "percentage": 15.16, "elapsed_time": "0:01:27", "remaining_time": "0:08:11", "throughput": 3557.0, "total_tokens": 312576}
{"current_steps": 215, "total_steps": 1405, "loss": 0.1666, "lr": 4.958963566160384e-05, "epoch": 0.7651245551601423, "percentage": 15.3, "elapsed_time": "0:01:49", "remaining_time": "0:10:07", "throughput": 2875.22, "total_tokens": 315328}
{"current_steps": 220, "total_steps": 1405, "loss": 0.1668, "lr": 4.953167838259285e-05, "epoch": 0.7829181494661922, "percentage": 15.66, "elapsed_time": "0:01:50", "remaining_time": "0:09:53", "throughput": 2930.4, "total_tokens": 322688}
{"current_steps": 225, "total_steps": 1405, "loss": 0.1826, "lr": 4.946993260368904e-05, "epoch": 0.800711743772242, "percentage": 16.01, "elapsed_time": "0:01:50", "remaining_time": "0:09:39", "throughput": 2978.7, "total_tokens": 329280}
{"current_steps": 230, "total_steps": 1405, "loss": 0.1488, "lr": 4.940440786047628e-05, "epoch": 0.8185053380782918, "percentage": 16.37, "elapsed_time": "0:01:50", "remaining_time": "0:09:27", "throughput": 3035.34, "total_tokens": 336896}
{"current_steps": 235, "total_steps": 1405, "loss": 0.2852, "lr": 4.933511427213511e-05, "epoch": 0.8362989323843416, "percentage": 16.73, "elapsed_time": "0:01:51", "remaining_time": "0:09:14", "throughput": 3088.26, "total_tokens": 344128}
{"current_steps": 240, "total_steps": 1405, "loss": 0.1901, "lr": 4.926206253988001e-05, "epoch": 0.8540925266903915, "percentage": 17.08, "elapsed_time": "0:01:51", "remaining_time": "0:09:02", "throughput": 3137.08, "total_tokens": 350912}
{"current_steps": 245, "total_steps": 1405, "loss": 0.1972, "lr": 4.91852639453068e-05, "epoch": 0.8718861209964412, "percentage": 17.44, "elapsed_time": "0:01:52", "remaining_time": "0:08:51", "throughput": 3188.08, "total_tokens": 358016}
{"current_steps": 250, "total_steps": 1405, "loss": 0.3136, "lr": 4.910473034865033e-05, "epoch": 0.8896797153024911, "percentage": 17.79, "elapsed_time": "0:01:52", "remaining_time": "0:08:40", "throughput": 3235.7, "total_tokens": 364736}
{"current_steps": 255, "total_steps": 1405, "loss": 0.1648, "lr": 4.902047418695292e-05, "epoch": 0.9074733096085409, "percentage": 18.15, "elapsed_time": "0:01:53", "remaining_time": "0:08:30", "throughput": 3284.51, "total_tokens": 371648}
{"current_steps": 260, "total_steps": 1405, "loss": 0.1706, "lr": 4.893250847214369e-05, "epoch": 0.9252669039145908, "percentage": 18.51, "elapsed_time": "0:01:53", "remaining_time": "0:08:20", "throughput": 3337.95, "total_tokens": 379200}
{"current_steps": 265, "total_steps": 1405, "loss": 0.2379, "lr": 4.884084678902898e-05, "epoch": 0.9430604982206405, "percentage": 18.86, "elapsed_time": "0:01:54", "remaining_time": "0:08:10", "throughput": 3394.56, "total_tokens": 387200}
{"current_steps": 270, "total_steps": 1405, "loss": 0.1618, "lr": 4.874550329319457e-05, "epoch": 0.9608540925266904, "percentage": 19.22, "elapsed_time": "0:01:54", "remaining_time": "0:08:01", "throughput": 3450.89, "total_tokens": 395264}
{"current_steps": 275, "total_steps": 1405, "loss": 0.1637, "lr": 4.864649270881944e-05, "epoch": 0.9786476868327402, "percentage": 19.57, "elapsed_time": "0:01:54", "remaining_time": "0:07:52", "throughput": 3498.0, "total_tokens": 402176}
{"current_steps": 280, "total_steps": 1405, "loss": 0.1956, "lr": 4.8543830326401954e-05, "epoch": 0.99644128113879, "percentage": 19.93, "elapsed_time": "0:01:55", "remaining_time": "0:07:43", "throughput": 3551.83, "total_tokens": 409984}
{"current_steps": 284, "total_steps": 1405, "eval_loss": 0.16843144595623016, "epoch": 1.01067615658363, "percentage": 20.21, "elapsed_time": "0:01:56", "remaining_time": "0:07:39", "throughput": 3556.6, "total_tokens": 414040}
{"current_steps": 285, "total_steps": 1405, "loss": 0.1483, "lr": 4.843753200039851e-05, "epoch": 1.0142348754448398, "percentage": 20.28, "elapsed_time": "0:02:18", "remaining_time": "0:09:04", "throughput": 2994.8, "total_tokens": 415256}
{"current_steps": 290, "total_steps": 1405, "loss": 0.1508, "lr": 4.832761414677503e-05, "epoch": 1.0320284697508897, "percentage": 20.64, "elapsed_time": "0:02:19", "remaining_time": "0:08:54", "throughput": 3039.39, "total_tokens": 422808}
{"current_steps": 295, "total_steps": 1405, "loss": 0.1599, "lr": 4.8214093740471836e-05, "epoch": 1.0498220640569396, "percentage": 21.0, "elapsed_time": "0:02:19", "remaining_time": "0:08:45", "throughput": 3082.0, "total_tokens": 430104}
{"current_steps": 300, "total_steps": 1405, "loss": 0.1629, "lr": 4.8096988312782174e-05, "epoch": 1.0676156583629894, "percentage": 21.35, "elapsed_time": "0:02:19", "remaining_time": "0:08:35", "throughput": 3120.14, "total_tokens": 436760}
{"current_steps": 305, "total_steps": 1405, "loss": 0.1729, "lr": 4.7976315948644745e-05, "epoch": 1.085409252669039, "percentage": 21.71, "elapsed_time": "0:02:20", "remaining_time": "0:08:26", "throughput": 3168.2, "total_tokens": 444952}
{"current_steps": 310, "total_steps": 1405, "loss": 3.0413, "lr": 4.7852095283850866e-05, "epoch": 1.103202846975089, "percentage": 22.06, "elapsed_time": "0:02:20", "remaining_time": "0:08:17", "throughput": 3213.29, "total_tokens": 452760}
{"current_steps": 315, "total_steps": 1405, "loss": 0.1785, "lr": 4.772434550216643e-05, "epoch": 1.1209964412811388, "percentage": 22.42, "elapsed_time": "0:02:21", "remaining_time": "0:08:08", "throughput": 3243.84, "total_tokens": 458392}
{"current_steps": 320, "total_steps": 1405, "loss": 0.1666, "lr": 4.7593086332369344e-05, "epoch": 1.1387900355871885, "percentage": 22.78, "elapsed_time": "0:02:21", "remaining_time": "0:08:00", "throughput": 3281.47, "total_tokens": 465112}
{"current_steps": 325, "total_steps": 1405, "loss": 0.2395, "lr": 4.74583380452027e-05, "epoch": 1.1565836298932384, "percentage": 23.13, "elapsed_time": "0:02:22", "remaining_time": "0:07:52", "throughput": 3321.41, "total_tokens": 472216}
{"current_steps": 330, "total_steps": 1405, "loss": 0.2229, "lr": 4.7320121450244394e-05, "epoch": 1.1743772241992882, "percentage": 23.49, "elapsed_time": "0:02:22", "remaining_time": "0:07:44", "throughput": 3362.68, "total_tokens": 479576}
{"current_steps": 335, "total_steps": 1405, "loss": 0.2531, "lr": 4.717845789269333e-05, "epoch": 1.1921708185053381, "percentage": 23.84, "elapsed_time": "0:02:23", "remaining_time": "0:07:36", "throughput": 3401.21, "total_tokens": 486552}
{"current_steps": 340, "total_steps": 1405, "loss": 0.2223, "lr": 4.703336925007311e-05, "epoch": 1.209964412811388, "percentage": 24.2, "elapsed_time": "0:02:23", "remaining_time": "0:07:29", "throughput": 3446.39, "total_tokens": 494616}
{"current_steps": 345, "total_steps": 1405, "loss": 0.1898, "lr": 4.68848779288534e-05, "epoch": 1.2277580071174377, "percentage": 24.56, "elapsed_time": "0:02:23", "remaining_time": "0:07:22", "throughput": 3483.18, "total_tokens": 501400}
{"current_steps": 350, "total_steps": 1405, "loss": 0.1662, "lr": 4.673300686098957e-05, "epoch": 1.2455516014234875, "percentage": 24.91, "elapsed_time": "0:02:24", "remaining_time": "0:07:15", "throughput": 3524.17, "total_tokens": 508888}
{"current_steps": 355, "total_steps": 1405, "loss": 0.1589, "lr": 4.657777950038133e-05, "epoch": 1.2633451957295374, "percentage": 25.27, "elapsed_time": "0:02:24", "remaining_time": "0:07:08", "throughput": 3573.03, "total_tokens": 517656}
{"current_steps": 355, "total_steps": 1405, "eval_loss": 0.1600693166255951, "epoch": 1.2633451957295374, "percentage": 25.27, "elapsed_time": "0:02:25", "remaining_time": "0:07:10", "throughput": 3558.09, "total_tokens": 517656}
{"current_steps": 360, "total_steps": 1405, "loss": 0.1538, "lr": 4.6419219819250636e-05, "epoch": 1.281138790035587, "percentage": 25.62, "elapsed_time": "0:03:20", "remaining_time": "0:09:41", "throughput": 2624.92, "total_tokens": 526232}
{"current_steps": 365, "total_steps": 1405, "loss": 0.1811, "lr": 4.62573523044396e-05, "epoch": 1.298932384341637, "percentage": 25.98, "elapsed_time": "0:03:20", "remaining_time": "0:09:32", "throughput": 2654.85, "total_tokens": 533400}
{"current_steps": 370, "total_steps": 1405, "loss": 0.174, "lr": 4.609220195362886e-05, "epoch": 1.3167259786476868, "percentage": 26.33, "elapsed_time": "0:03:21", "remaining_time": "0:09:23", "throughput": 2691.99, "total_tokens": 542168}
{"current_steps": 375, "total_steps": 1405, "loss": 0.1571, "lr": 4.5923794271477217e-05, "epoch": 1.3345195729537367, "percentage": 26.69, "elapsed_time": "0:03:21", "remaining_time": "0:09:14", "throughput": 2724.59, "total_tokens": 549976}
{"current_steps": 380, "total_steps": 1405, "loss": 0.1641, "lr": 4.575215526568278e-05, "epoch": 1.3523131672597866, "percentage": 27.05, "elapsed_time": "0:03:22", "remaining_time": "0:09:05", "throughput": 2753.47, "total_tokens": 557016}
{"current_steps": 385, "total_steps": 1405, "loss": 1.4814, "lr": 4.5577311442966584e-05, "epoch": 1.3701067615658362, "percentage": 27.4, "elapsed_time": "0:03:22", "remaining_time": "0:08:57", "throughput": 2784.28, "total_tokens": 564504}
{"current_steps": 390, "total_steps": 1405, "loss": 0.1601, "lr": 4.539928980497903e-05, "epoch": 1.387900355871886, "percentage": 27.76, "elapsed_time": "0:03:23", "remaining_time": "0:08:48", "throughput": 2814.37, "total_tokens": 571864}
{"current_steps": 395, "total_steps": 1405, "loss": 0.2213, "lr": 4.521811784412996e-05, "epoch": 1.405693950177936, "percentage": 28.11, "elapsed_time": "0:03:23", "remaining_time": "0:08:40", "throughput": 2840.85, "total_tokens": 578456}
{"current_steps": 400, "total_steps": 1405, "loss": 1.4493, "lr": 4.503382353934294e-05, "epoch": 1.4234875444839858, "percentage": 28.47, "elapsed_time": "0:03:24", "remaining_time": "0:08:32", "throughput": 2865.2, "total_tokens": 584600}
{"current_steps": 405, "total_steps": 1405, "loss": 0.1729, "lr": 4.4846435351734376e-05, "epoch": 1.4412811387900355, "percentage": 28.83, "elapsed_time": "0:03:24", "remaining_time": "0:08:24", "throughput": 2891.23, "total_tokens": 591128}
{"current_steps": 410, "total_steps": 1405, "loss": 0.1539, "lr": 4.4655982220218176e-05, "epoch": 1.4590747330960854, "percentage": 29.18, "elapsed_time": "0:03:24", "remaining_time": "0:08:17", "throughput": 2921.17, "total_tokens": 598552}
{"current_steps": 415, "total_steps": 1405, "loss": 0.1612, "lr": 4.446249355703661e-05, "epoch": 1.4768683274021353, "percentage": 29.54, "elapsed_time": "0:03:25", "remaining_time": "0:08:09", "throughput": 2956.98, "total_tokens": 607320}
{"current_steps": 420, "total_steps": 1405, "loss": 0.1594, "lr": 4.426599924321815e-05, "epoch": 1.4946619217081851, "percentage": 29.89, "elapsed_time": "0:03:25", "remaining_time": "0:08:02", "throughput": 2986.7, "total_tokens": 614744}
{"current_steps": 425, "total_steps": 1405, "loss": 0.1947, "lr": 4.4066529623962784e-05, "epoch": 1.512455516014235, "percentage": 30.25, "elapsed_time": "0:03:26", "remaining_time": "0:07:55", "throughput": 3019.07, "total_tokens": 622808}
{"current_steps": 426, "total_steps": 1405, "eval_loss": 0.18150445818901062, "epoch": 1.5160142348754448, "percentage": 30.32, "elapsed_time": "0:03:26", "remaining_time": "0:07:55", "throughput": 3016.74, "total_tokens": 624344}
{"current_steps": 430, "total_steps": 1405, "loss": 0.1523, "lr": 4.386411550395576e-05, "epoch": 1.5302491103202847, "percentage": 30.6, "elapsed_time": "0:03:48", "remaining_time": "0:08:38", "throughput": 2759.03, "total_tokens": 630488}
{"current_steps": 435, "total_steps": 1405, "loss": 0.1721, "lr": 4.365878814261032e-05, "epoch": 1.5480427046263345, "percentage": 30.96, "elapsed_time": "0:03:48", "remaining_time": "0:08:30", "throughput": 2788.16, "total_tokens": 638424}
{"current_steps": 440, "total_steps": 1405, "loss": 0.1551, "lr": 4.34505792492402e-05, "epoch": 1.5658362989323842, "percentage": 31.32, "elapsed_time": "0:03:49", "remaining_time": "0:08:23", "throughput": 2812.54, "total_tokens": 645208}
{"current_steps": 445, "total_steps": 1405, "loss": 0.1499, "lr": 4.323952097816269e-05, "epoch": 1.583629893238434, "percentage": 31.67, "elapsed_time": "0:03:49", "remaining_time": "0:08:15", "throughput": 2840.95, "total_tokens": 653016}
{"current_steps": 450, "total_steps": 1405, "loss": 0.1843, "lr": 4.3025645923732926e-05, "epoch": 1.601423487544484, "percentage": 32.03, "elapsed_time": "0:03:50", "remaining_time": "0:08:08", "throughput": 2865.85, "total_tokens": 659992}
{"current_steps": 455, "total_steps": 1405, "loss": 0.1579, "lr": 4.2808987115310255e-05, "epoch": 1.6192170818505338, "percentage": 32.38, "elapsed_time": "0:03:50", "remaining_time": "0:08:01", "throughput": 2891.63, "total_tokens": 667224}
{"current_steps": 460, "total_steps": 1405, "loss": 0.1563, "lr": 4.2589578012157426e-05, "epoch": 1.6370106761565837, "percentage": 32.74, "elapsed_time": "0:03:51", "remaining_time": "0:07:54", "throughput": 2920.2, "total_tokens": 675160}
{"current_steps": 465, "total_steps": 1405, "loss": 0.1556, "lr": 4.236745249827336e-05, "epoch": 1.6548042704626336, "percentage": 33.1, "elapsed_time": "0:03:51", "remaining_time": "0:07:48", "throughput": 2950.41, "total_tokens": 683544}
{"current_steps": 470, "total_steps": 1405, "loss": 0.1593, "lr": 4.214264487716033e-05, "epoch": 1.6725978647686834, "percentage": 33.45, "elapsed_time": "0:03:52", "remaining_time": "0:07:41", "throughput": 2970.35, "total_tokens": 689368}
{"current_steps": 475, "total_steps": 1405, "loss": 0.1699, "lr": 4.191518986652642e-05, "epoch": 1.690391459074733, "percentage": 33.81, "elapsed_time": "0:03:52", "remaining_time": "0:07:35", "throughput": 2992.74, "total_tokens": 695832}
{"current_steps": 480, "total_steps": 1405, "loss": 0.1563, "lr": 4.168512259292391e-05, "epoch": 1.708185053380783, "percentage": 34.16, "elapsed_time": "0:03:52", "remaining_time": "0:07:28", "throughput": 3018.34, "total_tokens": 703128}
{"current_steps": 485, "total_steps": 1405, "loss": 0.1507, "lr": 4.1452478586324605e-05, "epoch": 1.7259786476868326, "percentage": 34.52, "elapsed_time": "0:03:53", "remaining_time": "0:07:22", "throughput": 3040.19, "total_tokens": 709528}
{"current_steps": 490, "total_steps": 1405, "loss": 0.1558, "lr": 4.121729377463285e-05, "epoch": 1.7437722419928825, "percentage": 34.88, "elapsed_time": "0:03:53", "remaining_time": "0:07:16", "throughput": 3063.63, "total_tokens": 716312}
{"current_steps": 495, "total_steps": 1405, "loss": 0.1825, "lr": 4.097960447813705e-05, "epoch": 1.7615658362989324, "percentage": 35.23, "elapsed_time": "0:03:54", "remaining_time": "0:07:10", "throughput": 3085.76, "total_tokens": 722776}
{"current_steps": 497, "total_steps": 1405, "eval_loss": 0.16469639539718628, "epoch": 1.7686832740213523, "percentage": 35.37, "elapsed_time": "0:03:54", "remaining_time": "0:07:09", "throughput": 3088.2, "total_tokens": 725656}
{"current_steps": 500, "total_steps": 1405, "loss": 0.1798, "lr": 4.073944740390061e-05, "epoch": 1.7793594306049823, "percentage": 35.59, "elapsed_time": "0:04:17", "remaining_time": "0:07:46", "throughput": 2831.32, "total_tokens": 729944}
{"current_steps": 505, "total_steps": 1405, "loss": 0.1694, "lr": 4.049685964009321e-05, "epoch": 1.7971530249110321, "percentage": 35.94, "elapsed_time": "0:04:18", "remaining_time": "0:07:40", "throughput": 2854.17, "total_tokens": 737112}
{"current_steps": 510, "total_steps": 1405, "loss": 0.1605, "lr": 4.025187865026311e-05, "epoch": 1.814946619217082, "percentage": 36.3, "elapsed_time": "0:04:18", "remaining_time": "0:07:34", "throughput": 2877.43, "total_tokens": 744408}
{"current_steps": 515, "total_steps": 1405, "loss": 0.1574, "lr": 4.000454226755159e-05, "epoch": 1.8327402135231317, "percentage": 36.65, "elapsed_time": "0:04:19", "remaining_time": "0:07:27", "throughput": 2896.23, "total_tokens": 750488}
{"current_steps": 520, "total_steps": 1405, "loss": 0.1703, "lr": 3.975488868885021e-05, "epoch": 1.8505338078291815, "percentage": 37.01, "elapsed_time": "0:04:19", "remaining_time": "0:07:21", "throughput": 2918.39, "total_tokens": 757528}
{"current_steps": 525, "total_steps": 1405, "loss": 0.1545, "lr": 3.9502956468902014e-05, "epoch": 1.8683274021352312, "percentage": 37.37, "elapsed_time": "0:04:19", "remaining_time": "0:07:15", "throughput": 2937.52, "total_tokens": 763736}
{"current_steps": 530, "total_steps": 1405, "loss": 0.1534, "lr": 3.924878451434735e-05, "epoch": 1.886120996441281, "percentage": 37.72, "elapsed_time": "0:04:20", "remaining_time": "0:07:10", "throughput": 2963.41, "total_tokens": 771864}
{"current_steps": 535, "total_steps": 1405, "loss": 0.1537, "lr": 3.899241207771546e-05, "epoch": 1.903914590747331, "percentage": 38.08, "elapsed_time": "0:04:20", "remaining_time": "0:07:04", "throughput": 2984.65, "total_tokens": 778712}
{"current_steps": 540, "total_steps": 1405, "loss": 0.1917, "lr": 3.873387875136252e-05, "epoch": 1.9217081850533808, "percentage": 38.43, "elapsed_time": "0:04:21", "remaining_time": "0:06:58", "throughput": 3001.32, "total_tokens": 784280}
{"current_steps": 545, "total_steps": 1405, "loss": 0.1743, "lr": 3.847322446135736e-05, "epoch": 1.9395017793594307, "percentage": 38.79, "elapsed_time": "0:04:21", "remaining_time": "0:06:53", "throughput": 3026.54, "total_tokens": 792280}
{"current_steps": 550, "total_steps": 1405, "loss": 0.1752, "lr": 3.821048946131549e-05, "epoch": 1.9572953736654806, "percentage": 39.15, "elapsed_time": "0:04:22", "remaining_time": "0:06:47", "throughput": 3045.34, "total_tokens": 798488}
{"current_steps": 555, "total_steps": 1405, "loss": 0.1578, "lr": 3.794571432618267e-05, "epoch": 1.9750889679715302, "percentage": 39.5, "elapsed_time": "0:04:22", "remaining_time": "0:06:42", "throughput": 3069.06, "total_tokens": 806104}
{"current_steps": 560, "total_steps": 1405, "loss": 0.1774, "lr": 3.767893994596876e-05, "epoch": 1.99288256227758, "percentage": 39.86, "elapsed_time": "0:04:23", "remaining_time": "0:06:37", "throughput": 3091.34, "total_tokens": 813336}
{"current_steps": 565, "total_steps": 1405, "loss": 0.1568, "lr": 3.741020751943297e-05, "epoch": 2.0106761565836297, "percentage": 40.21, "elapsed_time": "0:04:23", "remaining_time": "0:06:31", "throughput": 3101.64, "total_tokens": 817576}
{"current_steps": 568, "total_steps": 1405, "eval_loss": 0.15550938248634338, "epoch": 2.02135231316726, "percentage": 40.43, "elapsed_time": "0:04:24", "remaining_time": "0:06:29", "throughput": 3106.17, "total_tokens": 821416}
{"current_steps": 570, "total_steps": 1405, "loss": 0.1565, "lr": 3.713955854772144e-05, "epoch": 2.0284697508896796, "percentage": 40.57, "elapsed_time": "0:05:04", "remaining_time": "0:07:25", "throughput": 2708.15, "total_tokens": 823848}
{"current_steps": 575, "total_steps": 1405, "loss": 0.1536, "lr": 3.686703482795802e-05, "epoch": 2.0462633451957295, "percentage": 40.93, "elapsed_time": "0:05:04", "remaining_time": "0:07:19", "throughput": 2731.49, "total_tokens": 832232}
{"current_steps": 580, "total_steps": 1405, "loss": 0.1624, "lr": 3.6592678446789516e-05, "epoch": 2.0640569395017794, "percentage": 41.28, "elapsed_time": "0:05:05", "remaining_time": "0:07:14", "throughput": 2754.17, "total_tokens": 840424}
{"current_steps": 585, "total_steps": 1405, "loss": 0.1395, "lr": 3.631653177388605e-05, "epoch": 2.0818505338078293, "percentage": 41.64, "elapsed_time": "0:05:05", "remaining_time": "0:07:08", "throughput": 2771.33, "total_tokens": 846824}
{"current_steps": 590, "total_steps": 1405, "loss": 0.196, "lr": 3.60386374553978e-05, "epoch": 2.099644128113879, "percentage": 41.99, "elapsed_time": "0:05:05", "remaining_time": "0:07:02", "throughput": 2789.65, "total_tokens": 853608}
{"current_steps": 595, "total_steps": 1405, "loss": 0.1637, "lr": 3.5759038407369056e-05, "epoch": 2.117437722419929, "percentage": 42.35, "elapsed_time": "0:05:06", "remaining_time": "0:06:57", "throughput": 2809.64, "total_tokens": 860968}
{"current_steps": 600, "total_steps": 1405, "loss": 0.194, "lr": 3.547777780911055e-05, "epoch": 2.135231316725979, "percentage": 42.7, "elapsed_time": "0:05:06", "remaining_time": "0:06:51", "throughput": 2831.31, "total_tokens": 868904}
{"current_steps": 605, "total_steps": 1405, "loss": 0.1592, "lr": 3.519489909653113e-05, "epoch": 2.1530249110320283, "percentage": 43.06, "elapsed_time": "0:05:07", "remaining_time": "0:06:46", "throughput": 2850.58, "total_tokens": 876072}
{"current_steps": 610, "total_steps": 1405, "loss": 0.1549, "lr": 3.4910445955429854e-05, "epoch": 2.170818505338078, "percentage": 43.42, "elapsed_time": "0:05:07", "remaining_time": "0:06:41", "throughput": 2871.33, "total_tokens": 883752}
{"current_steps": 615, "total_steps": 1405, "loss": 0.1533, "lr": 3.4624462314749443e-05, "epoch": 2.188612099644128, "percentage": 43.77, "elapsed_time": "0:05:08", "remaining_time": "0:06:35", "throughput": 2891.64, "total_tokens": 891304}
{"current_steps": 620, "total_steps": 1405, "loss": 0.1483, "lr": 3.433699233979222e-05, "epoch": 2.206405693950178, "percentage": 44.13, "elapsed_time": "0:05:08", "remaining_time": "0:06:30", "throughput": 2912.87, "total_tokens": 899176}
{"current_steps": 625, "total_steps": 1405, "loss": 0.1436, "lr": 3.4048080425399505e-05, "epoch": 2.224199288256228, "percentage": 44.48, "elapsed_time": "0:05:09", "remaining_time": "0:06:25", "throughput": 2935.53, "total_tokens": 907560}
{"current_steps": 630, "total_steps": 1405, "loss": 0.1413, "lr": 3.375777118909561e-05, "epoch": 2.2419928825622777, "percentage": 44.84, "elapsed_time": "0:05:09", "remaining_time": "0:06:20", "throughput": 2956.05, "total_tokens": 915240}
{"current_steps": 635, "total_steps": 1405, "loss": 0.1597, "lr": 3.3466109464197426e-05, "epoch": 2.2597864768683276, "percentage": 45.2, "elapsed_time": "0:05:10", "remaining_time": "0:06:15", "throughput": 2971.91, "total_tokens": 921384}
{"current_steps": 639, "total_steps": 1405, "eval_loss": 0.1567462682723999, "epoch": 2.2740213523131674, "percentage": 45.48, "elapsed_time": "0:05:10", "remaining_time": "0:06:12", "throughput": 2980.22, "total_tokens": 926760}
{"current_steps": 640, "total_steps": 1405, "loss": 0.1653, "lr": 3.317314029289067e-05, "epoch": 2.277580071174377, "percentage": 45.55, "elapsed_time": "0:05:30", "remaining_time": "0:06:35", "throughput": 2804.57, "total_tokens": 927528}
{"current_steps": 645, "total_steps": 1405, "loss": 0.1594, "lr": 3.287890891927386e-05, "epoch": 2.295373665480427, "percentage": 45.91, "elapsed_time": "0:05:31", "remaining_time": "0:06:30", "throughput": 2822.09, "total_tokens": 934568}
{"current_steps": 650, "total_steps": 1405, "loss": 0.1402, "lr": 3.258346078237122e-05, "epoch": 2.3131672597864767, "percentage": 46.26, "elapsed_time": "0:05:31", "remaining_time": "0:06:25", "throughput": 2841.4, "total_tokens": 942248}
{"current_steps": 655, "total_steps": 1405, "loss": 0.2418, "lr": 3.228684150911527e-05, "epoch": 2.3309608540925266, "percentage": 46.62, "elapsed_time": "0:05:32", "remaining_time": "0:06:20", "throughput": 2858.33, "total_tokens": 949096}
{"current_steps": 660, "total_steps": 1405, "loss": 0.1845, "lr": 3.198909690730063e-05, "epoch": 2.3487544483985765, "percentage": 46.98, "elapsed_time": "0:05:32", "remaining_time": "0:06:15", "throughput": 2874.67, "total_tokens": 955752}
{"current_steps": 665, "total_steps": 1405, "loss": 0.1664, "lr": 3.169027295850977e-05, "epoch": 2.3665480427046264, "percentage": 47.33, "elapsed_time": "0:05:32", "remaining_time": "0:06:10", "throughput": 2893.1, "total_tokens": 963176}
{"current_steps": 670, "total_steps": 1405, "loss": 0.1627, "lr": 3.139041581101187e-05, "epoch": 2.3843416370106763, "percentage": 47.69, "elapsed_time": "0:05:33", "remaining_time": "0:06:05", "throughput": 2904.91, "total_tokens": 968232}
{"current_steps": 675, "total_steps": 1405, "loss": 0.1498, "lr": 3.108957177263608e-05, "epoch": 2.402135231316726, "percentage": 48.04, "elapsed_time": "0:05:33", "remaining_time": "0:06:00", "throughput": 2925.75, "total_tokens": 976552}
{"current_steps": 680, "total_steps": 1405, "loss": 0.1656, "lr": 3.078778730362003e-05, "epoch": 2.419928825622776, "percentage": 48.4, "elapsed_time": "0:05:34", "remaining_time": "0:05:56", "throughput": 2943.32, "total_tokens": 983720}
{"current_steps": 685, "total_steps": 1405, "loss": 0.1567, "lr": 3.048510900943484e-05, "epoch": 2.4377224199288254, "percentage": 48.75, "elapsed_time": "0:05:34", "remaining_time": "0:05:51", "throughput": 2963.87, "total_tokens": 991976}
{"current_steps": 690, "total_steps": 1405, "loss": 0.1807, "lr": 3.018158363358773e-05, "epoch": 2.4555160142348753, "percentage": 49.11, "elapsed_time": "0:05:35", "remaining_time": "0:05:47", "throughput": 2978.75, "total_tokens": 998184}
{"current_steps": 695, "total_steps": 1405, "loss": 0.1678, "lr": 2.9877258050403212e-05, "epoch": 2.473309608540925, "percentage": 49.47, "elapsed_time": "0:05:35", "remaining_time": "0:05:42", "throughput": 2997.12, "total_tokens": 1005672}
{"current_steps": 700, "total_steps": 1405, "loss": 0.1531, "lr": 2.9572179257784215e-05, "epoch": 2.491103202846975, "percentage": 49.82, "elapsed_time": "0:05:35", "remaining_time": "0:05:38", "throughput": 3015.24, "total_tokens": 1013096}
{"current_steps": 705, "total_steps": 1405, "loss": 0.1337, "lr": 2.9266394369954052e-05, "epoch": 2.508896797153025, "percentage": 50.18, "elapsed_time": "0:05:36", "remaining_time": "0:05:34", "throughput": 3029.96, "total_tokens": 1019304}
{"current_steps": 710, "total_steps": 1405, "loss": 0.1431, "lr": 2.8959950610180374e-05, "epoch": 2.526690391459075, "percentage": 50.53, "elapsed_time": "0:05:36", "remaining_time": "0:05:29", "throughput": 3044.1, "total_tokens": 1025320}
{"current_steps": 710, "total_steps": 1405, "eval_loss": 0.16391661763191223, "epoch": 2.526690391459075, "percentage": 50.53, "elapsed_time": "0:05:37", "remaining_time": "0:05:30", "throughput": 3038.61, "total_tokens": 1025320}
{"current_steps": 715, "total_steps": 1405, "loss": 0.1675, "lr": 2.865289530348243e-05, "epoch": 2.5444839857651247, "percentage": 50.89, "elapsed_time": "0:05:58", "remaining_time": "0:05:45", "throughput": 2882.35, "total_tokens": 1032552}
{"current_steps": 720, "total_steps": 1405, "loss": 2.4615, "lr": 2.834527586932243e-05, "epoch": 2.562277580071174, "percentage": 51.25, "elapsed_time": "0:05:58", "remaining_time": "0:05:41", "throughput": 2899.29, "total_tokens": 1039912}
{"current_steps": 725, "total_steps": 1405, "loss": 0.1636, "lr": 2.8037139814282493e-05, "epoch": 2.580071174377224, "percentage": 51.6, "elapsed_time": "0:05:59", "remaining_time": "0:05:36", "throughput": 2916.01, "total_tokens": 1047208}
{"current_steps": 730, "total_steps": 1405, "loss": 0.1652, "lr": 2.7728534724728027e-05, "epoch": 2.597864768683274, "percentage": 51.96, "elapsed_time": "0:05:59", "remaining_time": "0:05:32", "throughput": 2931.23, "total_tokens": 1053928}
{"current_steps": 735, "total_steps": 1405, "loss": 0.1482, "lr": 2.741950825945881e-05, "epoch": 2.6156583629893237, "percentage": 52.31, "elapsed_time": "0:06:00", "remaining_time": "0:05:28", "throughput": 2948.87, "total_tokens": 1061608}
{"current_steps": 740, "total_steps": 1405, "loss": 0.1501, "lr": 2.711010814234896e-05, "epoch": 2.6334519572953736, "percentage": 52.67, "elapsed_time": "0:06:00", "remaining_time": "0:05:23", "throughput": 2962.04, "total_tokens": 1067560}
{"current_steps": 745, "total_steps": 1405, "loss": 0.1743, "lr": 2.6800382154976732e-05, "epoch": 2.6512455516014235, "percentage": 53.02, "elapsed_time": "0:06:00", "remaining_time": "0:05:19", "throughput": 2976.8, "total_tokens": 1074152}
{"current_steps": 750, "total_steps": 1405, "loss": 0.1441, "lr": 2.6490378129245498e-05, "epoch": 2.6690391459074734, "percentage": 53.38, "elapsed_time": "0:06:01", "remaining_time": "0:05:15", "throughput": 2996.98, "total_tokens": 1082856}
{"current_steps": 755, "total_steps": 1405, "loss": 0.1495, "lr": 2.6180143939996925e-05, "epoch": 2.6868327402135233, "percentage": 53.74, "elapsed_time": "0:06:01", "remaining_time": "0:05:11", "throughput": 3011.87, "total_tokens": 1089512}
{"current_steps": 760, "total_steps": 1405, "loss": 0.1464, "lr": 2.5869727497617495e-05, "epoch": 2.704626334519573, "percentage": 54.09, "elapsed_time": "0:06:02", "remaining_time": "0:05:07", "throughput": 3026.86, "total_tokens": 1096232}
{"current_steps": 765, "total_steps": 1405, "loss": 0.1572, "lr": 2.55591767406396e-05, "epoch": 2.722419928825623, "percentage": 54.45, "elapsed_time": "0:06:02", "remaining_time": "0:05:03", "throughput": 3044.92, "total_tokens": 1104168}
{"current_steps": 770, "total_steps": 1405, "loss": 0.1326, "lr": 2.5248539628338246e-05, "epoch": 2.7402135231316724, "percentage": 54.8, "elapsed_time": "0:06:03", "remaining_time": "0:04:59", "throughput": 3063.29, "total_tokens": 1112232}
{"current_steps": 775, "total_steps": 1405, "loss": 0.1734, "lr": 2.4937864133324516e-05, "epoch": 2.7580071174377223, "percentage": 55.16, "elapsed_time": "0:06:03", "remaining_time": "0:04:55", "throughput": 3078.33, "total_tokens": 1119016}
{"current_steps": 780, "total_steps": 1405, "loss": 0.1986, "lr": 2.462719823413707e-05, "epoch": 2.775800711743772, "percentage": 55.52, "elapsed_time": "0:06:03", "remaining_time": "0:04:51", "throughput": 3095.58, "total_tokens": 1126696}
{"current_steps": 781, "total_steps": 1405, "eval_loss": 0.15414386987686157, "epoch": 2.7793594306049823, "percentage": 55.59, "elapsed_time": "0:06:04", "remaining_time": "0:04:51", "throughput": 3093.52, "total_tokens": 1128104}
{"current_steps": 785, "total_steps": 1405, "loss": 0.1576, "lr": 2.4316589907832654e-05, "epoch": 2.793594306049822, "percentage": 55.87, "elapsed_time": "0:06:26", "remaining_time": "0:05:05", "throughput": 2934.79, "total_tokens": 1134184}
{"current_steps": 790, "total_steps": 1405, "loss": 0.1392, "lr": 2.4006087122576863e-05, "epoch": 2.811387900355872, "percentage": 56.23, "elapsed_time": "0:06:26", "remaining_time": "0:05:01", "throughput": 2947.66, "total_tokens": 1140392}
{"current_steps": 795, "total_steps": 1405, "loss": 0.2025, "lr": 2.3695737830236266e-05, "epoch": 2.829181494661922, "percentage": 56.58, "elapsed_time": "0:06:27", "remaining_time": "0:04:57", "throughput": 2964.64, "total_tokens": 1148328}
{"current_steps": 800, "total_steps": 1405, "loss": 0.1781, "lr": 2.338558995897307e-05, "epoch": 2.8469750889679717, "percentage": 56.94, "elapsed_time": "0:06:27", "remaining_time": "0:04:53", "throughput": 2976.22, "total_tokens": 1154024}
{"current_steps": 805, "total_steps": 1405, "loss": 0.195, "lr": 2.3075691405843435e-05, "epoch": 2.864768683274021, "percentage": 57.3, "elapsed_time": "0:06:28", "remaining_time": "0:04:49", "throughput": 2990.39, "total_tokens": 1160808}
{"current_steps": 810, "total_steps": 1405, "loss": 0.1597, "lr": 2.2766090029400573e-05, "epoch": 2.882562277580071, "percentage": 57.65, "elapsed_time": "0:06:28", "remaining_time": "0:04:45", "throughput": 3005.29, "total_tokens": 1167912}
{"current_steps": 815, "total_steps": 1405, "loss": 0.1433, "lr": 2.2456833642303822e-05, "epoch": 2.900355871886121, "percentage": 58.01, "elapsed_time": "0:06:29", "remaining_time": "0:04:41", "throughput": 3019.06, "total_tokens": 1174568}
{"current_steps": 820, "total_steps": 1405, "loss": 0.1553, "lr": 2.214797000393479e-05, "epoch": 2.9181494661921707, "percentage": 58.36, "elapsed_time": "0:06:29", "remaining_time": "0:04:37", "throughput": 3033.42, "total_tokens": 1181480}
{"current_steps": 825, "total_steps": 1405, "loss": 0.1614, "lr": 2.183954681302173e-05, "epoch": 2.9359430604982206, "percentage": 58.72, "elapsed_time": "0:06:29", "remaining_time": "0:04:34", "throughput": 3051.42, "total_tokens": 1189928}
{"current_steps": 830, "total_steps": 1405, "loss": 0.1351, "lr": 2.1531611700273297e-05, "epoch": 2.9537366548042705, "percentage": 59.07, "elapsed_time": "0:06:30", "remaining_time": "0:04:30", "throughput": 3067.25, "total_tokens": 1197480}
{"current_steps": 835, "total_steps": 1405, "loss": 0.1845, "lr": 2.1224212221022777e-05, "epoch": 2.9715302491103204, "percentage": 59.43, "elapsed_time": "0:06:30", "remaining_time": "0:04:26", "throughput": 3081.96, "total_tokens": 1204584}
{"current_steps": 840, "total_steps": 1405, "loss": 0.1616, "lr": 2.0917395847883995e-05, "epoch": 2.9893238434163703, "percentage": 59.79, "elapsed_time": "0:06:31", "remaining_time": "0:04:23", "throughput": 3098.78, "total_tokens": 1212584}
{"current_steps": 845, "total_steps": 1405, "loss": 0.1625, "lr": 2.0611209963419958e-05, "epoch": 3.00711743772242, "percentage": 60.14, "elapsed_time": "0:06:31", "remaining_time": "0:04:19", "throughput": 3108.17, "total_tokens": 1217856}
{"current_steps": 850, "total_steps": 1405, "loss": 0.137, "lr": 2.030570185282544e-05, "epoch": 3.0249110320284696, "percentage": 60.5, "elapsed_time": "0:06:32", "remaining_time": "0:04:16", "throughput": 3126.69, "total_tokens": 1226624}
{"current_steps": 852, "total_steps": 1405, "eval_loss": 0.1851627230644226, "epoch": 3.0320284697508897, "percentage": 60.64, "elapsed_time": "0:06:33", "remaining_time": "0:04:15", "throughput": 3127.64, "total_tokens": 1229440}
{"current_steps": 855, "total_steps": 1405, "loss": 0.1453, "lr": 2.0000918696624588e-05, "epoch": 3.0427046263345194, "percentage": 60.85, "elapsed_time": "0:07:14", "remaining_time": "0:04:39", "throughput": 2836.99, "total_tokens": 1233152}
{"current_steps": 860, "total_steps": 1405, "loss": 0.138, "lr": 1.9696907563384687e-05, "epoch": 3.0604982206405693, "percentage": 61.21, "elapsed_time": "0:07:15", "remaining_time": "0:04:35", "throughput": 2850.2, "total_tokens": 1240128}
{"current_steps": 865, "total_steps": 1405, "loss": 0.1148, "lr": 1.939371540244723e-05, "epoch": 3.078291814946619, "percentage": 61.57, "elapsed_time": "0:07:15", "remaining_time": "0:04:31", "throughput": 2865.38, "total_tokens": 1248064}
{"current_steps": 870, "total_steps": 1405, "loss": 0.1106, "lr": 1.9091389036677382e-05, "epoch": 3.096085409252669, "percentage": 61.92, "elapsed_time": "0:07:16", "remaining_time": "0:04:28", "throughput": 2878.95, "total_tokens": 1255232}
{"current_steps": 875, "total_steps": 1405, "loss": 0.1169, "lr": 1.878997515523299e-05, "epoch": 3.113879003558719, "percentage": 62.28, "elapsed_time": "0:07:16", "remaining_time": "0:04:24", "throughput": 2892.21, "total_tokens": 1262272}
{"current_steps": 880, "total_steps": 1405, "loss": 0.1161, "lr": 1.848952030635424e-05, "epoch": 3.131672597864769, "percentage": 62.63, "elapsed_time": "0:07:16", "remaining_time": "0:04:20", "throughput": 2906.11, "total_tokens": 1269632}
{"current_steps": 885, "total_steps": 1405, "loss": 0.123, "lr": 1.819007089017508e-05, "epoch": 3.1494661921708187, "percentage": 62.99, "elapsed_time": "0:07:17", "remaining_time": "0:04:16", "throughput": 2920.66, "total_tokens": 1277312}
{"current_steps": 890, "total_steps": 1405, "loss": 0.1599, "lr": 1.789167315155749e-05, "epoch": 3.167259786476868, "percentage": 63.35, "elapsed_time": "0:07:17", "remaining_time": "0:04:13", "throughput": 2933.3, "total_tokens": 1284096}
{"current_steps": 895, "total_steps": 1405, "loss": 0.1109, "lr": 1.7594373172949784e-05, "epoch": 3.185053380782918, "percentage": 63.7, "elapsed_time": "0:07:18", "remaining_time": "0:04:09", "throughput": 2947.55, "total_tokens": 1291648}
{"current_steps": 900, "total_steps": 1405, "loss": 0.1569, "lr": 1.7298216867269906e-05, "epoch": 3.202846975088968, "percentage": 64.06, "elapsed_time": "0:07:18", "remaining_time": "0:04:06", "throughput": 2962.83, "total_tokens": 1299712}
{"current_steps": 905, "total_steps": 1405, "loss": 0.1082, "lr": 1.7003249970815026e-05, "epoch": 3.2206405693950177, "percentage": 64.41, "elapsed_time": "0:07:19", "remaining_time": "0:04:02", "throughput": 2974.7, "total_tokens": 1306176}
{"current_steps": 910, "total_steps": 1405, "loss": 0.1387, "lr": 1.6709518036198308e-05, "epoch": 3.2384341637010676, "percentage": 64.77, "elapsed_time": "0:07:19", "remaining_time": "0:03:59", "throughput": 2989.65, "total_tokens": 1314112}
{"current_steps": 915, "total_steps": 1405, "loss": 0.1199, "lr": 1.6417066425314087e-05, "epoch": 3.2562277580071175, "percentage": 65.12, "elapsed_time": "0:07:19", "remaining_time": "0:03:55", "throughput": 3002.55, "total_tokens": 1321088}
{"current_steps": 920, "total_steps": 1405, "loss": 0.1422, "lr": 1.612594030233252e-05, "epoch": 3.2740213523131674, "percentage": 65.48, "elapsed_time": "0:07:20", "remaining_time": "0:03:52", "throughput": 3016.33, "total_tokens": 1328512}
{"current_steps": 923, "total_steps": 1405, "eval_loss": 0.16463510692119598, "epoch": 3.284697508896797, "percentage": 65.69, "elapsed_time": "0:07:21", "remaining_time": "0:03:50", "throughput": 3019.68, "total_tokens": 1332544}
{"current_steps": 925, "total_steps": 1405, "loss": 0.0863, "lr": 1.583618462672472e-05, "epoch": 3.2918149466192173, "percentage": 65.84, "elapsed_time": "0:07:41", "remaining_time": "0:03:59", "throughput": 2894.57, "total_tokens": 1336128}
{"current_steps": 930, "total_steps": 1405, "loss": 0.1155, "lr": 1.5547844146319545e-05, "epoch": 3.309608540925267, "percentage": 66.19, "elapsed_time": "0:07:42", "remaining_time": "0:03:55", "throughput": 2907.84, "total_tokens": 1343552}
{"current_steps": 935, "total_steps": 1405, "loss": 0.1691, "lr": 1.5260963390393075e-05, "epoch": 3.3274021352313166, "percentage": 66.55, "elapsed_time": "0:07:42", "remaining_time": "0:03:52", "throughput": 2922.22, "total_tokens": 1351552}
{"current_steps": 940, "total_steps": 1405, "loss": 0.0983, "lr": 1.4975586662791783e-05, "epoch": 3.3451957295373664, "percentage": 66.9, "elapsed_time": "0:07:42", "remaining_time": "0:03:49", "throughput": 2934.01, "total_tokens": 1358272}
{"current_steps": 945, "total_steps": 1405, "loss": 0.137, "lr": 1.4691758035090602e-05, "epoch": 3.3629893238434163, "percentage": 67.26, "elapsed_time": "0:07:43", "remaining_time": "0:03:45", "throughput": 2949.36, "total_tokens": 1366784}
{"current_steps": 950, "total_steps": 1405, "loss": 0.1389, "lr": 1.4409521339786808e-05, "epoch": 3.380782918149466, "percentage": 67.62, "elapsed_time": "0:07:43", "remaining_time": "0:03:42", "throughput": 2960.74, "total_tokens": 1373312}
{"current_steps": 955, "total_steps": 1405, "loss": 0.0916, "lr": 1.41289201635308e-05, "epoch": 3.398576512455516, "percentage": 67.97, "elapsed_time": "0:07:44", "remaining_time": "0:03:38", "throughput": 2973.88, "total_tokens": 1380736}
{"current_steps": 960, "total_steps": 1405, "loss": 0.096, "lr": 1.3849997840394943e-05, "epoch": 3.416370106761566, "percentage": 68.33, "elapsed_time": "0:07:44", "remaining_time": "0:03:35", "throughput": 2987.73, "total_tokens": 1388544}
{"current_steps": 965, "total_steps": 1405, "loss": 0.1252, "lr": 1.3572797445181345e-05, "epoch": 3.434163701067616, "percentage": 68.68, "elapsed_time": "0:07:45", "remaining_time": "0:03:32", "throughput": 3001.21, "total_tokens": 1396160}
{"current_steps": 970, "total_steps": 1405, "loss": 0.0988, "lr": 1.3297361786769652e-05, "epoch": 3.4519572953736652, "percentage": 69.04, "elapsed_time": "0:07:45", "remaining_time": "0:03:28", "throughput": 3015.3, "total_tokens": 1404096}
{"current_steps": 975, "total_steps": 1405, "loss": 0.1135, "lr": 1.3023733401505981e-05, "epoch": 3.469750889679715, "percentage": 69.4, "elapsed_time": "0:07:46", "remaining_time": "0:03:25", "throughput": 3027.31, "total_tokens": 1411008}
{"current_steps": 980, "total_steps": 1405, "loss": 0.155, "lr": 1.2751954546633871e-05, "epoch": 3.487544483985765, "percentage": 69.75, "elapsed_time": "0:07:46", "remaining_time": "0:03:22", "throughput": 3041.19, "total_tokens": 1418880}
{"current_steps": 985, "total_steps": 1405, "loss": 0.1302, "lr": 1.2482067193768417e-05, "epoch": 3.505338078291815, "percentage": 70.11, "elapsed_time": "0:07:46", "remaining_time": "0:03:19", "throughput": 3053.67, "total_tokens": 1426048}
{"current_steps": 990, "total_steps": 1405, "loss": 0.0911, "lr": 1.2214113022414448e-05, "epoch": 3.5231316725978647, "percentage": 70.46, "elapsed_time": "0:07:47", "remaining_time": "0:03:15", "throughput": 3063.86, "total_tokens": 1432064}
{"current_steps": 994, "total_steps": 1405, "eval_loss": 0.1803617924451828, "epoch": 3.5373665480427046, "percentage": 70.75, "elapsed_time": "0:07:48", "remaining_time": "0:03:13", "throughput": 3070.92, "total_tokens": 1438336}
{"current_steps": 995, "total_steps": 1405, "loss": 0.1165, "lr": 1.1948133413529817e-05, "epoch": 3.5409252669039146, "percentage": 70.82, "elapsed_time": "0:08:11", "remaining_time": "0:03:22", "throughput": 2931.84, "total_tokens": 1439808}
{"current_steps": 1000, "total_steps": 1405, "loss": 0.156, "lr": 1.168416944313486e-05, "epoch": 3.5587188612099645, "percentage": 71.17, "elapsed_time": "0:08:11", "remaining_time": "0:03:19", "throughput": 2944.97, "total_tokens": 1447616}
{"current_steps": 1005, "total_steps": 1405, "loss": 0.0978, "lr": 1.1422261875968845e-05, "epoch": 3.5765124555160144, "percentage": 71.53, "elapsed_time": "0:08:11", "remaining_time": "0:03:15", "throughput": 2955.75, "total_tokens": 1454208}
{"current_steps": 1010, "total_steps": 1405, "loss": 0.0784, "lr": 1.1162451159194614e-05, "epoch": 3.5943060498220643, "percentage": 71.89, "elapsed_time": "0:08:12", "remaining_time": "0:03:12", "throughput": 2971.24, "total_tokens": 1463296}
{"current_steps": 1015, "total_steps": 1405, "loss": 0.1698, "lr": 1.0904777416152166e-05, "epoch": 3.612099644128114, "percentage": 72.24, "elapsed_time": "0:08:12", "remaining_time": "0:03:09", "throughput": 2982.14, "total_tokens": 1469952}
{"current_steps": 1020, "total_steps": 1405, "loss": 0.1033, "lr": 1.0649280440162326e-05, "epoch": 3.6298932384341636, "percentage": 72.6, "elapsed_time": "0:08:13", "remaining_time": "0:03:06", "throughput": 2994.1, "total_tokens": 1477184}
{"current_steps": 1025, "total_steps": 1405, "loss": 0.1025, "lr": 1.0395999688381314e-05, "epoch": 3.6476868327402134, "percentage": 72.95, "elapsed_time": "0:08:13", "remaining_time": "0:03:03", "throughput": 3005.56, "total_tokens": 1484160}
{"current_steps": 1030, "total_steps": 1405, "loss": 0.0885, "lr": 1.0144974275707241e-05, "epoch": 3.6654804270462633, "percentage": 73.31, "elapsed_time": "0:08:14", "remaining_time": "0:02:59", "throughput": 3017.14, "total_tokens": 1491200}
{"current_steps": 1035, "total_steps": 1405, "loss": 0.1678, "lr": 9.896242968739539e-06, "epoch": 3.683274021352313, "percentage": 73.67, "elapsed_time": "0:08:14", "remaining_time": "0:02:56", "throughput": 3028.92, "total_tokens": 1498368}
{"current_steps": 1040, "total_steps": 1405, "loss": 0.1068, "lr": 9.649844179792081e-06, "epoch": 3.701067615658363, "percentage": 74.02, "elapsed_time": "0:08:15", "remaining_time": "0:02:53", "throughput": 3041.56, "total_tokens": 1505984}
{"current_steps": 1045, "total_steps": 1405, "loss": 0.0978, "lr": 9.405815960961054e-06, "epoch": 3.718861209964413, "percentage": 74.38, "elapsed_time": "0:08:15", "remaining_time": "0:02:50", "throughput": 3050.57, "total_tokens": 1511680}
{"current_steps": 1050, "total_steps": 1405, "loss": 0.0966, "lr": 9.16419599824847e-06, "epoch": 3.7366548042704624, "percentage": 74.73, "elapsed_time": "0:08:15", "remaining_time": "0:02:47", "throughput": 3060.55, "total_tokens": 1517888}
{"current_steps": 1055, "total_steps": 1405, "loss": 0.1815, "lr": 8.925021605742211e-06, "epoch": 3.7544483985765122, "percentage": 75.09, "elapsed_time": "0:08:16", "remaining_time": "0:02:44", "throughput": 3073.22, "total_tokens": 1525568}
{"current_steps": 1060, "total_steps": 1405, "loss": 0.1028, "lr": 8.68832971985347e-06, "epoch": 3.772241992882562, "percentage": 75.44, "elapsed_time": "0:08:16", "remaining_time": "0:02:41", "throughput": 3084.46, "total_tokens": 1532480}
{"current_steps": 1065, "total_steps": 1405, "loss": 0.1203, "lr": 8.454156893612591e-06, "epoch": 3.790035587188612, "percentage": 75.8, "elapsed_time": "0:08:17", "remaining_time": "0:02:38", "throughput": 3095.02, "total_tokens": 1539072}
{"current_steps": 1065, "total_steps": 1405, "eval_loss": 0.17713916301727295, "epoch": 3.790035587188612, "percentage": 75.8, "elapsed_time": "0:08:17", "remaining_time": "0:02:38", "throughput": 3091.12, "total_tokens": 1539072}
{"current_steps": 1070, "total_steps": 1405, "loss": 0.1178, "lr": 8.222539291024078e-06, "epoch": 3.807829181494662, "percentage": 76.16, "elapsed_time": "0:09:00", "remaining_time": "0:02:49", "throughput": 2862.34, "total_tokens": 1547584}
{"current_steps": 1075, "total_steps": 1405, "loss": 0.0999, "lr": 7.993512681481639e-06, "epoch": 3.8256227758007118, "percentage": 76.51, "elapsed_time": "0:09:01", "remaining_time": "0:02:46", "throughput": 2872.5, "total_tokens": 1554304}
{"current_steps": 1080, "total_steps": 1405, "loss": 0.145, "lr": 7.767112434244253e-06, "epoch": 3.8434163701067616, "percentage": 76.87, "elapsed_time": "0:09:01", "remaining_time": "0:02:42", "throughput": 2882.43, "total_tokens": 1560896}
{"current_steps": 1085, "total_steps": 1405, "loss": 0.0627, "lr": 7.543373512973947e-06, "epoch": 3.8612099644128115, "percentage": 77.22, "elapsed_time": "0:09:01", "remaining_time": "0:02:39", "throughput": 2892.76, "total_tokens": 1567744}
{"current_steps": 1090, "total_steps": 1405, "loss": 0.1558, "lr": 7.3223304703363135e-06, "epoch": 3.8790035587188614, "percentage": 77.58, "elapsed_time": "0:09:02", "remaining_time": "0:02:36", "throughput": 2902.76, "total_tokens": 1574400}
{"current_steps": 1095, "total_steps": 1405, "loss": 0.0965, "lr": 7.104017442664393e-06, "epoch": 3.8967971530249113, "percentage": 77.94, "elapsed_time": "0:09:02", "remaining_time": "0:02:33", "throughput": 2913.52, "total_tokens": 1581504}
{"current_steps": 1100, "total_steps": 1405, "loss": 0.0914, "lr": 6.8884681446869105e-06, "epoch": 3.914590747330961, "percentage": 78.29, "elapsed_time": "0:09:03", "remaining_time": "0:02:30", "throughput": 2925.79, "total_tokens": 1589504}
{"current_steps": 1105, "total_steps": 1405, "loss": 0.124, "lr": 6.67571586432163e-06, "epoch": 3.9323843416370106, "percentage": 78.65, "elapsed_time": "0:09:03", "remaining_time": "0:02:27", "throughput": 2938.36, "total_tokens": 1597696}
{"current_steps": 1110, "total_steps": 1405, "loss": 0.1388, "lr": 6.465793457534553e-06, "epoch": 3.9501779359430604, "percentage": 79.0, "elapsed_time": "0:09:04", "remaining_time": "0:02:24", "throughput": 2949.82, "total_tokens": 1605248}
{"current_steps": 1115, "total_steps": 1405, "loss": 0.1646, "lr": 6.258733343265932e-06, "epoch": 3.9679715302491103, "percentage": 79.36, "elapsed_time": "0:09:04", "remaining_time": "0:02:21", "throughput": 2963.21, "total_tokens": 1613952}
{"current_steps": 1120, "total_steps": 1405, "loss": 0.1024, "lr": 6.0545674984236826e-06, "epoch": 3.98576512455516, "percentage": 79.72, "elapsed_time": "0:09:05", "remaining_time": "0:02:18", "throughput": 2972.46, "total_tokens": 1620224}
{"current_steps": 1125, "total_steps": 1405, "loss": 0.0889, "lr": 5.853327452945115e-06, "epoch": 4.00355871886121, "percentage": 80.07, "elapsed_time": "0:09:05", "remaining_time": "0:02:15", "throughput": 2979.71, "total_tokens": 1625800}
{"current_steps": 1130, "total_steps": 1405, "loss": 0.0747, "lr": 5.655044284927657e-06, "epoch": 4.0213523131672595, "percentage": 80.43, "elapsed_time": "0:09:06", "remaining_time": "0:02:12", "throughput": 2991.1, "total_tokens": 1633352}
{"current_steps": 1135, "total_steps": 1405, "loss": 0.0551, "lr": 5.459748615829355e-06, "epoch": 4.039145907473309, "percentage": 80.78, "elapsed_time": "0:09:06", "remaining_time": "0:02:10", "throughput": 3002.35, "total_tokens": 1640840}
{"current_steps": 1136, "total_steps": 1405, "eval_loss": 0.19830213487148285, "epoch": 4.04270462633452, "percentage": 80.85, "elapsed_time": "0:09:07", "remaining_time": "0:02:09", "throughput": 3001.97, "total_tokens": 1642696}
{"current_steps": 1140, "total_steps": 1405, "loss": 0.0395, "lr": 5.267470605739952e-06, "epoch": 4.056939501779359, "percentage": 81.14, "elapsed_time": "0:09:29", "remaining_time": "0:02:12", "throughput": 2896.93, "total_tokens": 1648520}
{"current_steps": 1145, "total_steps": 1405, "loss": 0.0215, "lr": 5.078239948723154e-06, "epoch": 4.074733096085409, "percentage": 81.49, "elapsed_time": "0:09:29", "remaining_time": "0:02:09", "throughput": 2907.39, "total_tokens": 1655752}
{"current_steps": 1150, "total_steps": 1405, "loss": 0.0073, "lr": 4.892085868230881e-06, "epoch": 4.092526690391459, "percentage": 81.85, "elapsed_time": "0:09:29", "remaining_time": "0:02:06", "throughput": 2917.72, "total_tokens": 1662920}
{"current_steps": 1155, "total_steps": 1405, "loss": 0.0348, "lr": 4.709037112590217e-06, "epoch": 4.110320284697509, "percentage": 82.21, "elapsed_time": "0:09:30", "remaining_time": "0:02:03", "throughput": 2927.74, "total_tokens": 1669896}
{"current_steps": 1160, "total_steps": 1405, "loss": 0.076, "lr": 4.529121950563716e-06, "epoch": 4.128113879003559, "percentage": 82.56, "elapsed_time": "0:09:30", "remaining_time": "0:02:00", "throughput": 2935.35, "total_tokens": 1675400}
{"current_steps": 1165, "total_steps": 1405, "loss": 0.0705, "lr": 4.352368166983753e-06, "epoch": 4.145907473309609, "percentage": 82.92, "elapsed_time": "0:09:31", "remaining_time": "0:01:57", "throughput": 2946.25, "total_tokens": 1682952}
{"current_steps": 1170, "total_steps": 1405, "loss": 0.088, "lr": 4.178803058461664e-06, "epoch": 4.1637010676156585, "percentage": 83.27, "elapsed_time": "0:09:31", "remaining_time": "0:01:54", "throughput": 2956.74, "total_tokens": 1690248}
{"current_steps": 1175, "total_steps": 1405, "loss": 0.05, "lr": 4.0084534291722376e-06, "epoch": 4.181494661921708, "percentage": 83.63, "elapsed_time": "0:09:32", "remaining_time": "0:01:51", "throughput": 2966.09, "total_tokens": 1696840}
{"current_steps": 1180, "total_steps": 1405, "loss": 0.0689, "lr": 3.841345586714251e-06, "epoch": 4.199288256227758, "percentage": 83.99, "elapsed_time": "0:09:32", "remaining_time": "0:01:49", "throughput": 2975.66, "total_tokens": 1703624}
{"current_steps": 1185, "total_steps": 1405, "loss": 0.0218, "lr": 3.677505338047729e-06, "epoch": 4.217081850533808, "percentage": 84.34, "elapsed_time": "0:09:32", "remaining_time": "0:01:46", "throughput": 2984.64, "total_tokens": 1710024}
{"current_steps": 1190, "total_steps": 1405, "loss": 0.068, "lr": 3.516957985508476e-06, "epoch": 4.234875444839858, "percentage": 84.7, "elapsed_time": "0:09:33", "remaining_time": "0:01:43", "throughput": 2995.78, "total_tokens": 1717768}
{"current_steps": 1195, "total_steps": 1405, "loss": 0.021, "lr": 3.3597283229005877e-06, "epoch": 4.252669039145908, "percentage": 85.05, "elapsed_time": "0:09:33", "remaining_time": "0:01:40", "throughput": 3009.68, "total_tokens": 1727240}
{"current_steps": 1200, "total_steps": 1405, "loss": 0.0422, "lr": 3.205840631667456e-06, "epoch": 4.270462633451958, "percentage": 85.41, "elapsed_time": "0:09:34", "remaining_time": "0:01:38", "throughput": 3019.85, "total_tokens": 1734408}
{"current_steps": 1205, "total_steps": 1405, "loss": 0.0577, "lr": 3.0553186771419162e-06, "epoch": 4.288256227758007, "percentage": 85.77, "elapsed_time": "0:09:34", "remaining_time": "0:01:35", "throughput": 3029.0, "total_tokens": 1740936}
{"current_steps": 1207, "total_steps": 1405, "eval_loss": 0.3402128219604492, "epoch": 4.295373665480427, "percentage": 85.91, "elapsed_time": "0:09:35", "remaining_time": "0:01:34", "throughput": 3029.68, "total_tokens": 1743624}
{"current_steps": 1210, "total_steps": 1405, "loss": 0.0397, "lr": 2.908185704876101e-06, "epoch": 4.306049822064057, "percentage": 86.12, "elapsed_time": "0:09:56", "remaining_time": "0:01:36", "throughput": 2931.73, "total_tokens": 1747784}
{"current_steps": 1215, "total_steps": 1405, "loss": 0.0636, "lr": 2.7644644370515365e-06, "epoch": 4.3238434163701065, "percentage": 86.48, "elapsed_time": "0:09:56", "remaining_time": "0:01:33", "throughput": 2941.47, "total_tokens": 1754888}
{"current_steps": 1220, "total_steps": 1405, "loss": 0.0083, "lr": 2.624177068970124e-06, "epoch": 4.341637010676156, "percentage": 86.83, "elapsed_time": "0:09:57", "remaining_time": "0:01:30", "throughput": 2952.18, "total_tokens": 1762632}
{"current_steps": 1225, "total_steps": 1405, "loss": 0.0331, "lr": 2.4873452656264313e-06, "epoch": 4.359430604982206, "percentage": 87.19, "elapsed_time": "0:09:57", "remaining_time": "0:01:27", "throughput": 2962.18, "total_tokens": 1769928}
{"current_steps": 1230, "total_steps": 1405, "loss": 0.0824, "lr": 2.3539901583619185e-06, "epoch": 4.377224199288256, "percentage": 87.54, "elapsed_time": "0:09:57", "remaining_time": "0:01:25", "throughput": 2972.55, "total_tokens": 1777480}
{"current_steps": 1235, "total_steps": 1405, "loss": 0.0384, "lr": 2.2241323416015453e-06, "epoch": 4.395017793594306, "percentage": 87.9, "elapsed_time": "0:09:58", "remaining_time": "0:01:22", "throughput": 2982.63, "total_tokens": 1784840}
{"current_steps": 1240, "total_steps": 1405, "loss": 0.0435, "lr": 2.09779186967331e-06, "epoch": 4.412811387900356, "percentage": 88.26, "elapsed_time": "0:09:58", "remaining_time": "0:01:19", "throughput": 2993.3, "total_tokens": 1792584}
{"current_steps": 1245, "total_steps": 1405, "loss": 0.0525, "lr": 1.9749882537112296e-06, "epoch": 4.430604982206406, "percentage": 88.61, "elapsed_time": "0:09:59", "remaining_time": "0:01:17", "throughput": 3004.91, "total_tokens": 1800968}
{"current_steps": 1250, "total_steps": 1405, "loss": 0.0777, "lr": 1.8557404586421413e-06, "epoch": 4.448398576512456, "percentage": 88.97, "elapsed_time": "0:09:59", "remaining_time": "0:01:14", "throughput": 3015.14, "total_tokens": 1808456}
{"current_steps": 1255, "total_steps": 1405, "loss": 0.1469, "lr": 1.7400669002569232e-06, "epoch": 4.4661921708185055, "percentage": 89.32, "elapsed_time": "0:10:00", "remaining_time": "0:01:11", "throughput": 3025.63, "total_tokens": 1816136}
{"current_steps": 1260, "total_steps": 1405, "loss": 0.0696, "lr": 1.6279854423664697e-06, "epoch": 4.483985765124555, "percentage": 89.68, "elapsed_time": "0:10:00", "remaining_time": "0:01:09", "throughput": 3036.57, "total_tokens": 1824136}
{"current_steps": 1265, "total_steps": 1405, "loss": 0.0084, "lr": 1.5195133940429345e-06, "epoch": 4.501779359430605, "percentage": 90.04, "elapsed_time": "0:10:01", "remaining_time": "0:01:06", "throughput": 3046.26, "total_tokens": 1831304}
{"current_steps": 1270, "total_steps": 1405, "loss": 0.0259, "lr": 1.4146675069466403e-06, "epoch": 4.519572953736655, "percentage": 90.39, "elapsed_time": "0:10:01", "remaining_time": "0:01:03", "throughput": 3054.47, "total_tokens": 1837512}
{"current_steps": 1275, "total_steps": 1405, "loss": 0.0319, "lr": 1.313463972739068e-06, "epoch": 4.537366548042705, "percentage": 90.75, "elapsed_time": "0:10:02", "remaining_time": "0:01:01", "throughput": 3063.57, "total_tokens": 1844296}
{"current_steps": 1278, "total_steps": 1405, "eval_loss": 0.3532261848449707, "epoch": 4.548042704626335, "percentage": 90.96, "elapsed_time": "0:10:02", "remaining_time": "0:00:59", "throughput": 3067.44, "total_tokens": 1849416}
{"current_steps": 1280, "total_steps": 1405, "loss": 0.0338, "lr": 1.2159184205823432e-06, "epoch": 4.555160142348754, "percentage": 91.1, "elapsed_time": "0:10:42", "remaining_time": "0:01:02", "throughput": 2880.92, "total_tokens": 1851720}
{"current_steps": 1285, "total_steps": 1405, "loss": 0.0457, "lr": 1.122045914725564e-06, "epoch": 4.572953736654805, "percentage": 91.46, "elapsed_time": "0:10:43", "remaining_time": "0:01:00", "throughput": 2888.98, "total_tokens": 1858120}
{"current_steps": 1290, "total_steps": 1405, "loss": 0.0645, "lr": 1.0318609521783818e-06, "epoch": 4.590747330960854, "percentage": 91.81, "elapsed_time": "0:10:43", "remaining_time": "0:00:57", "throughput": 2899.08, "total_tokens": 1865928}
{"current_steps": 1295, "total_steps": 1405, "loss": 0.0261, "lr": 9.453774604721938e-07, "epoch": 4.608540925266904, "percentage": 92.17, "elapsed_time": "0:10:44", "remaining_time": "0:00:54", "throughput": 2909.25, "total_tokens": 1873800}
{"current_steps": 1300, "total_steps": 1405, "loss": 0.054, "lr": 8.62608795509276e-07, "epoch": 4.6263345195729535, "percentage": 92.53, "elapsed_time": "0:10:44", "remaining_time": "0:00:52", "throughput": 2919.58, "total_tokens": 1881800}
{"current_steps": 1305, "total_steps": 1405, "loss": 0.0036, "lr": 7.835677395001795e-07, "epoch": 4.644128113879003, "percentage": 92.88, "elapsed_time": "0:10:44", "remaining_time": "0:00:49", "throughput": 2928.25, "total_tokens": 1888648}
{"current_steps": 1310, "total_steps": 1405, "loss": 0.1115, "lr": 7.082664989897487e-07, "epoch": 4.661921708185053, "percentage": 93.24, "elapsed_time": "0:10:45", "remaining_time": "0:00:46", "throughput": 2936.79, "total_tokens": 1895432}
{"current_steps": 1315, "total_steps": 1405, "loss": 0.0608, "lr": 6.367167029720234e-07, "epoch": 4.679715302491103, "percentage": 93.59, "elapsed_time": "0:10:45", "remaining_time": "0:00:44", "throughput": 2945.61, "total_tokens": 1902408}
{"current_steps": 1320, "total_steps": 1405, "loss": 0.0289, "lr": 5.68929401094323e-07, "epoch": 4.697508896797153, "percentage": 93.95, "elapsed_time": "0:10:46", "remaining_time": "0:00:41", "throughput": 2955.8, "total_tokens": 1910344}
{"current_steps": 1325, "total_steps": 1405, "loss": 0.0309, "lr": 5.049150619508502e-07, "epoch": 4.715302491103203, "percentage": 94.31, "elapsed_time": "0:10:46", "remaining_time": "0:00:39", "throughput": 2966.24, "total_tokens": 1918472}
{"current_steps": 1330, "total_steps": 1405, "loss": 0.0078, "lr": 4.4468357146596475e-07, "epoch": 4.733096085409253, "percentage": 94.66, "elapsed_time": "0:10:47", "remaining_time": "0:00:36", "throughput": 2974.0, "total_tokens": 1924744}
{"current_steps": 1335, "total_steps": 1405, "loss": 0.0676, "lr": 3.8824423136748777e-07, "epoch": 4.750889679715303, "percentage": 95.02, "elapsed_time": "0:10:47", "remaining_time": "0:00:33", "throughput": 2984.42, "total_tokens": 1932872}
{"current_steps": 1340, "total_steps": 1405, "loss": 0.0673, "lr": 3.3560575775019864e-07, "epoch": 4.7686832740213525, "percentage": 95.37, "elapsed_time": "0:10:48", "remaining_time": "0:00:31", "throughput": 2993.47, "total_tokens": 1940040}
{"current_steps": 1345, "total_steps": 1405, "loss": 0.0846, "lr": 2.8677627972978906e-07, "epoch": 4.786476868327402, "percentage": 95.73, "elapsed_time": "0:10:48", "remaining_time": "0:00:28", "throughput": 3004.92, "total_tokens": 1948936}
{"current_steps": 1349, "total_steps": 1405, "eval_loss": 0.34229812026023865, "epoch": 4.800711743772242, "percentage": 96.01, "elapsed_time": "0:10:49", "remaining_time": "0:00:26", "throughput": 3009.22, "total_tokens": 1954568}
{"current_steps": 1350, "total_steps": 1405, "loss": 0.001, "lr": 2.417633381874534e-07, "epoch": 4.804270462633452, "percentage": 96.09, "elapsed_time": "0:11:10", "remaining_time": "0:00:27", "throughput": 2916.75, "total_tokens": 1955912}
{"current_steps": 1355, "total_steps": 1405, "loss": 0.0243, "lr": 2.0057388460533732e-07, "epoch": 4.822064056939502, "percentage": 96.44, "elapsed_time": "0:11:11", "remaining_time": "0:00:24", "throughput": 2925.09, "total_tokens": 1962760}
{"current_steps": 1360, "total_steps": 1405, "loss": 0.0594, "lr": 1.6321427999298755e-07, "epoch": 4.839857651245552, "percentage": 96.8, "elapsed_time": "0:11:11", "remaining_time": "0:00:22", "throughput": 2932.8, "total_tokens": 1969160}
{"current_steps": 1365, "total_steps": 1405, "loss": 0.0329, "lr": 1.2969029390501597e-07, "epoch": 4.857651245551601, "percentage": 97.15, "elapsed_time": "0:11:11", "remaining_time": "0:00:19", "throughput": 2940.76, "total_tokens": 1975752}
{"current_steps": 1370, "total_steps": 1405, "loss": 0.0349, "lr": 1.0000710355008159e-07, "epoch": 4.875444839857651, "percentage": 97.51, "elapsed_time": "0:11:12", "remaining_time": "0:00:17", "throughput": 2949.94, "total_tokens": 1983240}
{"current_steps": 1375, "total_steps": 1405, "loss": 0.004, "lr": 7.416929299135511e-08, "epoch": 4.893238434163701, "percentage": 97.86, "elapsed_time": "0:11:12", "remaining_time": "0:00:14", "throughput": 2959.19, "total_tokens": 1990792}
{"current_steps": 1380, "total_steps": 1405, "loss": 0.028, "lr": 5.218085243859638e-08, "epoch": 4.911032028469751, "percentage": 98.22, "elapsed_time": "0:11:13", "remaining_time": "0:00:12", "throughput": 2968.97, "total_tokens": 1998728}
{"current_steps": 1385, "total_steps": 1405, "loss": 0.046, "lr": 3.4045177631936155e-08, "epoch": 4.9288256227758005, "percentage": 98.58, "elapsed_time": "0:11:13", "remaining_time": "0:00:09", "throughput": 2979.09, "total_tokens": 2006920}
{"current_steps": 1390, "total_steps": 1405, "loss": 0.0136, "lr": 1.976506931745392e-08, "epoch": 4.94661921708185, "percentage": 98.93, "elapsed_time": "0:11:14", "remaining_time": "0:00:07", "throughput": 2986.46, "total_tokens": 2013128}
{"current_steps": 1395, "total_steps": 1405, "loss": 0.0718, "lr": 9.3427328146517e-09, "epoch": 4.9644128113879, "percentage": 99.29, "elapsed_time": "0:11:14", "remaining_time": "0:00:04", "throughput": 2997.08, "total_tokens": 2021704}
{"current_steps": 1400, "total_steps": 1405, "loss": 0.1224, "lr": 2.779777675890327e-09, "epoch": 4.98220640569395, "percentage": 99.64, "elapsed_time": "0:11:14", "remaining_time": "0:00:02", "throughput": 3005.77, "total_tokens": 2028872}
{"current_steps": 1405, "total_steps": 1405, "loss": 0.0499, "lr": 7.72174378022017e-11, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:11:15", "remaining_time": "0:00:00", "throughput": 3013.01, "total_tokens": 2035272}
{"current_steps": 1405, "total_steps": 1405, "epoch": 5.0, "percentage": 100.0, "elapsed_time": "0:11:38", "remaining_time": "0:00:00", "throughput": 2915.24, "total_tokens": 2035272}
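The log above is one JSON object per line, where training steps carry a `loss` key and evaluation steps carry `eval_loss` instead. A minimal sketch for reading the two curves back out, assuming the log is saved to a file such as `trainer_log.jsonl` (the filename is an assumption, not part of this commit):

```python
import json

def read_trainer_log(path):
    """Collect (step, loss) and (step, eval_loss) pairs from a JSON-lines trainer log."""
    train, evals = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            step = rec.get("current_steps")
            if "loss" in rec:
                train.append((step, rec["loss"]))
            elif "eval_loss" in rec:
                evals.append((step, rec["eval_loss"]))
            # Summary records with neither key (e.g. the final line) are skipped.
    return train, evals
```

The two returned series correspond to the `training_loss.png` and `training_eval_loss.png` plots included in this commit.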
2463
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eccf16f4bf6cae28454d431aeb6753fe6e61852ac86054d5e48a347a445e0d46
size 6289
BIN
training_eval_loss.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 37 KiB |
BIN
training_loss.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 39 KiB |