Initialize project; model provided by the ModelHub XC community

Model: PKU-Alignment/ProgressGym-HistLlama3-8B-C013-instruct-v0.2
Source: Original Platform
ModelHub XC
2026-05-26 09:23:15 +08:00
commit 65d2ddfdc4
25 changed files with 413911 additions and 0 deletions

39
.gitattributes vendored Normal file

@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
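
The patterns above route large binary files through Git LFS. As a rough illustration, the sketch below checks whether a file name would match a few of them. It uses Python's `fnmatch` as an approximation of gitattributes matching (the real rules follow gitignore-style semantics and differ in some edge cases), and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# A handful of the patterns listed above, not the full set.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: patterns without a slash are matched against the basename."""
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatch(basename, pattern) for pattern in patterns)


print(is_lfs_tracked("model-00001-of-00004.safetensors"))
print(is_lfs_tracked("config.json"))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.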

210
README.md Normal file

@@ -0,0 +1,210 @@
---
license: cc-by-4.0
tags:
- alignment
- value alignment
- AI safety
- safety
- LLM
- history
datasets:
- PKU-Alignment/ProgressGym-HistText
- PKU-Alignment/ProgressGym-TimelessQA
base_model:
- PKU-Alignment/ProgressGym-HistLlama3-8B-C013-pretrain
- meta-llama/Meta-Llama-3-8B
---
# ProgressGym-HistLlama3-8B-C013-instruct
## Overview
#### The ProgressGym Framework
![Framework Diagram](./readme-assets/main-diagram.png)
**ProgressGym-HistLlama3-8B-C013-instruct** is part of the **ProgressGym** framework for research and experimentation on *progress alignment* - the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in.
To quote the paper [*ProgressGym: Alignment with a Millennium of Moral Progress*](https://arxiv.org/abs/2406.20087):
> Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.
>
> We introduce *progress alignment* as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.
#### ProgressGym-HistLlama3-8B-C013-instruct
ProgressGym-HistLlama3-8B-C013-instruct is one of the **36 historical language models** in the ProgressGym framework.
**ProgressGym-HistLlama3-8B-C013-instruct is under continual iteration.** New versions that improve on the current one are being trained to reflect historical moral tendencies more comprehensively.
**ProgressGym-HistLlama3-8B-C013-instruct is a 13th-century historical language model.** Based on [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), it undergoes continued pretraining on the 13th-century text data from [ProgressGym-HistText](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-HistText), using the following hyperparameters:
- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP
... with the following training results:
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.7594 | 0.0149 | 1 | 1.7163 |
| 1.7333 | 0.0746 | 5 | 1.7008 |
| 1.6854 | 0.1493 | 10 | 1.6825 |
| 1.6897 | 0.2239 | 15 | 1.6701 |
| 1.6656 | 0.2985 | 20 | 1.6651 |
| 1.7254 | 0.3731 | 25 | 1.6679 |
| 1.7178 | 0.4478 | 30 | 1.6542 |
| 1.6656 | 0.5224 | 35 | 1.6459 |
| 1.6647 | 0.5970 | 40 | 1.6308 |
| 1.6645 | 0.6716 | 45 | 1.6205 |
| 1.6151 | 0.7463 | 50 | 1.6129 |
| 1.6359 | 0.8209 | 55 | 1.6052 |
| 1.5885 | 0.8955 | 60 | 1.5995 |
| 1.6142 | 0.9701 | 65 | 1.5943 |
| 1.4875 | 1.0448 | 70 | 1.5963 |
| 1.3844 | 1.1194 | 75 | 1.6118 |
| 1.3555 | 1.1940 | 80 | 1.6069 |
| 1.3597 | 1.2687 | 85 | 1.6040 |
| 1.3737 | 1.3433 | 90 | 1.6071 |
| 1.3492 | 1.4179 | 95 | 1.6074 |
| 1.3826 | 1.4925 | 100 | 1.6055 |
| 1.3533 | 1.5672 | 105 | 1.6035 |
| 1.3611 | 1.6418 | 110 | 1.6023 |
| 1.328 | 1.7164 | 115 | 1.6022 |
| 1.3443 | 1.7910 | 120 | 1.6026 |
| 1.3386 | 1.8657 | 125 | 1.6029 |
| 1.3396 | 1.9403 | 130 | 1.6029 |
| 1.3573 | 2.0149 | 135 | 1.6029 |
| 1.3754 | 2.0896 | 140 | 1.6034 |
| 1.3229 | 2.1642 | 145 | 1.6044 |
| 1.3194 | 2.2388 | 150 | 1.6055 |
| 1.3361 | 2.3134 | 155 | 1.6065 |
| 1.3231 | 2.3881 | 160 | 1.6072 |
| 1.32 | 2.4627 | 165 | 1.6076 |
| 1.3406 | 2.5373 | 170 | 1.6078 |
| 1.3184 | 2.6119 | 175 | 1.6079 |
| 1.2745 | 2.6866 | 180 | 1.6080 |
| 1.3024 | 2.7612 | 185 | 1.6079 |
| 1.3243 | 2.8358 | 190 | 1.6079 |
| 1.3239 | 2.9104 | 195 | 1.6080 |
| 1.3349 | 2.9851 | 200 | 1.6081 |
| 1.337 | 3.0597 | 205 | 1.6079 |
| 1.3091 | 3.1343 | 210 | 1.6078 |
| 1.3266 | 3.2090 | 215 | 1.6079 |
| 1.3014 | 3.2836 | 220 | 1.6083 |
| 1.3153 | 3.3582 | 225 | 1.6086 |
| 1.3192 | 3.4328 | 230 | 1.6090 |
| 1.315 | 3.5075 | 235 | 1.6093 |
| 1.3047 | 3.5821 | 240 | 1.6093 |
| 1.3208 | 3.6567 | 245 | 1.6093 |
| 1.362 | 3.7313 | 250 | 1.6093 |
| 1.3255 | 3.8060 | 255 | 1.6091 |
| 1.2941 | 3.8806 | 260 | 1.6089 |
| 1.3254 | 3.9552 | 265 | 1.6086 |
Note that the training data volume for the continued pretraining stage is capped at 3GB. When the corresponding century's corpus exceeds this volume, the training data is randomly sampled to fit the volume.
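
The relationship between the batch-size hyperparameters and the epoch/step columns above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming no gradient accumulation (none is listed) and that each optimizer step consumes one full global batch; the variable names are illustrative.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8          # per-device batch size
num_devices = 8
total_train_batch_size = 64   # as reported

# Assumption: global batch = per-device batch * number of devices.
effective_batch = train_batch_size * num_devices
assert effective_batch == total_train_batch_size

# One logged row from the table: step 5 corresponds to epoch 0.0746.
steps_per_epoch = round(5 / 0.0746)
sequences_per_epoch = steps_per_epoch * effective_batch

print(steps_per_epoch, sequences_per_epoch)
```

Under these assumptions one epoch is about 67 steps, which agrees with the last logged row (step 265 at epoch 3.9552).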
**ProgressGym-HistLlama3-8B-C013-instruct is an instruction-tuned language model.** It is tuned on [ProgressGym-TimelessQA](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-TimelessQA), using the following hyperparameters. Note, however, that the snapshot at training step 10 is used for the final model, to minimize erosion of the value tendencies learned during continued pretraining; we qualitatively observe that this snapshot still possesses strong instruction-following capabilities.
- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP
... with the following training results:
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9805 | 0.0208 | 1 | 0.9737 |
| 0.9446 | 0.1042 | 5 | 0.9455 |
| 0.8481 | 0.2083 | 10 | 0.8154 |
| 0.7794 | 0.3125 | 15 | 0.8123 |
| 0.7798 | 0.4167 | 20 | 0.8411 |
| 0.8576 | 0.5208 | 25 | 0.8676 |
| 0.8852 | 0.625 | 30 | 0.8673 |
| 0.8529 | 0.7292 | 35 | 0.8561 |
| 0.8224 | 0.8333 | 40 | 0.8470 |
| 0.8536 | 0.9375 | 45 | 0.8378 |
| 0.662 | 1.0417 | 50 | 0.8294 |
| 0.437 | 1.1458 | 55 | 0.8531 |
| 0.4402 | 1.25 | 60 | 0.8569 |
| 0.4244 | 1.3542 | 65 | 0.8569 |
| 0.4495 | 1.4583 | 70 | 0.8547 |
| 0.4689 | 1.5625 | 75 | 0.8494 |
| 0.4309 | 1.6667 | 80 | 0.8461 |
| 0.4299 | 1.7708 | 85 | 0.8446 |
| 0.4461 | 1.875 | 90 | 0.8440 |
| 0.4474 | 1.9792 | 95 | 0.8439 |
| 0.3614 | 2.0833 | 100 | 0.8445 |
| 0.3861 | 2.1875 | 105 | 0.8457 |
| 0.3829 | 2.2917 | 110 | 0.8473 |
| 0.3764 | 2.3958 | 115 | 0.8488 |
| 0.3655 | 2.5 | 120 | 0.8500 |
| 0.4243 | 2.6042 | 125 | 0.8511 |
| 0.3884 | 2.7083 | 130 | 0.8520 |
| 0.3634 | 2.8125 | 135 | 0.8528 |
| 0.3846 | 2.9167 | 140 | 0.8537 |
| 0.3872 | 3.0208 | 145 | 0.8547 |
| 0.3869 | 3.125 | 150 | 0.8558 |
| 0.3876 | 3.2292 | 155 | 0.8566 |
| 0.3844 | 3.3333 | 160 | 0.8573 |
| 0.3535 | 3.4375 | 165 | 0.8579 |
| 0.3488 | 3.5417 | 170 | 0.8588 |
| 0.3464 | 3.6458 | 175 | 0.8598 |
| 0.361 | 3.75 | 180 | 0.8607 |
| 0.3674 | 3.8542 | 185 | 0.8612 |
| 0.3988 | 3.9583 | 190 | 0.8612 |
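
To put the choice of the step-10 snapshot in context, the sketch below compares its validation loss with the lowest value in the first rows of the table above. It only reads numbers from the table and does not reproduce the authors' selection procedure, which the text describes as motivated by preserving value tendencies rather than by loss alone.

```python
# (step, validation loss) pairs copied from the first rows of the table above.
rows = [
    (1, 0.9737),
    (5, 0.9455),
    (10, 0.8154),
    (15, 0.8123),
    (20, 0.8411),
    (25, 0.8676),
]

best_step, best_loss = min(rows, key=lambda row: row[1])
chosen_loss = dict(rows)[10]          # snapshot used for the final model
gap = round(chosen_loss - best_loss, 4)

print(f"lowest validation loss: {best_loss} at step {best_step}")
print(f"step-10 snapshot is {gap} above that minimum")
```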
## Links
- **[Paper Preprint]** [ProgressGym: Alignment with a Millennium of Moral Progress](https://arxiv.org/abs/2406.20087)
- **[Leaderboard & Interactive Playground]** [PKU-Alignment/ProgressGym-LeaderBoard](https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard)
- **[Huggingface Data & Model Collection]** [PKU-Alignment/ProgressGym](https://huggingface.co/collections/PKU-Alignment/progressgym-666735fcf3e4efa276226eaa)
- **[Github Codebase]** [PKU-Alignment/ProgressGym](https://github.com/PKU-Alignment/ProgressGym)
- **[Documentation]** [ProgressGym Documentation](https://pku-alignment.github.io/ProgressGym/)
- **[PyPI Package]** *(coming soon - [stay tuned](https://forms.gle/1TWFLL4ZCLeYTD5N6)!)*
## Citation
If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the BibTeX entry below.
```text
@article{progressgym,
title={ProgressGym: Alignment with a Millennium of Moral Progress},
author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
journal={arXiv preprint arXiv:2406.20087},
eprint={2406.20087},
eprinttype = {arXiv},
year={2024}
}
```
## Ethics Statement
- **Copyright information of historical text data sources**:
  - Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
  - For the text that we draw from Internet Archive, we include only those uploaded by the *Library of Congress*, which are texts freely released online by the U.S. Library of Congress for research and public use.
  - The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
  - The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
- **Reproducibility**: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
- **Misuse Prevention**: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without *a priori* assumptions on the direction of progress. In the event of potential misuse of our dataset, we condemn any misuse attempt to the strongest degree possible, and will work with the research community on whistleblowing for such attempts.
- **Open-Sourcing**: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.

12
all_results.json Normal file

@@ -0,0 +1,12 @@
{
"epoch": 4.0,
"eval_loss": 0.8123041987419128,
"eval_runtime": 2.1028,
"eval_samples_per_second": 161.691,
"eval_steps_per_second": 1.427,
"total_flos": 5363820134400.0,
"train_loss": 0.5090278356025616,
"train_runtime": 6014.9673,
"train_samples_per_second": 2.031,
"train_steps_per_second": 0.032
}
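
The throughput fields in this file can be combined to recover approximate totals for the run. The sketch below is plain arithmetic on the logged values; because the per-second rates are rounded in the file, the results are estimates rather than exact counts.

```python
import json

# Fields copied from all_results.json above.
results = json.loads("""
{
  "epoch": 4.0,
  "train_runtime": 6014.9673,
  "train_samples_per_second": 2.031,
  "train_steps_per_second": 0.032
}
""")

approx_steps = results["train_runtime"] * results["train_steps_per_second"]
approx_samples = results["train_runtime"] * results["train_samples_per_second"]
samples_per_step = approx_samples / approx_steps

print(f"steps ~ {approx_steps:.0f}, samples ~ {approx_samples:.0f}")
print(f"samples per step ~ {samples_per_step:.1f}")
```

The estimated samples per step lands close to the reported total train batch size of 64, and the estimated step count is in line with a 4-epoch run whose last logged step is 190.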

28
config.json Normal file

@@ -0,0 +1,28 @@
{
"_name_or_path": "./output/training_results/C013_llama3-8b-base_pretrain_20240428_005832/",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.40.0",
"use_cache": false,
"vocab_size": 128256
}
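
The architecture fields above are enough to estimate the parameter count. The sketch below assumes the standard Llama layout (bias-free q/k/v/o projections, a gated MLP with three weight matrices, two RMSNorm weights per layer plus a final norm); the function name `count_llama_params` is hypothetical and not part of any library.

```python
# Fields copied from config.json above.
config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "vocab_size": 128256,
    "tie_word_embeddings": False,
}


def count_llama_params(cfg: dict) -> int:
    """Count weights assuming a standard Llama decoder without bias terms."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]   # grouped-query attention

    attention = h * h + 2 * h * kv_dim + h * h       # q, (k, v), o projections
    mlp = 3 * h * cfg["intermediate_size"]           # gate, up, down projections
    norms = 2 * h                                    # input + post-attention norms
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


params = count_llama_params(config)
bytes_fp16 = params * 2  # torch_dtype is float16: 2 bytes per weight

print(params)      # 8030261248
print(bytes_fp16)  # 16060522496
```

The float16 byte count equals the `total_size` of 16060522496 reported in the safetensors index metadata in this commit, which suggests the assumed layout accounts for every stored weight.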

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

7
eval_results.json Normal file

@@ -0,0 +1,7 @@
{
"epoch": 4.0,
"eval_loss": 0.8123041987419128,
"eval_runtime": 2.1028,
"eval_samples_per_second": 161.691,
"eval_steps_per_second": 1.427
}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": 128001,
"transformers_version": "4.40.0"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:04e95a34b63c8a3b1e9d555837cb3e8c5eb62eef55da32ef1ea4eeb663e353cc
size 4976698592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e1d1ce3df0cb483d4209d598838fb2b2611ac9b9441b6ee991f0d074bc608571
size 4999802616


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63918ea2dfb5e85170a7df2101dc8d9bd5aeec78b9646b074703cfe2e75ffab6
size 4915916080


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d0c8544b338326968f6e3864335cf01f7fd7e6b6b725f20c34b4fc695522507d
size 1168138808
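
The four three-line hunks above are Git LFS pointer files: each records a spec version, a SHA-256 object ID, and the size in bytes of the real file. The sketch below parses them and sums the sizes. The file names are not shown in this view, so the pointers are kept anonymous here, and `parse_lfs_pointer` is a hypothetical helper rather than part of any Git LFS tooling.

```python
# (oid, size) pairs copied from the four pointer hunks above.
POINTER_FIELDS = [
    ("04e95a34b63c8a3b1e9d555837cb3e8c5eb62eef55da32ef1ea4eeb663e353cc", 4976698592),
    ("e1d1ce3df0cb483d4209d598838fb2b2611ac9b9441b6ee991f0d074bc608571", 4999802616),
    ("63918ea2dfb5e85170a7df2101dc8d9bd5aeec78b9646b074703cfe2e75ffab6", 4915916080),
    ("d0c8544b338326968f6e3864335cf01f7fd7e6b6b725f20c34b4fc695522507d", 1168138808),
]

# Rebuild the pointer text exactly as it appears in the hunks.
POINTERS = [
    f"version https://git-lfs.github.com/spec/v1\noid sha256:{oid}\nsize {size}\n"
    for oid, size in POINTER_FIELDS
]


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a small record."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


parsed = [parse_lfs_pointer(p) for p in POINTERS]
total_bytes = sum(p["size"] for p in parsed)

print(total_bytes)  # 16060556096
```

The total is 33600 bytes larger than the `total_size` of 16060522496 in the safetensors index in this commit; a plausible explanation, stated here as an assumption, is per-file header overhead that the index's tensor-only total does not count.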


@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 16060522496
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
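
The index above maps each tensor name to the shard file that stores it. As a minimal sketch, assuming the enclosing key is `weight_map` (the usual name in sharded safetensors indexes; the top of the file is not shown in this section), the mapping can be inverted to see which shards a given layer touches. The helper names `tensors_by_shard` and `shards_for_layer` are hypothetical, and `sample_index` is only a short excerpt of the entries listed above.

```python
from collections import defaultdict

# Excerpt copied from the index above; the real file lists every tensor.
# The "weight_map" key is an assumption (it is not visible in this section).
sample_index = {
    "weight_map": {
        "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
        "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
        "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
        "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
        "model.norm.weight": "model-00004-of-00004.safetensors",
    }
}


def tensors_by_shard(index):
    """Invert tensor -> shard into shard -> sorted list of tensor names."""
    shards = defaultdict(list)
    for name, shard in index["weight_map"].items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def shards_for_layer(index, layer):
    """Return the sorted shard files holding any tensor of one decoder layer."""
    prefix = f"model.layers.{layer}."
    return sorted(
        {shard for name, shard in index["weight_map"].items() if name.startswith(prefix)}
    )


if __name__ == "__main__":
    for shard, names in tensors_by_shard(sample_index).items():
        print(shard, len(names))
    print(shards_for_layer(sample_index, 31))
```

In the excerpt, layer 31 is split across the third and fourth shards, which matches the entries above: a loader has to open both files to assemble that layer.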

Binary file not shown. Size: 85 KiB

Binary file not shown. Size: 210 KiB

Binary file not shown. Size: 178 KiB

Binary file not shown. Size: 218 KiB

23
special_tokens_map.json Normal file

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

410504
tokenizer.json Normal file

File diff suppressed because it is too large

2065
tokenizer_config.json Normal file

File diff suppressed because it is too large

8
train_results.json Normal file

@@ -0,0 +1,8 @@
{
"epoch": 4.0,
"total_flos": 5363820134400.0,
"train_loss": 0.5090278356025616,
"train_runtime": 6014.9673,
"train_samples_per_second": 2.031,
"train_steps_per_second": 0.032
}
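
The summary values in train_results.json can be cross-checked against each other. The sketch below is a rough consistency check, not part of the original training code: it takes `total_steps = 192` from the trainer log in this commit, and because the reported throughput is rounded to three decimals, the derived numbers are approximate. The effective batch size is not stated anywhere in the source; the figure printed here is only an estimate.

```python
# Values copied from train_results.json above.
train_results = {
    "epoch": 4.0,
    "train_runtime": 6014.9673,          # seconds
    "train_samples_per_second": 2.031,   # rounded in the source
    "train_steps_per_second": 0.032,     # rounded in the source
}
total_steps = 192  # "total_steps" / "global_step" elsewhere in this commit

# Samples processed over the whole run, estimated from runtime * throughput.
samples_seen = train_results["train_runtime"] * train_results["train_samples_per_second"]
samples_per_epoch = samples_seen / train_results["epoch"]
steps_per_epoch = total_steps / train_results["epoch"]

# Estimated samples consumed per optimizer step (an inference, not a logged value).
samples_per_step = samples_seen / total_steps

# Recompute the step throughput and compare with the rounded logged value.
recomputed_steps_per_second = round(total_steps / train_results["train_runtime"], 3)

if __name__ == "__main__":
    print(f"samples seen (approx.):  {samples_seen:.0f}")
    print(f"samples per epoch:       {samples_per_epoch:.0f}")
    print(f"steps per epoch:         {steps_per_epoch:.0f}")
    print(f"samples per step:        {samples_per_step:.1f}")
    print(f"steps/s recomputed:      {recomputed_steps_per_second}")
```

The recomputed step rate agrees with the logged `train_steps_per_second`, and the estimate of a little under 64 samples per step would be consistent with an effective batch size of 64, though that remains an assumption.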

80
trainer_log.jsonl Normal file

@@ -0,0 +1,80 @@
{"current_steps": 1, "total_steps": 192, "loss": 0.9805, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 0.0, "epoch": 0.020833333333333332, "percentage": 0.52, "elapsed_time": "0:00:21", "remaining_time": "1:08:51"}
{"current_steps": 1, "total_steps": 192, "loss": null, "eval_loss": 0.9736970067024231, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.020833333333333332, "percentage": 0.52, "elapsed_time": "0:00:21", "remaining_time": "1:08:51"}
{"current_steps": 5, "total_steps": 192, "loss": 0.9446, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.5e-06, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:01:25", "remaining_time": "0:53:13"}
{"current_steps": 5, "total_steps": 192, "loss": null, "eval_loss": 0.9454841613769531, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:01:25", "remaining_time": "0:53:13"}
{"current_steps": 10, "total_steps": 192, "loss": 0.8481, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.25e-06, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:04:04", "remaining_time": "1:14:01"}
{"current_steps": 10, "total_steps": 192, "loss": null, "eval_loss": 0.8153812289237976, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:04:04", "remaining_time": "1:14:01"}
{"current_steps": 15, "total_steps": 192, "loss": 0.7794, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9e-06, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:06:37", "remaining_time": "1:18:10"}
{"current_steps": 15, "total_steps": 192, "loss": null, "eval_loss": 0.8123041987419128, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:06:37", "remaining_time": "1:18:10"}
{"current_steps": 20, "total_steps": 192, "loss": 0.7798, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.275e-05, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:09:16", "remaining_time": "1:19:44"}
{"current_steps": 20, "total_steps": 192, "loss": null, "eval_loss": 0.8410752415657043, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:09:16", "remaining_time": "1:19:44"}
{"current_steps": 25, "total_steps": 192, "loss": 0.8576, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.3195176200175283e-05, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:11:57", "remaining_time": "1:19:51"}
{"current_steps": 25, "total_steps": 192, "loss": null, "eval_loss": 0.8676239848136902, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:11:57", "remaining_time": "1:19:51"}
{"current_steps": 30, "total_steps": 192, "loss": 0.8852, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.515676612044427e-06, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:14:41", "remaining_time": "1:19:17"}
{"current_steps": 30, "total_steps": 192, "loss": null, "eval_loss": 0.867268979549408, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:14:41", "remaining_time": "1:19:17"}
{"current_steps": 35, "total_steps": 192, "loss": 0.8529, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.797580677308734e-06, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:17:12", "remaining_time": "1:17:10"}
{"current_steps": 35, "total_steps": 192, "loss": null, "eval_loss": 0.8560981154441833, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:17:12", "remaining_time": "1:17:10"}
{"current_steps": 40, "total_steps": 192, "loss": 0.8224, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.808575415542887e-06, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:19:49", "remaining_time": "1:15:19"}
{"current_steps": 40, "total_steps": 192, "loss": null, "eval_loss": 0.8470456004142761, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:19:49", "remaining_time": "1:15:19"}
{"current_steps": 45, "total_steps": 192, "loss": 0.8536, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.3676619069852654e-06, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:22:24", "remaining_time": "1:13:13"}
{"current_steps": 45, "total_steps": 192, "loss": null, "eval_loss": 0.8378292918205261, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:22:24", "remaining_time": "1:13:13"}
{"current_steps": 50, "total_steps": 192, "loss": 0.662, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.334947896124909e-06, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:25:01", "remaining_time": "1:11:05"}
{"current_steps": 50, "total_steps": 192, "loss": null, "eval_loss": 0.8293696045875549, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:25:01", "remaining_time": "1:11:05"}
{"current_steps": 55, "total_steps": 192, "loss": 0.437, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.603233215095547e-06, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:27:38", "remaining_time": "1:08:51"}
{"current_steps": 55, "total_steps": 192, "loss": null, "eval_loss": 0.8531150817871094, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:27:38", "remaining_time": "1:08:51"}
{"current_steps": 60, "total_steps": 192, "loss": 0.4402, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.0911174606561334e-06, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:30:17", "remaining_time": "1:06:37"}
{"current_steps": 60, "total_steps": 192, "loss": null, "eval_loss": 0.8569180369377136, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:30:17", "remaining_time": "1:06:37"}
{"current_steps": 65, "total_steps": 192, "loss": 0.4244, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.373930741131784e-07, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:32:53", "remaining_time": "1:04:15"}
{"current_steps": 65, "total_steps": 192, "loss": null, "eval_loss": 0.8569238185882568, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:32:53", "remaining_time": "1:04:15"}
{"current_steps": 70, "total_steps": 192, "loss": 0.4495, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.374210410959207e-07, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:35:28", "remaining_time": "1:01:50"}
{"current_steps": 70, "total_steps": 192, "loss": null, "eval_loss": 0.8547163605690002, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:35:28", "remaining_time": "1:01:50"}
{"current_steps": 75, "total_steps": 192, "loss": 0.4689, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.6222476698215175e-07, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:38:03", "remaining_time": "0:59:22"}
{"current_steps": 75, "total_steps": 192, "loss": null, "eval_loss": 0.8493571877479553, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:38:03", "remaining_time": "0:59:22"}
{"current_steps": 80, "total_steps": 192, "loss": 0.4309, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.462755297384099e-07, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:40:43", "remaining_time": "0:57:00"}
{"current_steps": 80, "total_steps": 192, "loss": null, "eval_loss": 0.846055269241333, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:40:43", "remaining_time": "0:57:00"}
{"current_steps": 85, "total_steps": 192, "loss": 0.4299, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7088740175034947e-07, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:43:19", "remaining_time": "0:54:32"}
{"current_steps": 85, "total_steps": 192, "loss": null, "eval_loss": 0.8445951342582703, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:43:19", "remaining_time": "0:54:32"}
{"current_steps": 90, "total_steps": 192, "loss": 0.4461, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.228102956599465e-07, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:45:54", "remaining_time": "0:52:01"}
{"current_steps": 90, "total_steps": 192, "loss": null, "eval_loss": 0.8440027832984924, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:45:54", "remaining_time": "0:52:01"}
{"current_steps": 95, "total_steps": 192, "loss": 0.4474, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.279207916081227e-08, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:48:29", "remaining_time": "0:49:30"}
{"current_steps": 95, "total_steps": 192, "loss": null, "eval_loss": 0.8438854217529297, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:48:29", "remaining_time": "0:49:30"}
{"current_steps": 100, "total_steps": 192, "loss": 0.3614, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.448002404850094e-08, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:51:06", "remaining_time": "0:47:00"}
{"current_steps": 100, "total_steps": 192, "loss": null, "eval_loss": 0.8445320725440979, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:51:06", "remaining_time": "0:47:00"}
{"current_steps": 105, "total_steps": 192, "loss": 0.3861, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.35920070839697e-08, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:53:39", "remaining_time": "0:44:27"}
{"current_steps": 105, "total_steps": 192, "loss": null, "eval_loss": 0.8457441926002502, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:53:39", "remaining_time": "0:44:27"}
{"current_steps": 110, "total_steps": 192, "loss": 0.3829, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.7299804687499997e-08, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:56:16", "remaining_time": "0:41:57"}
{"current_steps": 110, "total_steps": 192, "loss": null, "eval_loss": 0.847288191318512, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:56:16", "remaining_time": "0:41:57"}
{"current_steps": 115, "total_steps": 192, "loss": 0.3764, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.37771434967624e-08, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:58:52", "remaining_time": "0:39:24"}
{"current_steps": 115, "total_steps": 192, "loss": null, "eval_loss": 0.8487641215324402, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:58:52", "remaining_time": "0:39:24"}
{"current_steps": 120, "total_steps": 192, "loss": 0.3655, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.187403540619925e-08, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "1:01:27", "remaining_time": "0:36:52"}
{"current_steps": 120, "total_steps": 192, "loss": null, "eval_loss": 0.8499611020088196, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "1:01:27", "remaining_time": "0:36:52"}
{"current_steps": 125, "total_steps": 192, "loss": 0.4243, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.088648238966908e-08, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "1:04:01", "remaining_time": "0:34:19"}
{"current_steps": 125, "total_steps": 192, "loss": null, "eval_loss": 0.8510637879371643, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "1:04:01", "remaining_time": "0:34:19"}
{"current_steps": 130, "total_steps": 192, "loss": 0.3884, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.039701925276604e-08, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "1:06:38", "remaining_time": "0:31:46"}
{"current_steps": 130, "total_steps": 192, "loss": null, "eval_loss": 0.8520172238349915, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "1:06:38", "remaining_time": "0:31:46"}
{"current_steps": 135, "total_steps": 192, "loss": 0.3634, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0166900048082497e-08, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "1:09:13", "remaining_time": "0:29:13"}
{"current_steps": 135, "total_steps": 192, "loss": null, "eval_loss": 0.8528143763542175, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "1:09:13", "remaining_time": "0:29:13"}
{"current_steps": 140, "total_steps": 192, "loss": 0.3846, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0065147322870076e-08, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "1:11:49", "remaining_time": "0:26:40"}
{"current_steps": 140, "total_steps": 192, "loss": null, "eval_loss": 0.8537066578865051, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "1:11:49", "remaining_time": "0:26:40"}
{"current_steps": 145, "total_steps": 192, "loss": 0.3872, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.002328628528332e-08, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "1:14:24", "remaining_time": "0:24:06"}
{"current_steps": 145, "total_steps": 192, "loss": null, "eval_loss": 0.8547406196594238, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "1:14:24", "remaining_time": "0:24:06"}
{"current_steps": 150, "total_steps": 192, "loss": 0.3869, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0007484528133236e-08, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "1:17:02", "remaining_time": "0:21:34"}
{"current_steps": 150, "total_steps": 192, "loss": null, "eval_loss": 0.8557960391044617, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "1:17:02", "remaining_time": "0:21:34"}
{"current_steps": 155, "total_steps": 192, "loss": 0.3876, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0002110817570477e-08, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "1:19:36", "remaining_time": "0:19:00"}
{"current_steps": 155, "total_steps": 192, "loss": null, "eval_loss": 0.8566272854804993, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "1:19:36", "remaining_time": "0:19:00"}
{"current_steps": 160, "total_steps": 192, "loss": 0.3844, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000504842356326e-08, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "1:22:13", "remaining_time": "0:16:26"}
{"current_steps": 160, "total_steps": 192, "loss": null, "eval_loss": 0.8572790026664734, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "1:22:13", "remaining_time": "0:16:26"}
{"current_steps": 165, "total_steps": 192, "loss": 0.3535, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000009745562451e-08, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "1:24:48", "remaining_time": "0:13:52"}
{"current_steps": 165, "total_steps": 192, "loss": null, "eval_loss": 0.8578632473945618, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "1:24:48", "remaining_time": "0:13:52"}
{"current_steps": 170, "total_steps": 192, "loss": 0.3488, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000014077810156e-08, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "1:27:24", "remaining_time": "0:11:18"}
{"current_steps": 170, "total_steps": 192, "loss": null, "eval_loss": 0.85884028673172, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "1:27:24", "remaining_time": "0:11:18"}
{"current_steps": 175, "total_steps": 192, "loss": 0.3464, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000001343508807e-08, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:30:00", "remaining_time": "0:08:44"}
{"current_steps": 175, "total_steps": 192, "loss": null, "eval_loss": 0.8598365783691406, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:30:00", "remaining_time": "0:08:44"}
{"current_steps": 180, "total_steps": 192, "loss": 0.361, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000006747581e-08, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:32:36", "remaining_time": "0:06:10"}
{"current_steps": 180, "total_steps": 192, "loss": null, "eval_loss": 0.8606703877449036, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:32:36", "remaining_time": "0:06:10"}
{"current_steps": 185, "total_steps": 192, "loss": 0.3674, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000000001094325e-08, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:35:11", "remaining_time": "0:03:36"}
{"current_steps": 185, "total_steps": 192, "loss": null, "eval_loss": 0.8611735701560974, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:35:11", "remaining_time": "0:03:36"}
{"current_steps": 190, "total_steps": 192, "loss": 0.3988, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000000000139e-08, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:37:48", "remaining_time": "0:01:01"}
{"current_steps": 190, "total_steps": 192, "loss": null, "eval_loss": 0.8612277507781982, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:37:48", "remaining_time": "0:01:01"}
{"current_steps": 192, "total_steps": 192, "loss": null, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:39:43", "remaining_time": "0:00:00"}
{"current_steps": 3, "total_steps": 3, "loss": null, "eval_loss": 0.8123041987419128, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:40:42", "remaining_time": "0:00:00"}
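
Each line of trainer_log.jsonl is one JSON object, with training records (`loss` set) and evaluation records (`eval_loss` set) interleaved. A minimal sketch of scanning such a log for the lowest evaluation loss follows; `SAMPLE_LOG` is an abridged copy of a few records above (only four of the original keys are kept), and `best_eval_record` is a hypothetical helper, not something shipped with the repository.

```python
import json

# Abridged records from trainer_log.jsonl above; the other keys are omitted.
SAMPLE_LOG = """
{"current_steps": 10, "total_steps": 192, "loss": 0.8481, "eval_loss": null}
{"current_steps": 10, "total_steps": 192, "loss": null, "eval_loss": 0.8153812289237976}
{"current_steps": 15, "total_steps": 192, "loss": 0.7794, "eval_loss": null}
{"current_steps": 15, "total_steps": 192, "loss": null, "eval_loss": 0.8123041987419128}
{"current_steps": 20, "total_steps": 192, "loss": 0.7798, "eval_loss": null}
{"current_steps": 20, "total_steps": 192, "loss": null, "eval_loss": 0.8410752415657043}
{"current_steps": 190, "total_steps": 192, "loss": 0.3988, "eval_loss": null}
{"current_steps": 190, "total_steps": 192, "loss": null, "eval_loss": 0.8612277507781982}
{"current_steps": 3, "total_steps": 3, "loss": null, "eval_loss": 0.8123041987419128}
"""


def best_eval_record(log_text):
    """Return the first record with the lowest non-null eval_loss, or None."""
    best = None
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record["eval_loss"] is None:
            continue  # training-only record
        # Strict "<" keeps the earliest record when two losses tie.
        if best is None or record["eval_loss"] < best["eval_loss"]:
            best = record
    return best


best = best_eval_record(SAMPLE_LOG)

if __name__ == "__main__":
    print(best["current_steps"], best["eval_loss"])
```

On these records the helper selects the evaluation at step 15. The last record (`total_steps` of 3) repeats that same loss, which would be consistent with a final evaluation pass over a reloaded best checkpoint, although the log itself does not say so.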

615
trainer_state.json Normal file

@@ -0,0 +1,615 @@
{
"best_metric": 0.8123041987419128,
"best_model_checkpoint": "./output/training_results/C013_llama3-8b-base_instruct_20240428_005832/checkpoint-15",
"epoch": 4.0,
"eval_steps": 5,
"global_step": 192,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.020833333333333332,
"grad_norm": 0.0,
"learning_rate": 0.0,
"loss": 0.9805,
"step": 1
},
{
"epoch": 0.020833333333333332,
"eval_loss": 0.9736970067024231,
"eval_runtime": 2.153,
"eval_samples_per_second": 157.916,
"eval_steps_per_second": 1.393,
"step": 1
},
{
"epoch": 0.10416666666666667,
"grad_norm": 14.850728211706278,
"learning_rate": 1.5e-06,
"loss": 0.9446,
"step": 5
},
{
"epoch": 0.10416666666666667,
"eval_loss": 0.9454841613769531,
"eval_runtime": 2.0973,
"eval_samples_per_second": 162.11,
"eval_steps_per_second": 1.43,
"step": 5
},
{
"epoch": 0.20833333333333334,
"grad_norm": 4.950599387514031,
"learning_rate": 5.25e-06,
"loss": 0.8481,
"step": 10
},
{
"epoch": 0.20833333333333334,
"eval_loss": 0.8153812289237976,
"eval_runtime": 2.0923,
"eval_samples_per_second": 162.499,
"eval_steps_per_second": 1.434,
"step": 10
},
{
"epoch": 0.3125,
"grad_norm": 4.621063619275185,
"learning_rate": 9e-06,
"loss": 0.7794,
"step": 15
},
{
"epoch": 0.3125,
"eval_loss": 0.8123041987419128,
"eval_runtime": 2.1028,
"eval_samples_per_second": 161.686,
"eval_steps_per_second": 1.427,
"step": 15
},
{
"epoch": 0.4166666666666667,
"grad_norm": 4.141809373286457,
"learning_rate": 1.275e-05,
"loss": 0.7798,
"step": 20
},
{
"epoch": 0.4166666666666667,
"eval_loss": 0.8410752415657043,
"eval_runtime": 2.0891,
"eval_samples_per_second": 162.747,
"eval_steps_per_second": 1.436,
"step": 20
},
{
"epoch": 0.5208333333333334,
"grad_norm": 4.211750921552142,
"learning_rate": 1.3195176200175283e-05,
"loss": 0.8576,
"step": 25
},
{
"epoch": 0.5208333333333334,
"eval_loss": 0.8676239848136902,
"eval_runtime": 2.0885,
"eval_samples_per_second": 162.793,
"eval_steps_per_second": 1.436,
"step": 25
},
{
"epoch": 0.625,
"grad_norm": 4.126229536438554,
"learning_rate": 9.515676612044427e-06,
"loss": 0.8852,
"step": 30
},
{
"epoch": 0.625,
"eval_loss": 0.867268979549408,
"eval_runtime": 2.0839,
"eval_samples_per_second": 163.157,
"eval_steps_per_second": 1.44,
"step": 30
},
{
"epoch": 0.7291666666666666,
"grad_norm": 4.316589185885892,
"learning_rate": 6.797580677308734e-06,
"loss": 0.8529,
"step": 35
},
{
"epoch": 0.7291666666666666,
"eval_loss": 0.8560981154441833,
"eval_runtime": 2.1307,
"eval_samples_per_second": 159.573,
"eval_steps_per_second": 1.408,
"step": 35
},
{
"epoch": 0.8333333333333334,
"grad_norm": 4.0216031828158005,
"learning_rate": 4.808575415542887e-06,
"loss": 0.8224,
"step": 40
},
{
"epoch": 0.8333333333333334,
"eval_loss": 0.8470456004142761,
"eval_runtime": 2.0873,
"eval_samples_per_second": 162.886,
"eval_steps_per_second": 1.437,
"step": 40
},
{
"epoch": 0.9375,
"grad_norm": 4.316706720311178,
"learning_rate": 3.3676619069852654e-06,
"loss": 0.8536,
"step": 45
},
{
"epoch": 0.9375,
"eval_loss": 0.8378292918205261,
"eval_runtime": 2.0847,
"eval_samples_per_second": 163.089,
"eval_steps_per_second": 1.439,
"step": 45
},
{
"epoch": 1.0416666666666667,
"grad_norm": 3.7957934185208795,
"learning_rate": 2.334947896124909e-06,
"loss": 0.662,
"step": 50
},
{
"epoch": 1.0416666666666667,
"eval_loss": 0.8293696045875549,
"eval_runtime": 2.0835,
"eval_samples_per_second": 163.187,
"eval_steps_per_second": 1.44,
"step": 50
},
{
"epoch": 1.1458333333333333,
"grad_norm": 3.4155908301931186,
"learning_rate": 1.603233215095547e-06,
"loss": 0.437,
"step": 55
},
{
"epoch": 1.1458333333333333,
"eval_loss": 0.8531150817871094,
"eval_runtime": 2.1006,
"eval_samples_per_second": 161.859,
"eval_steps_per_second": 1.428,
"step": 55
},
{
"epoch": 1.25,
"grad_norm": 3.377214905899517,
"learning_rate": 1.0911174606561334e-06,
"loss": 0.4402,
"step": 60
},
{
"epoch": 1.25,
"eval_loss": 0.8569180369377136,
"eval_runtime": 2.0899,
"eval_samples_per_second": 162.69,
"eval_steps_per_second": 1.436,
"step": 60
},
{
"epoch": 1.3541666666666667,
"grad_norm": 4.018786896199577,
"learning_rate": 7.373930741131784e-07,
"loss": 0.4244,
"step": 65
},
{
"epoch": 1.3541666666666667,
"eval_loss": 0.8569238185882568,
"eval_runtime": 2.0969,
"eval_samples_per_second": 162.148,
"eval_steps_per_second": 1.431,
"step": 65
},
{
"epoch": 1.4583333333333333,
"grad_norm": 4.3050060673581205,
"learning_rate": 5.374210410959207e-07,
"loss": 0.4495,
"step": 70
},
{
"epoch": 1.4583333333333333,
"eval_loss": 0.8547163605690002,
"eval_runtime": 2.0852,
"eval_samples_per_second": 163.056,
"eval_steps_per_second": 1.439,
"step": 70
},
{
"epoch": 1.5625,
"grad_norm": 3.8753963390823842,
"learning_rate": 3.6222476698215175e-07,
"loss": 0.4689,
"step": 75
},
{
"epoch": 1.5625,
"eval_loss": 0.8493571877479553,
"eval_runtime": 2.1006,
"eval_samples_per_second": 161.855,
"eval_steps_per_second": 1.428,
"step": 75
},
{
"epoch": 1.6666666666666665,
"grad_norm": 3.2777220151938935,
"learning_rate": 2.462755297384099e-07,
"loss": 0.4309,
"step": 80
},
{
"epoch": 1.6666666666666665,
"eval_loss": 0.846055269241333,
"eval_runtime": 2.0775,
"eval_samples_per_second": 163.657,
"eval_steps_per_second": 1.444,
"step": 80
},
{
"epoch": 1.7708333333333335,
"grad_norm": 3.25027538013195,
"learning_rate": 1.7088740175034947e-07,
"loss": 0.4299,
"step": 85
},
{
"epoch": 1.7708333333333335,
"eval_loss": 0.8445951342582703,
"eval_runtime": 2.0859,
"eval_samples_per_second": 163.002,
"eval_steps_per_second": 1.438,
"step": 85
},
{
"epoch": 1.875,
"grad_norm": 3.841600887262257,
"learning_rate": 1.228102956599465e-07,
"loss": 0.4461,
"step": 90
},
{
"epoch": 1.875,
"eval_loss": 0.8440027832984924,
"eval_runtime": 2.099,
"eval_samples_per_second": 161.984,
"eval_steps_per_second": 1.429,
"step": 90
},
{
"epoch": 1.9791666666666665,
"grad_norm": 4.633157495322692,
"learning_rate": 9.279207916081227e-08,
"loss": 0.4474,
"step": 95
},
{
"epoch": 1.9791666666666665,
"eval_loss": 0.8438854217529297,
"eval_runtime": 2.094,
"eval_samples_per_second": 162.368,
"eval_steps_per_second": 1.433,
"step": 95
},
{
"epoch": 2.0833333333333335,
"grad_norm": 3.3543713588136885,
"learning_rate": 7.448002404850094e-08,
"loss": 0.3614,
"step": 100
},
{
"epoch": 2.0833333333333335,
"eval_loss": 0.8445320725440979,
"eval_runtime": 2.0778,
"eval_samples_per_second": 163.634,
"eval_steps_per_second": 1.444,
"step": 100
},
{
"epoch": 2.1875,
"grad_norm": 3.5776096289343053,
"learning_rate": 6.35920070839697e-08,
"loss": 0.3861,
"step": 105
},
{
"epoch": 2.1875,
"eval_loss": 0.8457441926002502,
"eval_runtime": 2.1055,
"eval_samples_per_second": 161.484,
"eval_steps_per_second": 1.425,
"step": 105
},
{
"epoch": 2.2916666666666665,
"grad_norm": 3.811456756438563,
"learning_rate": 5.7299804687499997e-08,
"loss": 0.3829,
"step": 110
},
{
"epoch": 2.2916666666666665,
"eval_loss": 0.847288191318512,
"eval_runtime": 2.083,
"eval_samples_per_second": 163.223,
"eval_steps_per_second": 1.44,
"step": 110
},
{
"epoch": 2.3958333333333335,
"grad_norm": 3.1978758437608823,
"learning_rate": 5.37771434967624e-08,
"loss": 0.3764,
"step": 115
},
{
"epoch": 2.3958333333333335,
"eval_loss": 0.8487641215324402,
"eval_runtime": 2.1168,
"eval_samples_per_second": 160.617,
"eval_steps_per_second": 1.417,
"step": 115
},
{
"epoch": 2.5,
"grad_norm": 3.472352228062058,
"learning_rate": 5.187403540619925e-08,
"loss": 0.3655,
"step": 120
},
{
"epoch": 2.5,
"eval_loss": 0.8499611020088196,
"eval_runtime": 2.0908,
"eval_samples_per_second": 162.615,
"eval_steps_per_second": 1.435,
"step": 120
},
{
"epoch": 2.6041666666666665,
"grad_norm": 3.2298459394815793,
"learning_rate": 5.088648238966908e-08,
"loss": 0.4243,
"step": 125
},
{
"epoch": 2.6041666666666665,
"eval_loss": 0.8510637879371643,
"eval_runtime": 2.0941,
"eval_samples_per_second": 162.36,
"eval_steps_per_second": 1.433,
"step": 125
},
{
"epoch": 2.7083333333333335,
"grad_norm": 3.7544587648641756,
"learning_rate": 5.039701925276604e-08,
"loss": 0.3884,
"step": 130
},
{
"epoch": 2.7083333333333335,
"eval_loss": 0.8520172238349915,
"eval_runtime": 2.1032,
"eval_samples_per_second": 161.66,
"eval_steps_per_second": 1.426,
"step": 130
},
{
"epoch": 2.8125,
"grad_norm": 3.5032769257867695,
"learning_rate": 5.0166900048082497e-08,
"loss": 0.3634,
"step": 135
},
{
"epoch": 2.8125,
"eval_loss": 0.8528143763542175,
"eval_runtime": 2.0786,
"eval_samples_per_second": 163.568,
"eval_steps_per_second": 1.443,
"step": 135
},
{
"epoch": 2.9166666666666665,
"grad_norm": 3.023294292675947,
"learning_rate": 5.0065147322870076e-08,
"loss": 0.3846,
"step": 140
},
{
"epoch": 2.9166666666666665,
"eval_loss": 0.8537066578865051,
"eval_runtime": 2.0903,
"eval_samples_per_second": 162.659,
"eval_steps_per_second": 1.435,
"step": 140
},
{
"epoch": 3.0208333333333335,
"grad_norm": 3.1767015238154075,
"learning_rate": 5.002328628528332e-08,
"loss": 0.3872,
"step": 145
},
{
"epoch": 3.0208333333333335,
"eval_loss": 0.8547406196594238,
"eval_runtime": 2.0891,
"eval_samples_per_second": 162.748,
"eval_steps_per_second": 1.436,
"step": 145
},
{
"epoch": 3.125,
"grad_norm": 3.1942747338221045,
"learning_rate": 5.0007484528133236e-08,
"loss": 0.3869,
"step": 150
},
{
"epoch": 3.125,
"eval_loss": 0.8557960391044617,
"eval_runtime": 2.0819,
"eval_samples_per_second": 163.312,
"eval_steps_per_second": 1.441,
"step": 150
},
{
"epoch": 3.2291666666666665,
"grad_norm": 3.815918812229993,
"learning_rate": 5.0002110817570477e-08,
"loss": 0.3876,
"step": 155
},
{
"epoch": 3.2291666666666665,
"eval_loss": 0.8566272854804993,
"eval_runtime": 2.0781,
"eval_samples_per_second": 163.61,
"eval_steps_per_second": 1.444,
"step": 155
},
{
"epoch": 3.3333333333333335,
"grad_norm": 3.4577646975309366,
"learning_rate": 5.0000504842356326e-08,
"loss": 0.3844,
"step": 160
},
{
"epoch": 3.3333333333333335,
"eval_loss": 0.8572790026664734,
"eval_runtime": 2.0811,
"eval_samples_per_second": 163.373,
"eval_steps_per_second": 1.442,
"step": 160
},
{
"epoch": 3.4375,
"grad_norm": 3.274685205370877,
"learning_rate": 5.000009745562451e-08,
"loss": 0.3535,
"step": 165
},
{
"epoch": 3.4375,
"eval_loss": 0.8578632473945618,
"eval_runtime": 2.0918,
"eval_samples_per_second": 162.539,
"eval_steps_per_second": 1.434,
"step": 165
},
{
"epoch": 3.5416666666666665,
"grad_norm": 3.246459205886974,
"learning_rate": 5.0000014077810156e-08,
"loss": 0.3488,
"step": 170
},
{
"epoch": 3.5416666666666665,
"eval_loss": 0.85884028673172,
"eval_runtime": 2.1178,
"eval_samples_per_second": 160.545,
"eval_steps_per_second": 1.417,
"step": 170
},
{
"epoch": 3.6458333333333335,
"grad_norm": 3.3944513203963504,
"learning_rate": 5.0000001343508807e-08,
"loss": 0.3464,
"step": 175
},
{
"epoch": 3.6458333333333335,
"eval_loss": 0.8598365783691406,
"eval_runtime": 2.0828,
"eval_samples_per_second": 163.238,
"eval_steps_per_second": 1.44,
"step": 175
},
{
"epoch": 3.75,
"grad_norm": 3.258773113208273,
"learning_rate": 5.000000006747581e-08,
"loss": 0.361,
"step": 180
},
{
"epoch": 3.75,
"eval_loss": 0.8606703877449036,
"eval_runtime": 2.1172,
"eval_samples_per_second": 160.588,
"eval_steps_per_second": 1.417,
"step": 180
},
{
"epoch": 3.8541666666666665,
"grad_norm": 3.586703083699586,
"learning_rate": 5.0000000001094325e-08,
"loss": 0.3674,
"step": 185
},
{
"epoch": 3.8541666666666665,
"eval_loss": 0.8611735701560974,
"eval_runtime": 2.0956,
"eval_samples_per_second": 162.243,
"eval_steps_per_second": 1.432,
"step": 185
},
{
"epoch": 3.9583333333333335,
"grad_norm": 3.5661429802112616,
"learning_rate": 5.000000000000139e-08,
"loss": 0.3988,
"step": 190
},
{
"epoch": 3.9583333333333335,
"eval_loss": 0.8612277507781982,
"eval_runtime": 2.0853,
"eval_samples_per_second": 163.045,
"eval_steps_per_second": 1.439,
"step": 190
},
{
"epoch": 4.0,
"step": 192,
"total_flos": 5363820134400.0,
"train_loss": 0.5090278356025616,
"train_runtime": 6014.9673,
"train_samples_per_second": 2.031,
"train_steps_per_second": 0.032
}
],
"logging_steps": 5,
"max_steps": 192,
"num_input_tokens_seen": 0,
"num_train_epochs": 4,
"save_steps": 5,
"total_flos": 5363820134400.0,
"train_batch_size": 8,
"trial_name": null,
"trial_params": null
}

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fa3669f56e96d865dc1093fe06d25ee5dfbdfdce605f2359ead076119c115479
size 6968

BIN
training_eval_loss.png Normal file

Binary file not shown.

Size: 41 KiB

BIN
training_loss.png Normal file

Binary file not shown.

Size: 40 KiB