Initialize project; model provided by the ModelHub XC community

Model: PKU-Alignment/ProgressGym-HistLlama3-8B-C014-instruct-v0.2
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-25 16:15:14 +08:00
commit 8de117494f
25 changed files with 413910 additions and 0 deletions

39
.gitattributes vendored Normal file

@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text

209
README.md Normal file

@@ -0,0 +1,209 @@
---
license: cc-by-4.0
tags:
- alignment
- value alignment
- AI safety
- safety
- LLM
- history
datasets:
- PKU-Alignment/ProgressGym-HistText
- PKU-Alignment/ProgressGym-TimelessQA
base_model:
- PKU-Alignment/ProgressGym-HistLlama3-8B-C014-pretrain
- meta-llama/Meta-Llama-3-8B
---
# ProgressGym-HistLlama3-8B-C014-instruct
## Overview
#### The ProgressGym Framework
![Framework Diagram](./readme-assets/main-diagram.png)
**ProgressGym-HistLlama3-8B-C014-instruct** is part of the **ProgressGym** framework for research and experimentation on *progress alignment* - the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in.
To quote the paper [*ProgressGym: Alignment with a Millennium of Moral Progress*](https://arxiv.org/abs/2406.20087):
> Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.
>
> We introduce *progress alignment* as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.
#### ProgressGym-HistLlama3-8B-C014-instruct
ProgressGym-HistLlama3-8B-C014-instruct is one of the **36 historical language models** in the ProgressGym framework.
**ProgressGym-HistLlama3-8B-C014-instruct is under continual iteration.** New versions of the model, improving upon the current one, are being trained to reflect historical moral tendencies in ever more comprehensive ways.
**ProgressGym-HistLlama3-8B-C014-instruct is a 14th-century historical language model.** Based on [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), it undergoes continued pretraining on the 14th-century text data from [ProgressGym-HistText](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-HistText), using the following hyperparameters:
- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP
... with the following training results:
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.5789 | 0.0152 | 1 | 2.6458 |
| 2.5672 | 0.0758 | 5 | 2.6280 |
| 2.5751 | 0.1515 | 10 | 2.5314 |
| 2.418 | 0.2273 | 15 | 2.4634 |
| 2.4701 | 0.3030 | 20 | 2.4177 |
| 2.3904 | 0.3788 | 25 | 2.3785 |
| 2.3539 | 0.4545 | 30 | 2.3378 |
| 2.3101 | 0.5303 | 35 | 2.3082 |
| 2.3254 | 0.6061 | 40 | 2.2816 |
| 2.2762 | 0.6818 | 45 | 2.2614 |
| 2.2525 | 0.7576 | 50 | 2.2458 |
| 2.2777 | 0.8333 | 55 | 2.2321 |
| 2.2054 | 0.9091 | 60 | 2.2206 |
| 2.237 | 0.9848 | 65 | 2.2113 |
| 1.986 | 1.0606 | 70 | 2.2115 |
| 1.9373 | 1.1364 | 75 | 2.2217 |
| 1.9228 | 1.2121 | 80 | 2.2132 |
| 1.9084 | 1.2879 | 85 | 2.2118 |
| 1.9684 | 1.3636 | 90 | 2.2122 |
| 1.9126 | 1.4394 | 95 | 2.2094 |
| 1.9101 | 1.5152 | 100 | 2.2066 |
| 1.8496 | 1.5909 | 105 | 2.2058 |
| 1.9154 | 1.6667 | 110 | 2.2057 |
| 1.9233 | 1.7424 | 115 | 2.2056 |
| 1.9198 | 1.8182 | 120 | 2.2052 |
| 1.9229 | 1.8939 | 125 | 2.2048 |
| 1.8913 | 1.9697 | 130 | 2.2045 |
| 1.8814 | 2.0455 | 135 | 2.2046 |
| 1.8813 | 2.1212 | 140 | 2.2051 |
| 1.8912 | 2.1970 | 145 | 2.2058 |
| 1.9184 | 2.2727 | 150 | 2.2065 |
| 1.8662 | 2.3485 | 155 | 2.2071 |
| 1.8809 | 2.4242 | 160 | 2.2074 |
| 1.8591 | 2.5 | 165 | 2.2077 |
| 1.8731 | 2.5758 | 170 | 2.2079 |
| 1.8948 | 2.6515 | 175 | 2.2082 |
| 1.8876 | 2.7273 | 180 | 2.2082 |
| 1.8408 | 2.8030 | 185 | 2.2083 |
| 1.8931 | 2.8788 | 190 | 2.2082 |
| 1.8569 | 2.9545 | 195 | 2.2080 |
| 1.8621 | 3.0303 | 200 | 2.2079 |
| 1.8863 | 3.1061 | 205 | 2.2078 |
| 1.9021 | 3.1818 | 210 | 2.2079 |
| 1.8648 | 3.2576 | 215 | 2.2080 |
| 1.8443 | 3.3333 | 220 | 2.2081 |
| 1.8978 | 3.4091 | 225 | 2.2080 |
| 1.8658 | 3.4848 | 230 | 2.2080 |
| 1.8706 | 3.5606 | 235 | 2.2079 |
| 1.8855 | 3.6364 | 240 | 2.2078 |
| 1.8535 | 3.7121 | 245 | 2.2078 |
| 1.9062 | 3.7879 | 250 | 2.2079 |
| 1.8628 | 3.8636 | 255 | 2.2078 |
| 1.8484 | 3.9394 | 260 | 2.2077 |
Note that the training data volume for the continued pretraining stage is capped at 3GB. When the corresponding century's corpus exceeds this volume, the training data is randomly sampled to fit the volume.
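The capping step described above can be pictured with a minimal sketch. It is not the ProgressGym pipeline: the function name `sample_to_cap`, the greedy selection, the reading of "3GB" as 3 GiB, and the reuse of seed 42 are all assumptions made for illustration.

```python
import random

# Assumption: "3GB" is read as 3 GiB; the README does not state the exact unit.
CAP_BYTES = 3 * 1024**3


def sample_to_cap(doc_sizes, cap_bytes=CAP_BYTES, seed=42):
    """Return sorted indices of a random subset of documents whose total size fits cap_bytes.

    doc_sizes: list of document sizes in bytes (hypothetical input format).
    """
    # A corpus that already fits under the cap is kept whole.
    if sum(doc_sizes) <= cap_bytes:
        return list(range(len(doc_sizes)))

    # Otherwise visit documents in a seeded random order and keep each one that still fits.
    order = list(range(len(doc_sizes)))
    random.Random(seed).shuffle(order)
    chosen, total = [], 0
    for i in order:
        if total + doc_sizes[i] <= cap_bytes:
            chosen.append(i)
            total += doc_sizes[i]
    return sorted(chosen)


# Toy usage: ten 400-byte documents under a 1000-byte cap.
toy_sizes = [400] * 10
toy_pick = sample_to_cap(toy_sizes, cap_bytes=1000)
```

With the toy input, only two documents fit, so `toy_pick` holds two indices; which two depends on the seed.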
**ProgressGym-HistLlama3-8B-C014-instruct is an instruction-tuned language model.** It is tuned on [ProgressGym-TimelessQA](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-TimelessQA), using the following hyperparameters. Note, however, that the snapshot at training step 10 is used for the final model, to minimize erosion of the value tendencies learned during continued pretraining; we qualitatively observe that this snapshot still possesses strong instruction-following capabilities.
- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP
... with the following training results:
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9832 | 0.0208 | 1 | 0.9730 |
| 0.9463 | 0.1042 | 5 | 0.9421 |
| 0.8488 | 0.2083 | 10 | 0.8247 |
| 0.7833 | 0.3125 | 15 | 0.8149 |
| 0.7797 | 0.4167 | 20 | 0.8403 |
| 0.8542 | 0.5208 | 25 | 0.8670 |
| 0.8895 | 0.625 | 30 | 0.8718 |
| 0.8519 | 0.7292 | 35 | 0.8592 |
| 0.8224 | 0.8333 | 40 | 0.8491 |
| 0.8538 | 0.9375 | 45 | 0.8384 |
| 0.6569 | 1.0417 | 50 | 0.8295 |
| 0.437 | 1.1458 | 55 | 0.8457 |
| 0.4405 | 1.25 | 60 | 0.8668 |
| 0.4331 | 1.3542 | 65 | 0.8671 |
| 0.448 | 1.4583 | 70 | 0.8597 |
| 0.4673 | 1.5625 | 75 | 0.8514 |
| 0.4298 | 1.6667 | 80 | 0.8474 |
| 0.4252 | 1.7708 | 85 | 0.8458 |
| 0.4429 | 1.875 | 90 | 0.8451 |
| 0.4484 | 1.9792 | 95 | 0.8450 |
| 0.3634 | 2.0833 | 100 | 0.8455 |
| 0.3876 | 2.1875 | 105 | 0.8467 |
| 0.3717 | 2.2917 | 110 | 0.8481 |
| 0.387 | 2.3958 | 115 | 0.8494 |
| 0.3561 | 2.5 | 120 | 0.8505 |
| 0.4219 | 2.6042 | 125 | 0.8516 |
| 0.3798 | 2.7083 | 130 | 0.8527 |
| 0.3551 | 2.8125 | 135 | 0.8537 |
| 0.3827 | 2.9167 | 140 | 0.8546 |
| 0.3938 | 3.0208 | 145 | 0.8556 |
| 0.3805 | 3.125 | 150 | 0.8565 |
| 0.3813 | 3.2292 | 155 | 0.8574 |
| 0.3894 | 3.3333 | 160 | 0.8582 |
| 0.3603 | 3.4375 | 165 | 0.8589 |
| 0.3515 | 3.5417 | 170 | 0.8597 |
| 0.3433 | 3.6458 | 175 | 0.8605 |
| 0.3511 | 3.75 | 180 | 0.8614 |
| 0.3599 | 3.8542 | 185 | 0.8620 |
| 0.3994 | 3.9583 | 190 | 0.8621 |
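The batch-size entries and the Epoch/Step columns above are related by simple arithmetic, sketched below. The helper names are hypothetical, and a gradient accumulation factor of 1 is an assumption (it is consistent with 8 per device × 8 devices = 64, but the hyperparameter lists do not state it).

```python
def total_batch_size(per_device, num_devices, grad_accum=1):
    """Effective batch size across all devices (grad_accum=1 is an assumption)."""
    return per_device * num_devices * grad_accum


def steps_per_epoch(epoch_at_step_1):
    """Estimate optimizer steps per epoch from the Epoch value logged at step 1."""
    return round(1 / epoch_at_step_1)


train_total = total_batch_size(8, 8)      # matches total_train_batch_size: 64
eval_total = total_batch_size(16, 8)      # matches total_eval_batch_size: 128

pretrain_steps = steps_per_epoch(0.0152)  # continued pretraining table, step 1
instruct_steps = steps_per_epoch(0.0208)  # instruction tuning table, step 1

# The final model uses the instruction-tuning snapshot at step 10,
# i.e. roughly this many training examples seen and this fraction of an epoch.
examples_at_step_10 = 10 * train_total
epoch_at_step_10 = 10 / instruct_steps

print(train_total, eval_total, pretrain_steps, instruct_steps)
print(examples_at_step_10, round(epoch_at_step_10, 4))
```

Under these assumptions the step-10 snapshot has seen about 640 examples, about 0.21 of one epoch, which agrees with the Epoch value of 0.2083 in the table row for step 10.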
## Links
- **[Paper Preprint]** [ProgressGym: Alignment with a Millennium of Moral Progress](https://arxiv.org/abs/2406.20087)
- **[Leaderboard & Interactive Playground]** [PKU-Alignment/ProgressGym-LeaderBoard](https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard)
- **[Huggingface Data & Model Collection]** [PKU-Alignment/ProgressGym](https://huggingface.co/collections/PKU-Alignment/progressgym-666735fcf3e4efa276226eaa)
- **[Github Codebase]** [PKU-Alignment/ProgressGym](https://github.com/PKU-Alignment/ProgressGym)
- **[Documentation]** [ProgressGym Documentation](https://pku-alignment.github.io/ProgressGym/)
- **[PyPI Package]** *(coming soon - [stay tuned](https://forms.gle/1TWFLL4ZCLeYTD5N6)!)*
## Citation
If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the bibtex entry below.
```text
@article{progressgym,
title={ProgressGym: Alignment with a Millennium of Moral Progress},
author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
journal={arXiv preprint arXiv:2406.20087},
eprint={2406.20087},
eprinttype = {arXiv},
year={2024}
}
```
## Ethics Statement
- **Copyright information of historical text data sources**:
  - Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
  - For the text that we draw from Internet Archive, we only include those that were uploaded by *Library of Congress*, which are texts freely released online by the U.S. Library of Congress for research and public use.
  - The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
  - The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
- **Reproducibility**: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
- **Misuse Prevention**: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without *a priori* assumptions on the direction of progress. In the event of potential misuse of our dataset, we condemn any misuse attempt to the strongest degree possible, and will work with the research community on whistleblowing for such attempts.
- **Open-Sourcing**: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.

12
all_results.json Normal file

@@ -0,0 +1,12 @@
{
"epoch": 4.0,
"eval_loss": 0.8149010539054871,
"eval_runtime": 2.1161,
"eval_samples_per_second": 160.676,
"eval_steps_per_second": 1.418,
"total_flos": 5360548577280.0,
"train_loss": 0.507965192819635,
"train_runtime": 5963.6843,
"train_samples_per_second": 2.048,
"train_steps_per_second": 0.032
}

28
config.json Normal file

@@ -0,0 +1,28 @@
{
"_name_or_path": "./output/training_results/C014_llama3-8b-base_pretrain_20240428_005832/",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.40.0",
"use_cache": false,
"vocab_size": 128256
}
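As a sanity check on the config.json values above, the following sketch counts the parameters they imply and the corresponding float16 size. The function name `count_parameters` is hypothetical. The layer layout follows the tensors listed in the weight map, plus one final norm vector that is standard for `LlamaForCausalLM` but not visible in this excerpt.

```python
# Values copied from config.json above.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "vocab_size": 128256,
    "tie_word_embeddings": False,
}


def count_parameters(cfg):
    """Count weights for a Llama-style decoder without attention biases."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]  # grouped-query attention

    attn = 2 * h * h + 2 * h * kv_dim      # q_proj, o_proj, k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]  # gate_proj, up_proj, down_proj
    norms = 2 * h                           # input and post-attention layernorm
    per_layer = attn + mlp + norms

    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed
    final_norm = h                          # assumed model.norm.weight
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


n_params = count_parameters(cfg)
fp16_bytes = 2 * n_params  # torch_dtype is float16: 2 bytes per weight

print(n_params)    # 8030261248
print(fp16_bytes)  # 16060522496
```

The float16 byte count equals the `total_size` of 16060522496 recorded in the safetensors index metadata in this commit.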

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

7
eval_results.json Normal file

@@ -0,0 +1,7 @@
{
"epoch": 4.0,
"eval_loss": 0.8149010539054871,
"eval_runtime": 2.1161,
"eval_samples_per_second": 160.676,
"eval_steps_per_second": 1.418
}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": 128001,
"transformers_version": "4.40.0"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a68a11d1949e53816b9bbecdc0ec223dc7566a26d46e1f44ef284faf5c4227ca
size 4976698592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:019b6d35cde1e6b1ab36c607253c8634ff1041694001d873b425c1a18598f7d0
size 4999802616


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c379dcf4ce836ce30e67cd411c921a01bcbe71a26dc8f911d8e318a07cb71530
size 4915916080


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:587e49afd5cca68b1b18cdcd8cb2fe8c281c609f8c94c6d8e022443ef5cb3b96
size 1168138808


@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 16060522496
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
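
The index above maps each tensor name to the shard file that stores it, and layer 31 is split between shards 3 and 4. The sketch below is a minimal illustration of how such a map could be inspected. It assumes the usual Hugging Face layout, in which the mapping sits under a top-level `weight_map` key (that key is not visible in this excerpt). `SAMPLE_INDEX`, `tensors_by_shard`, and `layers_split_across_shards` are hypothetical names introduced here, and the sample holds only a few entries copied from the listing.

```python
import json
from collections import defaultdict

# A handful of entries copied from the index above; the real file has many more.
# The "weight_map" key is an assumption based on the usual sharded-index layout.
SAMPLE_INDEX = json.loads("""
{
  "weight_map": {
    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors"
  }
}
""")


def tensors_by_shard(index):
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for name, shard in index["weight_map"].items():
        shards[shard].append(name)
    return dict(shards)


def layers_split_across_shards(index):
    """Return the layer numbers whose tensors live in more than one shard."""
    seen = defaultdict(set)
    for name, shard in index["weight_map"].items():
        parts = name.split(".")
        if parts[:2] == ["model", "layers"]:
            seen[int(parts[2])].add(shard)
    return sorted(layer for layer, files in seen.items() if len(files) > 1)


groups = tensors_by_shard(SAMPLE_INDEX)
split_layers = layers_split_across_shards(SAMPLE_INDEX)
```

Run against the full index file, the same two helpers would show which shards have to be present before a given layer can be loaded.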

Binary file not shown (image, 85 KiB).

Binary file not shown (image, 210 KiB).

Binary file not shown (image, 178 KiB).

Binary file not shown (image, 218 KiB).

23
special_tokens_map.json Normal file

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

410504
tokenizer.json Normal file

File diff suppressed because it is too large.

2065
tokenizer_config.json Normal file

File diff suppressed because it is too large.

8
train_results.json Normal file

@@ -0,0 +1,8 @@
{
"epoch": 4.0,
"total_flos": 5360548577280.0,
"train_loss": 0.507965192819635,
"train_runtime": 5963.6843,
"train_samples_per_second": 2.048,
"train_steps_per_second": 0.032
}

80
trainer_log.jsonl Normal file

@@ -0,0 +1,80 @@
{"current_steps": 1, "total_steps": 192, "loss": 0.9832, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 0.0, "epoch": 0.020833333333333332, "percentage": 0.52, "elapsed_time": "0:00:21", "remaining_time": "1:07:44"}
{"current_steps": 1, "total_steps": 192, "loss": null, "eval_loss": 0.9730262160301208, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.020833333333333332, "percentage": 0.52, "elapsed_time": "0:00:21", "remaining_time": "1:07:44"}
{"current_steps": 5, "total_steps": 192, "loss": 0.9463, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.5e-06, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:01:24", "remaining_time": "0:52:50"}
{"current_steps": 5, "total_steps": 192, "loss": null, "eval_loss": 0.9420890212059021, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:01:24", "remaining_time": "0:52:50"}
{"current_steps": 10, "total_steps": 192, "loss": 0.8488, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.25e-06, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:03:58", "remaining_time": "1:12:27"}
{"current_steps": 10, "total_steps": 192, "loss": null, "eval_loss": 0.8247124552726746, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:03:58", "remaining_time": "1:12:27"}
{"current_steps": 15, "total_steps": 192, "loss": 0.7833, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9e-06, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:06:29", "remaining_time": "1:16:40"}
{"current_steps": 15, "total_steps": 192, "loss": null, "eval_loss": 0.8149010539054871, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:06:29", "remaining_time": "1:16:40"}
{"current_steps": 20, "total_steps": 192, "loss": 0.7797, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.275e-05, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:09:02", "remaining_time": "1:17:46"}
{"current_steps": 20, "total_steps": 192, "loss": null, "eval_loss": 0.8403318524360657, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:09:02", "remaining_time": "1:17:46"}
{"current_steps": 25, "total_steps": 192, "loss": 0.8542, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.3195176200175283e-05, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:11:32", "remaining_time": "1:17:02"}
{"current_steps": 25, "total_steps": 192, "loss": null, "eval_loss": 0.8670275807380676, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:11:32", "remaining_time": "1:17:02"}
{"current_steps": 30, "total_steps": 192, "loss": 0.8895, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.515676612044427e-06, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:14:03", "remaining_time": "1:15:52"}
{"current_steps": 30, "total_steps": 192, "loss": null, "eval_loss": 0.8718018531799316, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:14:03", "remaining_time": "1:15:52"}
{"current_steps": 35, "total_steps": 192, "loss": 0.8519, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.797580677308734e-06, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:16:33", "remaining_time": "1:14:14"}
{"current_steps": 35, "total_steps": 192, "loss": null, "eval_loss": 0.859227180480957, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:16:33", "remaining_time": "1:14:14"}
{"current_steps": 40, "total_steps": 192, "loss": 0.8224, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.808575415542887e-06, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:19:11", "remaining_time": "1:12:54"}
{"current_steps": 40, "total_steps": 192, "loss": null, "eval_loss": 0.8491263389587402, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:19:11", "remaining_time": "1:12:54"}
{"current_steps": 45, "total_steps": 192, "loss": 0.8538, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.3676619069852654e-06, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:21:49", "remaining_time": "1:11:18"}
{"current_steps": 45, "total_steps": 192, "loss": null, "eval_loss": 0.8384072780609131, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:21:49", "remaining_time": "1:11:18"}
{"current_steps": 50, "total_steps": 192, "loss": 0.6569, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.334947896124909e-06, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:24:27", "remaining_time": "1:09:26"}
{"current_steps": 50, "total_steps": 192, "loss": null, "eval_loss": 0.8294973373413086, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:24:27", "remaining_time": "1:09:26"}
{"current_steps": 55, "total_steps": 192, "loss": 0.437, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.603233215095547e-06, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:27:03", "remaining_time": "1:07:24"}
{"current_steps": 55, "total_steps": 192, "loss": null, "eval_loss": 0.8457258343696594, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:27:03", "remaining_time": "1:07:24"}
{"current_steps": 60, "total_steps": 192, "loss": 0.4405, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.0911174606561334e-06, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:29:40", "remaining_time": "1:05:16"}
{"current_steps": 60, "total_steps": 192, "loss": null, "eval_loss": 0.8668487071990967, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:29:40", "remaining_time": "1:05:16"}
{"current_steps": 65, "total_steps": 192, "loss": 0.4331, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.373930741131784e-07, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:32:16", "remaining_time": "1:03:03"}
{"current_steps": 65, "total_steps": 192, "loss": null, "eval_loss": 0.8670875430107117, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:32:16", "remaining_time": "1:03:03"}
{"current_steps": 70, "total_steps": 192, "loss": 0.448, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.374210410959207e-07, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:34:50", "remaining_time": "1:00:42"}
{"current_steps": 70, "total_steps": 192, "loss": null, "eval_loss": 0.8596971035003662, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:34:50", "remaining_time": "1:00:42"}
{"current_steps": 75, "total_steps": 192, "loss": 0.4673, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.6222476698215175e-07, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:37:23", "remaining_time": "0:58:20"}
{"current_steps": 75, "total_steps": 192, "loss": null, "eval_loss": 0.8513818383216858, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:37:23", "remaining_time": "0:58:20"}
{"current_steps": 80, "total_steps": 192, "loss": 0.4298, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.462755297384099e-07, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:40:00", "remaining_time": "0:56:00"}
{"current_steps": 80, "total_steps": 192, "loss": null, "eval_loss": 0.8474181294441223, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:40:00", "remaining_time": "0:56:00"}
{"current_steps": 85, "total_steps": 192, "loss": 0.4252, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7088740175034947e-07, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:42:37", "remaining_time": "0:53:39"}
{"current_steps": 85, "total_steps": 192, "loss": null, "eval_loss": 0.8457570672035217, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:42:37", "remaining_time": "0:53:39"}
{"current_steps": 90, "total_steps": 192, "loss": 0.4429, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.228102956599465e-07, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:45:11", "remaining_time": "0:51:13"}
{"current_steps": 90, "total_steps": 192, "loss": null, "eval_loss": 0.8451478481292725, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:45:11", "remaining_time": "0:51:13"}
{"current_steps": 95, "total_steps": 192, "loss": 0.4484, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.279207916081227e-08, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:47:45", "remaining_time": "0:48:45"}
{"current_steps": 95, "total_steps": 192, "loss": null, "eval_loss": 0.8449902534484863, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:47:45", "remaining_time": "0:48:45"}
{"current_steps": 100, "total_steps": 192, "loss": 0.3634, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.448002404850094e-08, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:50:22", "remaining_time": "0:46:20"}
{"current_steps": 100, "total_steps": 192, "loss": null, "eval_loss": 0.8455283641815186, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:50:22", "remaining_time": "0:46:20"}
{"current_steps": 105, "total_steps": 192, "loss": 0.3876, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.35920070839697e-08, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:52:57", "remaining_time": "0:43:52"}
{"current_steps": 105, "total_steps": 192, "loss": null, "eval_loss": 0.8467428684234619, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:52:57", "remaining_time": "0:43:52"}
{"current_steps": 110, "total_steps": 192, "loss": 0.3717, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.7299804687499997e-08, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:55:32", "remaining_time": "0:41:24"}
{"current_steps": 110, "total_steps": 192, "loss": null, "eval_loss": 0.8481121063232422, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:55:32", "remaining_time": "0:41:24"}
{"current_steps": 115, "total_steps": 192, "loss": 0.387, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.37771434967624e-08, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:58:07", "remaining_time": "0:38:55"}
{"current_steps": 115, "total_steps": 192, "loss": null, "eval_loss": 0.8493936061859131, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:58:07", "remaining_time": "0:38:55"}
{"current_steps": 120, "total_steps": 192, "loss": 0.3561, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.187403540619925e-08, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "1:00:43", "remaining_time": "0:36:26"}
{"current_steps": 120, "total_steps": 192, "loss": null, "eval_loss": 0.85052889585495, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "1:00:43", "remaining_time": "0:36:26"}
{"current_steps": 125, "total_steps": 192, "loss": 0.4219, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.088648238966908e-08, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "1:03:17", "remaining_time": "0:33:55"}
{"current_steps": 125, "total_steps": 192, "loss": null, "eval_loss": 0.8516257405281067, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "1:03:17", "remaining_time": "0:33:55"}
{"current_steps": 130, "total_steps": 192, "loss": 0.3798, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.039701925276604e-08, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "1:05:52", "remaining_time": "0:31:25"}
{"current_steps": 130, "total_steps": 192, "loss": null, "eval_loss": 0.8526514172554016, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "1:05:52", "remaining_time": "0:31:25"}
{"current_steps": 135, "total_steps": 192, "loss": 0.3551, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0166900048082497e-08, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "1:08:27", "remaining_time": "0:28:54"}
{"current_steps": 135, "total_steps": 192, "loss": null, "eval_loss": 0.8536917567253113, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "1:08:27", "remaining_time": "0:28:54"}
{"current_steps": 140, "total_steps": 192, "loss": 0.3827, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0065147322870076e-08, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "1:11:01", "remaining_time": "0:26:22"}
{"current_steps": 140, "total_steps": 192, "loss": null, "eval_loss": 0.8546140193939209, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "1:11:01", "remaining_time": "0:26:22"}
{"current_steps": 145, "total_steps": 192, "loss": 0.3938, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.002328628528332e-08, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "1:13:36", "remaining_time": "0:23:51"}
{"current_steps": 145, "total_steps": 192, "loss": null, "eval_loss": 0.8555943369865417, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "1:13:36", "remaining_time": "0:23:51"}
{"current_steps": 150, "total_steps": 192, "loss": 0.3805, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0007484528133236e-08, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "1:16:13", "remaining_time": "0:21:20"}
{"current_steps": 150, "total_steps": 192, "loss": null, "eval_loss": 0.8565306663513184, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "1:16:13", "remaining_time": "0:21:20"}
{"current_steps": 155, "total_steps": 192, "loss": 0.3813, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0002110817570477e-08, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "1:18:47", "remaining_time": "0:18:48"}
{"current_steps": 155, "total_steps": 192, "loss": null, "eval_loss": 0.8574034571647644, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "1:18:47", "remaining_time": "0:18:48"}
{"current_steps": 160, "total_steps": 192, "loss": 0.3894, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000504842356326e-08, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "1:21:22", "remaining_time": "0:16:16"}
{"current_steps": 160, "total_steps": 192, "loss": null, "eval_loss": 0.8581907153129578, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "1:21:22", "remaining_time": "0:16:16"}
{"current_steps": 165, "total_steps": 192, "loss": 0.3603, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000009745562451e-08, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "1:23:56", "remaining_time": "0:13:44"}
{"current_steps": 165, "total_steps": 192, "loss": null, "eval_loss": 0.8588598370552063, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "1:23:56", "remaining_time": "0:13:44"}
{"current_steps": 170, "total_steps": 192, "loss": 0.3515, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000014077810156e-08, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "1:26:31", "remaining_time": "0:11:11"}
{"current_steps": 170, "total_steps": 192, "loss": null, "eval_loss": 0.8596634864807129, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "1:26:31", "remaining_time": "0:11:11"}
{"current_steps": 175, "total_steps": 192, "loss": 0.3433, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000001343508807e-08, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:29:05", "remaining_time": "0:08:39"}
{"current_steps": 175, "total_steps": 192, "loss": null, "eval_loss": 0.8604967594146729, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:29:05", "remaining_time": "0:08:39"}
{"current_steps": 180, "total_steps": 192, "loss": 0.3511, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000006747581e-08, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:31:40", "remaining_time": "0:06:06"}
{"current_steps": 180, "total_steps": 192, "loss": null, "eval_loss": 0.861361026763916, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:31:40", "remaining_time": "0:06:06"}
{"current_steps": 185, "total_steps": 192, "loss": 0.3599, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000000001094325e-08, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:34:14", "remaining_time": "0:03:33"}
{"current_steps": 185, "total_steps": 192, "loss": null, "eval_loss": 0.8619682192802429, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:34:14", "remaining_time": "0:03:33"}
{"current_steps": 190, "total_steps": 192, "loss": 0.3994, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000000000139e-08, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:36:50", "remaining_time": "0:01:01"}
{"current_steps": 190, "total_steps": 192, "loss": null, "eval_loss": 0.8621244430541992, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:36:50", "remaining_time": "0:01:01"}
{"current_steps": 192, "total_steps": 192, "loss": null, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:38:45", "remaining_time": "0:00:00"}
{"current_steps": 3, "total_steps": 3, "loss": null, "eval_loss": 0.8149010539054871, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:39:52", "remaining_time": "0:00:00"}

615
trainer_state.json Normal file

@@ -0,0 +1,615 @@
{
"best_metric": 0.8149010539054871,
"best_model_checkpoint": "./output/training_results/C014_llama3-8b-base_instruct_20240428_005832/checkpoint-15",
"epoch": 4.0,
"eval_steps": 5,
"global_step": 192,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.020833333333333332,
"grad_norm": 0.0,
"learning_rate": 0.0,
"loss": 0.9832,
"step": 1
},
{
"epoch": 0.020833333333333332,
"eval_loss": 0.9730262160301208,
"eval_runtime": 2.116,
"eval_samples_per_second": 160.683,
"eval_steps_per_second": 1.418,
"step": 1
},
{
"epoch": 0.10416666666666667,
"grad_norm": 16.41232960271884,
"learning_rate": 1.5e-06,
"loss": 0.9463,
"step": 5
},
{
"epoch": 0.10416666666666667,
"eval_loss": 0.9420890212059021,
"eval_runtime": 2.0786,
"eval_samples_per_second": 163.572,
"eval_steps_per_second": 1.443,
"step": 5
},
{
"epoch": 0.20833333333333334,
"grad_norm": 6.185215383740627,
"learning_rate": 5.25e-06,
"loss": 0.8488,
"step": 10
},
{
"epoch": 0.20833333333333334,
"eval_loss": 0.8247124552726746,
"eval_runtime": 2.0863,
"eval_samples_per_second": 162.965,
"eval_steps_per_second": 1.438,
"step": 10
},
{
"epoch": 0.3125,
"grad_norm": 4.677780246708798,
"learning_rate": 9e-06,
"loss": 0.7833,
"step": 15
},
{
"epoch": 0.3125,
"eval_loss": 0.8149010539054871,
"eval_runtime": 2.0708,
"eval_samples_per_second": 164.186,
"eval_steps_per_second": 1.449,
"step": 15
},
{
"epoch": 0.4166666666666667,
"grad_norm": 4.282490348738236,
"learning_rate": 1.275e-05,
"loss": 0.7797,
"step": 20
},
{
"epoch": 0.4166666666666667,
"eval_loss": 0.8403318524360657,
"eval_runtime": 2.076,
"eval_samples_per_second": 163.776,
"eval_steps_per_second": 1.445,
"step": 20
},
{
"epoch": 0.5208333333333334,
"grad_norm": 4.312240371628775,
"learning_rate": 1.3195176200175283e-05,
"loss": 0.8542,
"step": 25
},
{
"epoch": 0.5208333333333334,
"eval_loss": 0.8670275807380676,
"eval_runtime": 2.0781,
"eval_samples_per_second": 163.608,
"eval_steps_per_second": 1.444,
"step": 25
},
{
"epoch": 0.625,
"grad_norm": 4.2373297823136244,
"learning_rate": 9.515676612044427e-06,
"loss": 0.8895,
"step": 30
},
{
"epoch": 0.625,
"eval_loss": 0.8718018531799316,
"eval_runtime": 2.0707,
"eval_samples_per_second": 164.196,
"eval_steps_per_second": 1.449,
"step": 30
},
{
"epoch": 0.7291666666666666,
"grad_norm": 4.44083784051028,
"learning_rate": 6.797580677308734e-06,
"loss": 0.8519,
"step": 35
},
{
"epoch": 0.7291666666666666,
"eval_loss": 0.859227180480957,
"eval_runtime": 2.0671,
"eval_samples_per_second": 164.485,
"eval_steps_per_second": 1.451,
"step": 35
},
{
"epoch": 0.8333333333333334,
"grad_norm": 4.131620700380954,
"learning_rate": 4.808575415542887e-06,
"loss": 0.8224,
"step": 40
},
{
"epoch": 0.8333333333333334,
"eval_loss": 0.8491263389587402,
"eval_runtime": 2.0743,
"eval_samples_per_second": 163.912,
"eval_steps_per_second": 1.446,
"step": 40
},
{
"epoch": 0.9375,
"grad_norm": 4.319858409892453,
"learning_rate": 3.3676619069852654e-06,
"loss": 0.8538,
"step": 45
},
{
"epoch": 0.9375,
"eval_loss": 0.8384072780609131,
"eval_runtime": 2.0776,
"eval_samples_per_second": 163.653,
"eval_steps_per_second": 1.444,
"step": 45
},
{
"epoch": 1.0416666666666667,
"grad_norm": 3.80418376995363,
"learning_rate": 2.334947896124909e-06,
"loss": 0.6569,
"step": 50
},
{
"epoch": 1.0416666666666667,
"eval_loss": 0.8294973373413086,
"eval_runtime": 2.0689,
"eval_samples_per_second": 164.335,
"eval_steps_per_second": 1.45,
"step": 50
},
{
"epoch": 1.1458333333333333,
"grad_norm": 3.3894455787756694,
"learning_rate": 1.603233215095547e-06,
"loss": 0.437,
"step": 55
},
{
"epoch": 1.1458333333333333,
"eval_loss": 0.8457258343696594,
"eval_runtime": 2.0842,
"eval_samples_per_second": 163.13,
"eval_steps_per_second": 1.439,
"step": 55
},
{
"epoch": 1.25,
"grad_norm": 3.633089838413966,
"learning_rate": 1.0911174606561334e-06,
"loss": 0.4405,
"step": 60
},
{
"epoch": 1.25,
"eval_loss": 0.8668487071990967,
"eval_runtime": 2.0758,
"eval_samples_per_second": 163.796,
"eval_steps_per_second": 1.445,
"step": 60
},
{
"epoch": 1.3541666666666667,
"grad_norm": 4.225857095854057,
"learning_rate": 7.373930741131784e-07,
"loss": 0.4331,
"step": 65
},
{
"epoch": 1.3541666666666667,
"eval_loss": 0.8670875430107117,
"eval_runtime": 2.0785,
"eval_samples_per_second": 163.58,
"eval_steps_per_second": 1.443,
"step": 65
},
{
"epoch": 1.4583333333333333,
"grad_norm": 3.838684962723822,
"learning_rate": 5.374210410959207e-07,
"loss": 0.448,
"step": 70
},
{
"epoch": 1.4583333333333333,
"eval_loss": 0.8596971035003662,
"eval_runtime": 2.0789,
"eval_samples_per_second": 163.548,
"eval_steps_per_second": 1.443,
"step": 70
},
{
"epoch": 1.5625,
"grad_norm": 3.8823844934735114,
"learning_rate": 3.6222476698215175e-07,
"loss": 0.4673,
"step": 75
},
{
"epoch": 1.5625,
"eval_loss": 0.8513818383216858,
"eval_runtime": 2.0778,
"eval_samples_per_second": 163.638,
"eval_steps_per_second": 1.444,
"step": 75
},
{
"epoch": 1.6666666666666665,
"grad_norm": 3.2398255350683103,
"learning_rate": 2.462755297384099e-07,
"loss": 0.4298,
"step": 80
},
{
"epoch": 1.6666666666666665,
"eval_loss": 0.8474181294441223,
"eval_runtime": 2.0907,
"eval_samples_per_second": 162.623,
"eval_steps_per_second": 1.435,
"step": 80
},
{
"epoch": 1.7708333333333335,
"grad_norm": 3.153318195454539,
"learning_rate": 1.7088740175034947e-07,
"loss": 0.4252,
"step": 85
},
{
"epoch": 1.7708333333333335,
"eval_loss": 0.8457570672035217,
"eval_runtime": 2.0841,
"eval_samples_per_second": 163.139,
"eval_steps_per_second": 1.439,
"step": 85
},
{
"epoch": 1.875,
"grad_norm": 3.9154872471233073,
"learning_rate": 1.228102956599465e-07,
"loss": 0.4429,
"step": 90
},
{
"epoch": 1.875,
"eval_loss": 0.8451478481292725,
"eval_runtime": 2.0694,
"eval_samples_per_second": 164.3,
"eval_steps_per_second": 1.45,
"step": 90
},
{
"epoch": 1.9791666666666665,
"grad_norm": 4.304265882610879,
"learning_rate": 9.279207916081227e-08,
"loss": 0.4484,
"step": 95
},
{
"epoch": 1.9791666666666665,
"eval_loss": 0.8449902534484863,
"eval_runtime": 2.0701,
"eval_samples_per_second": 164.241,
"eval_steps_per_second": 1.449,
"step": 95
},
{
"epoch": 2.0833333333333335,
"grad_norm": 3.2728230120401633,
"learning_rate": 7.448002404850094e-08,
"loss": 0.3634,
"step": 100
},
{
"epoch": 2.0833333333333335,
"eval_loss": 0.8455283641815186,
"eval_runtime": 2.0713,
"eval_samples_per_second": 164.145,
"eval_steps_per_second": 1.448,
"step": 100
},
{
"epoch": 2.1875,
"grad_norm": 3.660020107151519,
"learning_rate": 6.35920070839697e-08,
"loss": 0.3876,
"step": 105
},
{
"epoch": 2.1875,
"eval_loss": 0.8467428684234619,
"eval_runtime": 2.0936,
"eval_samples_per_second": 162.401,
"eval_steps_per_second": 1.433,
"step": 105
},
{
"epoch": 2.2916666666666665,
"grad_norm": 3.8970751627622926,
"learning_rate": 5.7299804687499997e-08,
"loss": 0.3717,
"step": 110
},
{
"epoch": 2.2916666666666665,
"eval_loss": 0.8481121063232422,
"eval_runtime": 2.0678,
"eval_samples_per_second": 164.429,
"eval_steps_per_second": 1.451,
"step": 110
},
{
"epoch": 2.3958333333333335,
"grad_norm": 3.386652715934595,
"learning_rate": 5.37771434967624e-08,
"loss": 0.387,
"step": 115
},
{
"epoch": 2.3958333333333335,
"eval_loss": 0.8493936061859131,
"eval_runtime": 2.1051,
"eval_samples_per_second": 161.51,
"eval_steps_per_second": 1.425,
"step": 115
},
{
"epoch": 2.5,
"grad_norm": 3.4200942052169547,
"learning_rate": 5.187403540619925e-08,
"loss": 0.3561,
"step": 120
},
{
"epoch": 2.5,
"eval_loss": 0.85052889585495,
"eval_runtime": 2.0652,
"eval_samples_per_second": 164.632,
"eval_steps_per_second": 1.453,
"step": 120
},
{
"epoch": 2.6041666666666665,
"grad_norm": 3.268980501701993,
"learning_rate": 5.088648238966908e-08,
"loss": 0.4219,
"step": 125
},
{
"epoch": 2.6041666666666665,
"eval_loss": 0.8516257405281067,
"eval_runtime": 2.1146,
"eval_samples_per_second": 160.788,
"eval_steps_per_second": 1.419,
"step": 125
},
{
"epoch": 2.7083333333333335,
"grad_norm": 3.4285942542360806,
"learning_rate": 5.039701925276604e-08,
"loss": 0.3798,
"step": 130
},
{
"epoch": 2.7083333333333335,
"eval_loss": 0.8526514172554016,
"eval_runtime": 2.1018,
"eval_samples_per_second": 161.768,
"eval_steps_per_second": 1.427,
"step": 130
},
{
"epoch": 2.8125,
"grad_norm": 3.438575160339058,
"learning_rate": 5.0166900048082497e-08,
"loss": 0.3551,
"step": 135
},
{
"epoch": 2.8125,
"eval_loss": 0.8536917567253113,
"eval_runtime": 2.1025,
"eval_samples_per_second": 161.713,
"eval_steps_per_second": 1.427,
"step": 135
},
{
"epoch": 2.9166666666666665,
"grad_norm": 3.1199400563472683,
"learning_rate": 5.0065147322870076e-08,
"loss": 0.3827,
"step": 140
},
{
"epoch": 2.9166666666666665,
"eval_loss": 0.8546140193939209,
"eval_runtime": 2.0898,
"eval_samples_per_second": 162.691,
"eval_steps_per_second": 1.436,
"step": 140
},
{
"epoch": 3.0208333333333335,
"grad_norm": 3.1711921144705,
"learning_rate": 5.002328628528332e-08,
"loss": 0.3938,
"step": 145
},
{
"epoch": 3.0208333333333335,
"eval_loss": 0.8555943369865417,
"eval_runtime": 2.0827,
"eval_samples_per_second": 163.25,
"eval_steps_per_second": 1.44,
"step": 145
},
{
"epoch": 3.125,
"grad_norm": 3.133782976096458,
"learning_rate": 5.0007484528133236e-08,
"loss": 0.3805,
"step": 150
},
{
"epoch": 3.125,
"eval_loss": 0.8565306663513184,
"eval_runtime": 2.1024,
"eval_samples_per_second": 161.723,
"eval_steps_per_second": 1.427,
"step": 150
},
{
"epoch": 3.2291666666666665,
"grad_norm": 3.7319435280210085,
"learning_rate": 5.0002110817570477e-08,
"loss": 0.3813,
"step": 155
},
{
"epoch": 3.2291666666666665,
"eval_loss": 0.8574034571647644,
"eval_runtime": 2.0911,
"eval_samples_per_second": 162.593,
"eval_steps_per_second": 1.435,
"step": 155
},
{
"epoch": 3.3333333333333335,
"grad_norm": 3.5844045117334833,
"learning_rate": 5.0000504842356326e-08,
"loss": 0.3894,
"step": 160
},
{
"epoch": 3.3333333333333335,
"eval_loss": 0.8581907153129578,
"eval_runtime": 2.0963,
"eval_samples_per_second": 162.194,
"eval_steps_per_second": 1.431,
"step": 160
},
{
"epoch": 3.4375,
"grad_norm": 3.2964992218641544,
"learning_rate": 5.000009745562451e-08,
"loss": 0.3603,
"step": 165
},
{
"epoch": 3.4375,
"eval_loss": 0.8588598370552063,
"eval_runtime": 2.0794,
"eval_samples_per_second": 163.512,
"eval_steps_per_second": 1.443,
"step": 165
},
{
"epoch": 3.5416666666666665,
"grad_norm": 3.307148767623163,
"learning_rate": 5.0000014077810156e-08,
"loss": 0.3515,
"step": 170
},
{
"epoch": 3.5416666666666665,
"eval_loss": 0.8596634864807129,
"eval_runtime": 2.0755,
"eval_samples_per_second": 163.816,
"eval_steps_per_second": 1.445,
"step": 170
},
{
"epoch": 3.6458333333333335,
"grad_norm": 3.3334351206179402,
"learning_rate": 5.0000001343508807e-08,
"loss": 0.3433,
"step": 175
},
{
"epoch": 3.6458333333333335,
"eval_loss": 0.8604967594146729,
"eval_runtime": 2.0699,
"eval_samples_per_second": 164.261,
"eval_steps_per_second": 1.449,
"step": 175
},
{
"epoch": 3.75,
"grad_norm": 3.196293836404165,
"learning_rate": 5.000000006747581e-08,
"loss": 0.3511,
"step": 180
},
{
"epoch": 3.75,
"eval_loss": 0.861361026763916,
"eval_runtime": 2.0796,
"eval_samples_per_second": 163.491,
"eval_steps_per_second": 1.443,
"step": 180
},
{
"epoch": 3.8541666666666665,
"grad_norm": 3.472738636185267,
"learning_rate": 5.0000000001094325e-08,
"loss": 0.3599,
"step": 185
},
{
"epoch": 3.8541666666666665,
"eval_loss": 0.8619682192802429,
"eval_runtime": 2.0705,
"eval_samples_per_second": 164.215,
"eval_steps_per_second": 1.449,
"step": 185
},
{
"epoch": 3.9583333333333335,
"grad_norm": 3.6408101963860187,
"learning_rate": 5.000000000000139e-08,
"loss": 0.3994,
"step": 190
},
{
"epoch": 3.9583333333333335,
"eval_loss": 0.8621244430541992,
"eval_runtime": 2.0725,
"eval_samples_per_second": 164.052,
"eval_steps_per_second": 1.448,
"step": 190
},
{
"epoch": 4.0,
"step": 192,
"total_flos": 5360548577280.0,
"train_loss": 0.507965192819635,
"train_runtime": 5963.6843,
"train_samples_per_second": 2.048,
"train_steps_per_second": 0.032
}
],
"logging_steps": 5,
"max_steps": 192,
"num_input_tokens_seen": 0,
"num_train_epochs": 4,
"save_steps": 5,
"total_flos": 5360548577280.0,
"train_batch_size": 8,
"trial_name": null,
"trial_params": null
}
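
The log entries above alternate between training records (with `loss`, `grad_norm`, `learning_rate`) and evaluation records (with `eval_loss`), followed by a final summary record. A minimal sketch of how such a log could be split and inspected is shown below. It assumes the file is named `trainer_state.json` and that the list of entries sits under a top-level `log_history` key; neither is visible in this excerpt. The helper names are hypothetical, and the sample values are copied from a few of the entries above.

```python
import json


def load_log_history(path):
    """Read the list of log entries from a trainer state file.

    Assumption: the entries live under a top-level "log_history" key.
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["log_history"]


def split_log_history(log_history):
    """Separate training entries (key "loss") from evaluation entries (key "eval_loss")."""
    train = [e for e in log_history if "loss" in e]
    evals = [e for e in log_history if "eval_loss" in e]
    return train, evals


def best_eval_entry(log_history):
    """Return the evaluation entry with the lowest eval_loss."""
    _, evals = split_log_history(log_history)
    return min(evals, key=lambda e: e["eval_loss"])


# A handful of entries copied from the log above (not the full history).
sample = [
    {"epoch": 1.0416666666666667, "eval_loss": 0.8294973373413086, "step": 50},
    {"epoch": 1.1458333333333333, "grad_norm": 3.3894455787756694,
     "learning_rate": 1.603233215095547e-06, "loss": 0.437, "step": 55},
    {"epoch": 1.1458333333333333, "eval_loss": 0.8457258343696594, "step": 55},
    {"epoch": 3.9583333333333335, "grad_norm": 3.6408101963860187,
     "learning_rate": 5.000000000000139e-08, "loss": 0.3994, "step": 190},
    {"epoch": 3.9583333333333335, "eval_loss": 0.8621244430541992, "step": 190},
    {"epoch": 4.0, "step": 192, "train_loss": 0.507965192819635},
]

train_entries, eval_entries = split_log_history(sample)
best = best_eval_entry(sample)
print(best["step"], best["eval_loss"])
```

On this sample, the lowest `eval_loss` belongs to the step 50 entry. The final summary record is excluded from both lists because it carries `train_loss` rather than `loss` or `eval_loss`.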

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6f5cc57e4de6a6111af868bf438bb519b737c60668638ce3523812799fd88b3
size 6968

BIN
training_eval_loss.png Normal file

Binary file not shown.

Size: 42 KiB

BIN
training_loss.png Normal file

Binary file not shown.

Size: 40 KiB