Initialize project; model provided by the ModelHub XC community

Model: PKU-Alignment/ProgressGym-HistLlama3-8B-C015-instruct-v0.2 Source: Original Platform
2026-05-25 16:30:16 +08:00
commit 483b1ab7d6
25 changed files with 413941 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,39 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,249 @@
+---
+license: cc-by-4.0
+tags:
+- alignment
+- value alignment
+- AI safety
+- safety
+- LLM
+- history
+datasets:
+- PKU-Alignment/ProgressGym-HistText
+- PKU-Alignment/ProgressGym-TimelessQA
+base_model:
+- PKU-Alignment/ProgressGym-HistLlama3-8B-C015-pretrain
+- meta-llama/Meta-Llama-3-8B
+---
+
+# ProgressGym-HistLlama3-8B-C015-instruct
+
+## Overview
+
+#### The ProgressGym Framework
+
+![Framework Diagram](./readme-assets/main-diagram.png)
+
+**ProgressGym-HistLlama3-8B-C015-instruct** is part of the **ProgressGym** framework for research and experimentation on *progress alignment* - the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in. 
+
+To quote the paper [*ProgressGym: Alignment with a Millennium of Moral Progress*](https://arxiv.org/abs/2406.20087):
+
+> Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale. 
+>
+> We introduce *progress alignment* as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.
+
+#### ProgressGym-HistLlama3-8B-C015-instruct
+
+ProgressGym-HistLlama3-8B-C015-instruct is one of the **36 historical language models** in the ProgressGym framework. 
+
+**ProgressGym-HistLlama3-8B-C015-instruct is under continual iteration.** New versions of the model, improving upon the current one, are being trained to reflect historical moral tendencies more comprehensively.
+
+**ProgressGym-HistLlama3-8B-C015-instruct is a 15th-century historical language model.** Based on [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), it is continued-pretrained on the 15th-century text data from [ProgressGym-HistText](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-HistText), using the following hyperparameters:
+
+- learning_rate: 1.5e-05
+- train_batch_size: 8
+- eval_batch_size: 16
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- total_train_batch_size: 64
+- total_eval_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: polynomial
+- lr_scheduler_warmup_steps: 20
+- num_epochs: 3.02
+- mixed_precision_training: Native AMP
+
+... with the following training results:
+
+| Training Loss |  Epoch   | Step | Validation Loss |
+|:-------------:|:--------:|:----:|:---------------:|
+| 2.6141        | 0.006494 | 1    | 2.6354          |
+| 2.657         | 0.032468 | 5    | 2.6206          |
+| 2.6337        | 0.064935 | 10   | 2.5846          |
+| 2.5268        | 0.097403 | 15   | 2.5516          |
+| 2.5275        | 0.129870 | 20   | 2.5321          |
+| 2.5005        | 0.162338 | 25   | 2.5131          |
+| 2.5339        | 0.194805 | 30   | 2.4961          |
+| 2.5335        | 0.227273 | 35   | 2.4808          |
+| 2.4252        | 0.259740 | 40   | 2.4643          |
+| 2.4445        | 0.292208 | 45   | 2.4518          |
+| 2.4594        | 0.324675 | 50   | 2.4394          |
+| 2.4498        | 0.357143 | 55   | 2.4287          |
+| 2.3821        | 0.389610 | 60   | 2.4184          |
+| 2.4317        | 0.422078 | 65   | 2.4091          |
+| 2.3931        | 0.454545 | 70   | 2.4001          |
+| 2.3695        | 0.487013 | 75   | 2.3934          |
+| 2.3981        | 0.519481 | 80   | 2.3855          |
+| 2.3952        | 0.551948 | 85   | 2.3789          |
+| 2.4137        | 0.584416 | 90   | 2.3721          |
+| 2.3614        | 0.616883 | 95   | 2.3669          |
+| 2.3467        | 0.649351 | 100  | 2.3612          |
+| 2.4012        | 0.681818 | 105  | 2.3569          |
+| 2.3224        | 0.714286 | 110  | 2.3528          |
+| 2.3348        | 0.746753 | 115  | 2.3483          |
+| 2.3573        | 0.779221 | 120  | 2.3448          |
+| 2.306         | 0.811688 | 125  | 2.3412          |
+| 2.342         | 0.844156 | 130  | 2.3382          |
+| 2.3045        | 0.876623 | 135  | 2.3356          |
+| 2.2959        | 0.909091 | 140  | 2.3330          |
+| 2.3545        | 0.941558 | 145  | 2.3305          |
+| 2.3446        | 0.974026 | 150  | 2.3285          |
+| 2.2502        | 1.006494 | 155  | 2.3268          |
+| 2.0791        | 1.038961 | 160  | 2.3347          |
+| 2.1034        | 1.071429 | 165  | 2.3399          |
+| 2.095         | 1.103896 | 170  | 2.3358          |
+| 2.0627        | 1.136364 | 175  | 2.3346          |
+| 2.0408        | 1.168831 | 180  | 2.3357          |
+| 2.0575        | 1.201299 | 185  | 2.3364          |
+| 2.0976        | 1.233766 | 190  | 2.3349          |
+| 2.0668        | 1.266234 | 195  | 2.3336          |
+| 2.0579        | 1.298701 | 200  | 2.3329          |
+| 2.0756        | 1.331169 | 205  | 2.3326          |
+| 2.1174        | 1.363636 | 210  | 2.3325          |
+| 2.0663        | 1.396104 | 215  | 2.3325          |
+| 2.0941        | 1.428571 | 220  | 2.3324          |
+| 2.1074        | 1.461039 | 225  | 2.3324          |
+| 2.1251        | 1.493506 | 230  | 2.3322          |
+| 2.0629        | 1.525974 | 235  | 2.3318          |
+| 2.0872        | 1.558442 | 240  | 2.3312          |
+| 2.0994        | 1.590909 | 245  | 2.3310          |
+| 2.0879        | 1.623377 | 250  | 2.3308          |
+| 2.0623        | 1.655844 | 255  | 2.3305          |
+| 2.1054        | 1.688312 | 260  | 2.3303          |
+| 2.0736        | 1.720779 | 265  | 2.3301          |
+| 2.1146        | 1.753247 | 270  | 2.3300          |
+| 2.0444        | 1.785714 | 275  | 2.3301          |
+| 2.0541        | 1.818182 | 280  | 2.3301          |
+| 2.1333        | 1.850649 | 285  | 2.3300          |
+| 2.1101        | 1.883117 | 290  | 2.3299          |
+| 2.0234        | 1.915584 | 295  | 2.3298          |
+| 2.0671        | 1.948052 | 300  | 2.3298          |
+| 2.083         | 1.980519 | 305  | 2.3298          |
+| 2.0417        | 2.012987 | 310  | 2.3299          |
+| 2.0784        | 2.045455 | 315  | 2.3303          |
+| 2.058         | 2.077922 | 320  | 2.3308          |
+| 2.0524        | 2.110390 | 325  | 2.3312          |
+| 2.0318        | 2.142857 | 330  | 2.3316          |
+| 2.0914        | 2.175325 | 335  | 2.3318          |
+| 2.0319        | 2.207792 | 340  | 2.3320          |
+| 2.0099        | 2.240260 | 345  | 2.3322          |
+| 2.075         | 2.272727 | 350  | 2.3323          |
+| 2.0444        | 2.305195 | 355  | 2.3324          |
+| 2.0428        | 2.337662 | 360  | 2.3325          |
+| 2.0612        | 2.370130 | 365  | 2.3326          |
+| 2.1078        | 2.402597 | 370  | 2.3327          |
+| 2.0643        | 2.435065 | 375  | 2.3327          |
+| 2.0667        | 2.467532 | 380  | 2.3326          |
+| 2.0285        | 2.500000 | 385  | 2.3324          |
+| 2.0571        | 2.532468 | 390  | 2.3322          |
+| 2.0209        | 2.564935 | 395  | 2.3322          |
+| 2.0537        | 2.597403 | 400  | 2.3323          |
+| 2.0138        | 2.629870 | 405  | 2.3324          |
+| 2.0772        | 2.662338 | 410  | 2.3324          |
+| 2.039         | 2.694805 | 415  | 2.3323          |
+| 2.0181        | 2.727273 | 420  | 2.3322          |
+| 2.0484        | 2.759740 | 425  | 2.3320          |
+| 2.0224        | 2.792208 | 430  | 2.3320          |
+| 2.0732        | 2.824675 | 435  | 2.3320          |
+| 2.0499        | 2.857143 | 440  | 2.3321          |
+| 2.0498        | 2.889610 | 445  | 2.3321          |
+| 2.0472        | 2.922078 | 450  | 2.3320          |
+| 2.1327        | 2.954545 | 455  | 2.3319          |
+| 2.0642        | 2.987013 | 460  | 2.3319          |
+| 2.0654        | 3.019481 | 465  | -               |
+
+Note that the training data volume for the continued pretraining stage is capped at 3GB. When the corresponding century's corpus exceeds this volume, the training data is randomly sampled to fit the volume.
+
+**ProgressGym-HistLlama3-8B-C015-instruct is an instruction-tuned language model.** It is tuned on [ProgressGym-TimelessQA](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-TimelessQA), using the following hyperparameters. Note, however, that the snapshot at training step 10 is used for the final model, to minimize erosion of the value tendencies learned during continued pretraining; we qualitatively observe that this snapshot still possesses strong instruction-following capabilities.
+- learning_rate: 1.5e-05
+- train_batch_size: 8
+- eval_batch_size: 16
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- total_train_batch_size: 64
+- total_eval_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: polynomial
+- lr_scheduler_warmup_steps: 20
+- num_epochs: 4.0
+- mixed_precision_training: Native AMP
+
+... with the following training results:
+
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.8675        | 0.1042 | 5    | 0.8585          |
+| 0.8415        | 0.2083 | 10   | 0.8063          |
+| 0.8225        | 0.3125 | 15   | 0.8210          |
+| 0.806         | 0.4167 | 20   | 0.8412          |
+| 0.8139        | 0.5208 | 25   | 0.8702          |
+| 0.8978        | 0.625  | 30   | 0.8631          |
+| 0.814         | 0.7292 | 35   | 0.8550          |
+| 0.7989        | 0.8333 | 40   | 0.8473          |
+| 0.8769        | 0.9375 | 45   | 0.8383          |
+| 0.7244        | 1.0417 | 50   | 0.8278          |
+| 0.4644        | 1.1458 | 55   | 0.8387          |
+| 0.4488        | 1.25   | 60   | 0.8680          |
+| 0.3973        | 1.3542 | 65   | 0.8718          |
+| 0.443         | 1.4583 | 70   | 0.8596          |
+| 0.4346        | 1.5625 | 75   | 0.8514          |
+| 0.4701        | 1.6667 | 80   | 0.8461          |
+| 0.4344        | 1.7708 | 85   | 0.8437          |
+| 0.4274        | 1.875  | 90   | 0.8434          |
+| 0.4771        | 1.9792 | 95   | 0.8434          |
+| 0.3876        | 2.0833 | 100  | 0.8439          |
+| 0.3698        | 2.1875 | 105  | 0.8451          |
+| 0.407         | 2.2917 | 110  | 0.8465          |
+| 0.374         | 2.3958 | 115  | 0.8482          |
+| 0.3945        | 2.5    | 120  | 0.8498          |
+| 0.3753        | 2.6042 | 125  | 0.8513          |
+| 0.3721        | 2.7083 | 130  | 0.8528          |
+| 0.3718        | 2.8125 | 135  | 0.8542          |
+| 0.3773        | 2.9167 | 140  | 0.8555          |
+| 0.3723        | 3.0208 | 145  | 0.8565          |
+| 0.374         | 3.125  | 150  | 0.8576          |
+| 0.3728        | 3.2292 | 155  | 0.8588          |
+| 0.3686        | 3.3333 | 160  | 0.8598          |
+| 0.3617        | 3.4375 | 165  | 0.8607          |
+| 0.3546        | 3.5417 | 170  | 0.8613          |
+| 0.3707        | 3.6458 | 175  | 0.8619          |
+| 0.3739        | 3.75   | 180  | 0.8625          |
+| 0.3617        | 3.8542 | 185  | 0.8632          |
+| 0.3591        | 3.9583 | 190  | 0.8637          |
+
+
+## Links
+
+- **[Paper Preprint]**  [ProgressGym: Alignment with a Millennium of Moral Progress](https://arxiv.org/abs/2406.20087)
+- **[Leaderboard & Interactive Playground]** [PKU-Alignment/ProgressGym-LeaderBoard](https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard)
+- **[Huggingface Data & Model Collection]** [PKU-Alignment/ProgressGym](https://huggingface.co/collections/PKU-Alignment/progressgym-666735fcf3e4efa276226eaa)
+- **[Github Codebase]** [PKU-Alignment/ProgressGym](https://github.com/PKU-Alignment/ProgressGym)
+- **[Documentation]** [ProgressGym Documentation](https://pku-alignment.github.io/ProgressGym/)
+- **[PyPI Package]** *(coming soon - [stay tuned](https://forms.gle/1TWFLL4ZCLeYTD5N6)!)*
+
+## Citation
+
+If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the bibtex entry below.
+
+```text
+@article{progressgym,
+  title={ProgressGym: Alignment with a Millennium of Moral Progress},
+  author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
+  journal={arXiv preprint arXiv:2406.20087},
+  eprint={2406.20087},
+  eprinttype = {arXiv},
+  year={2024}
+}
+```
+
+## Ethics Statement
+
+- **Copyright information of historical text data sources**:
+  - Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
+  - From the Internet Archive, we include only texts uploaded by the *Library of Congress*, which are freely released online by the U.S. Library of Congress for research and public use.
+  - The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
+  - The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
+- **Reproducibility**: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
+- **Misuse Prevention**: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without *a priori* assumptions on the direction of progress. In the event of potential misuse of our dataset, we condemn any misuse attempt to the strongest degree possible, and will work with the research community on whistleblowing for such attempts. 
+- **Open-Sourcing**: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.
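
The README above states that the snapshot at training step 10 of instruction tuning is used for the final model. As a minimal sketch, the snippet below looks up the step with the lowest validation loss among a few rows copied from the instruction-tuning table. The names `val_loss_by_step` and `best_step` are illustrative only. Note that the README gives preserving the value tendencies from continued pretraining as the reason for the choice; that step 10 also has the lowest validation loss is an observation from the table, not a criterion stated by the authors.

```python
# Validation losses copied from selected rows of the instruction-tuning
# table in the README (step -> validation loss).
val_loss_by_step = {
    5: 0.8585,
    10: 0.8063,
    15: 0.8210,
    20: 0.8412,
    25: 0.8702,
    50: 0.8278,
    100: 0.8439,
    190: 0.8637,
}


def best_step(losses):
    """Return the step whose validation loss is lowest."""
    return min(losses, key=losses.get)


chosen = best_step(val_loss_by_step)
print(f"lowest validation loss at step {chosen}: {val_loss_by_step[chosen]}")
```

The value 0.8063 at step 10 agrees, after rounding, with the `eval_loss` of 0.8062803149223328 recorded in `all_results.json` and `eval_results.json`.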
--- a/all_results.json
+++ b/all_results.json
@@ -0,0 +1,12 @@
+{
+    "epoch": 4.0,
+    "eval_loss": 0.8062803149223328,
+    "eval_runtime": 1.9823,
+    "eval_samples_per_second": 171.522,
+    "eval_steps_per_second": 1.513,
+    "total_flos": 5334785064960.0,
+    "train_loss": 0.508326952966551,
+    "train_runtime": 4164.2152,
+    "train_samples_per_second": 2.934,
+    "train_steps_per_second": 0.046
+}
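
The throughput figures in `all_results.json` can be cross-checked against the batch sizes listed in the README. The sketch below assumes that each `*_per_second` value is a count divided by the corresponding runtime, which is how trainer-style logs usually define them but is not confirmed by this commit. The constant names are illustrative.

```python
import math

# Values copied from all_results.json.
results = {
    "eval_runtime": 1.9823,
    "eval_samples_per_second": 171.522,
    "eval_steps_per_second": 1.513,
    "train_runtime": 4164.2152,
    "train_steps_per_second": 0.046,
}

# Hyperparameters copied from the README's instruction-tuning list.
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 16
NUM_DEVICES = 8

# Total batch size = per-device batch size x number of devices.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES

# Assumption: rate = count / runtime, so count = rate * runtime.
eval_samples = round(results["eval_samples_per_second"] * results["eval_runtime"])
eval_steps = round(results["eval_steps_per_second"] * results["eval_runtime"])

# The logged training rate is rounded to three decimals, so this is only an estimate.
train_steps_estimate = results["train_steps_per_second"] * results["train_runtime"]

print(total_train_batch, total_eval_batch)
print(eval_samples, eval_steps, math.ceil(eval_samples / total_eval_batch))
print(round(train_steps_estimate, 1))
```

Under that assumption, the evaluation set holds about 340 samples, which at a total evaluation batch size of 128 takes three steps, matching the logged step rate. The estimated number of training steps is close to the last step shown in the README table (190) for a four-epoch run.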
--- a/config.json
+++ b/config.json
@@ -0,0 +1,28 @@
+{
+  "_name_or_path": "/mnt/fl/projects/pro-align/progressalign/shared_storage/our_models/C015_llama3-8b-base_pretrain_20240428_005832",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "eos_token_id": 128001,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 8192,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float16",
+  "transformers_version": "4.40.1",
+  "use_cache": false,
+  "vocab_size": 128256
+}
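
The fields in `config.json` are enough to estimate the parameter count. The following sketch assumes the standard Llama layout: bias-free attention and MLP projections, one RMSNorm weight vector per norm, and untied input and output embeddings. This matches `attention_bias: false`, `tie_word_embeddings: false`, and the `.weight`-only tensor names in `model.safetensors.index.json`. The function name `llama_param_count` is illustrative.

```python
# Relevant fields copied from config.json.
config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "vocab_size": 128256,
}


def llama_param_count(cfg):
    """Count weights assuming a bias-free Llama layout with untied embeddings."""
    hidden = cfg["hidden_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]  # grouped-query attention

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v
    mlp = 3 * hidden * cfg["intermediate_size"]            # gate, up, down
    norms = 2 * hidden                                     # input + post-attention
    per_layer = attention + mlp + norms

    embeddings = 2 * cfg["vocab_size"] * hidden            # embed_tokens + lm_head
    final_norm = hidden
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


params = llama_param_count(config)
bytes_fp16 = params * 2  # torch_dtype is float16: two bytes per parameter
print(params, bytes_fp16)
```

The resulting byte count equals the `total_size` of 16060522496 recorded in `model.safetensors.index.json`.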
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/eval_results.json
+++ b/eval_results.json
@@ -0,0 +1,7 @@
+{
+    "epoch": 4.0,
+    "eval_loss": 0.8062803149223328,
+    "eval_runtime": 1.9823,
+    "eval_samples_per_second": 171.522,
+    "eval_steps_per_second": 1.513
+}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 128000,
+  "eos_token_id": 128001,
+  "transformers_version": "4.40.1"
+}
--- a/model-00001-of-00004.safetensors
+++ b/model-00001-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4ef6764666375baa6c70755ee998e513f62060f5b46913ae34e67abe0e31a72d
+size 4976698592
--- a/model-00002-of-00004.safetensors
+++ b/model-00002-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f657d1353892b5d5fe5ed93998f2aa6e88c5a78fc81fb51cb9573ee86d0c8583
+size 4999802616
--- a/model-00003-of-00004.safetensors
+++ b/model-00003-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fcb906e7483301863b75ce9e879e61afb9cf662fee8c8167b7e3221807d8eda6
+size 4915916080
--- a/model-00004-of-00004.safetensors
+++ b/model-00004-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3adff485e0f3ece17e8f1673a10767ec1fe1aea949fad8118fe8a1e26c16d979
+size 1168138808
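
Each `.safetensors` entry in this commit is a Git LFS pointer file rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer and summing the shard sizes follows; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
# Pointer text copied from model-00004-of-00004.safetensors in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3adff485e0f3ece17e8f1673a10767ec1fe1aea949fad8118fe8a1e26c16d979
size 1168138808
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Sizes of the four shards, copied from their pointer files.
shard_sizes = [4976698592, 4999802616, 4915916080, 1168138808]
total_on_disk = sum(shard_sizes)

# total_size from model.safetensors.index.json (tensor bytes only).
TENSOR_BYTES = 16060522496
overhead = total_on_disk - TENSOR_BYTES

print(parse_lfs_pointer(POINTER)["size"], total_on_disk, overhead)
```

The shards add up to 33600 bytes more than the index's `total_size`. A likely explanation, stated here as an assumption, is the per-file safetensors header (a length prefix plus JSON metadata), which the index total does not count.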
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,298 @@
+{
+  "metadata": {
+    "total_size": 16060522496
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00004-of-00004.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.norm.weight": "model-00004-of-00004.safetensors"
+  }
+}
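The weight map above assigns each tensor name to the shard file that stores it. Keys are ordered as strings, which is why layer 3 follows layer 29, and layer 31 is split between shards 3 and 4. A minimal sketch of how such a map could be inspected follows; the helper names `tensors_by_shard` and `layers_spanning_shards` are hypothetical, and the sample holds only a few entries copied from the map.

```python
import re
from collections import defaultdict

# A few entries copied from the weight map above; the real map has many more.
SAMPLE_WEIGHT_MAP = {
    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
}

# Matches the decoder layer number in names such as "model.layers.31.mlp...".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def tensors_by_shard(weight_map):
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


def layers_spanning_shards(weight_map):
    """Return the layer numbers whose tensors live in more than one shard."""
    layer_shards = defaultdict(set)
    for name, shard in weight_map.items():
        match = LAYER_RE.match(name)
        if match:
            layer_shards[int(match.group(1))].add(shard)
    return sorted(layer for layer, files in layer_shards.items() if len(files) > 1)


if __name__ == "__main__":
    for shard, names in sorted(tensors_by_shard(SAMPLE_WEIGHT_MAP).items()):
        print(shard, len(names))
    print("layers split across shards:", layers_spanning_shards(SAMPLE_WEIGHT_MAP))
```

To run this against the full index, load the JSON file and pass its `weight_map` object to the same functions. This excerpt does not show the index file's name, so the path has to be supplied by the reader.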
--- a/readme-assets/data-sources.png
+++ b/readme-assets/data-sources.png
--- a/readme-assets/data-stats.png
+++ b/readme-assets/data-stats.png
--- a/readme-assets/main-diagram.png
+++ b/readme-assets/main-diagram.png
--- a/readme-assets/moral-evals.png
+++ b/readme-assets/moral-evals.png
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,23 @@
+{
+  "bos_token": {
+    "content": "<|begin_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
--- a/train_results.json
+++ b/train_results.json
@@ -0,0 +1,8 @@
+{
+    "epoch": 4.0,
+    "total_flos": 5334785064960.0,
+    "train_loss": 0.508326952966551,
+    "train_runtime": 4164.2152,
+    "train_samples_per_second": 2.934,
+    "train_steps_per_second": 0.046
+}
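The throughput figures in `train_results.json` can be cross-checked against the step count in the trainer log. The sketch below is a rough consistency check, not the trainer's own accounting: it assumes the reported rates are averages over the whole `train_runtime`, and it takes the total of 192 steps from `trainer_log.jsonl`. The variable names are illustrative.

```python
# Values copied from train_results.json above.
train_runtime = 4164.2152          # seconds
samples_per_second = 2.934
steps_per_second = 0.046
epochs = 4.0

# Total optimizer steps, taken from "total_steps" in trainer_log.jsonl.
total_steps = 192

# Rates multiplied by runtime should approximately recover the totals.
estimated_steps = train_runtime * steps_per_second
estimated_samples = train_runtime * samples_per_second

# Derived quantities; both are estimates limited by the rounding of the rates.
samples_per_epoch = estimated_samples / epochs
samples_per_step = estimated_samples / total_steps

print(f"estimated steps:       {estimated_steps:.1f}")
print(f"estimated samples:     {estimated_samples:.0f}")
print(f"samples per epoch:     {samples_per_epoch:.0f}")
print(f"samples per step:      {samples_per_step:.1f}")
```

The estimated step count rounds to 192, which agrees with the log. The samples-per-step figure is only an estimate of the effective batch size, since the rates are rounded to three decimals.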
--- a/trainer_log.jsonl
+++ b/trainer_log.jsonl
@@ -0,0 +1,79 @@
+{"current_steps": 1, "total_steps": 192, "loss": 0.8766, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 0.0, "epoch": 0.020833333333333332, "percentage": 0.52, "elapsed_time": "0:00:03", "remaining_time": "0:09:41"}
+{"current_steps": 5, "total_steps": 192, "loss": 0.8675, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.25e-06, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:00:08", "remaining_time": "0:05:18"}
+{"current_steps": 5, "total_steps": 192, "loss": null, "eval_loss": 0.8584801554679871, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:00:08", "remaining_time": "0:05:18"}
+{"current_steps": 10, "total_steps": 192, "loss": 0.8415, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.25e-06, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:00:59", "remaining_time": "0:18:00"}
+{"current_steps": 10, "total_steps": 192, "loss": null, "eval_loss": 0.8062803149223328, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:00:59", "remaining_time": "0:18:00"}
+{"current_steps": 15, "total_steps": 192, "loss": 0.8225, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 8.25e-06, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:01:52", "remaining_time": "0:22:11"}
+{"current_steps": 15, "total_steps": 192, "loss": null, "eval_loss": 0.820951521396637, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:01:52", "remaining_time": "0:22:11"}
+{"current_steps": 20, "total_steps": 192, "loss": 0.806, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.2e-05, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:03:49", "remaining_time": "0:32:56"}
+{"current_steps": 20, "total_steps": 192, "loss": null, "eval_loss": 0.8412486910820007, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:03:49", "remaining_time": "0:32:56"}
+{"current_steps": 25, "total_steps": 192, "loss": 0.8139, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.4071209905461127e-05, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:05:44", "remaining_time": "0:38:20"}
+{"current_steps": 25, "total_steps": 192, "loss": null, "eval_loss": 0.8701534867286682, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:05:44", "remaining_time": "0:38:20"}
+{"current_steps": 30, "total_steps": 192, "loss": 0.8978, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.0166196232101288e-05, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:07:41", "remaining_time": "0:41:30"}
+{"current_steps": 30, "total_steps": 192, "loss": null, "eval_loss": 0.8630704879760742, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:07:41", "remaining_time": "0:41:30"}
+{"current_steps": 35, "total_steps": 192, "loss": 0.814, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.276248845991498e-06, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:09:38", "remaining_time": "0:43:14"}
+{"current_steps": 35, "total_steps": 192, "loss": null, "eval_loss": 0.8549697995185852, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:09:38", "remaining_time": "0:43:14"}
+{"current_steps": 40, "total_steps": 192, "loss": 0.7989, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.157388080190487e-06, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:11:33", "remaining_time": "0:43:56"}
+{"current_steps": 40, "total_steps": 192, "loss": null, "eval_loss": 0.8472943902015686, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:11:33", "remaining_time": "0:43:56"}
+{"current_steps": 45, "total_steps": 192, "loss": 0.8769, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.6192313334626905e-06, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:13:28", "remaining_time": "0:44:01"}
+{"current_steps": 45, "total_steps": 192, "loss": null, "eval_loss": 0.8382811546325684, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:13:28", "remaining_time": "0:44:01"}
+{"current_steps": 50, "total_steps": 192, "loss": 0.7244, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.514391432582838e-06, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:15:23", "remaining_time": "0:43:43"}
+{"current_steps": 50, "total_steps": 192, "loss": null, "eval_loss": 0.8277742266654968, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:15:23", "remaining_time": "0:43:43"}
+{"current_steps": 55, "total_steps": 192, "loss": 0.4644, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7297262757656213e-06, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:17:17", "remaining_time": "0:43:05"}
+{"current_steps": 55, "total_steps": 192, "loss": null, "eval_loss": 0.8387134671211243, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:17:17", "remaining_time": "0:43:05"}
+{"current_steps": 60, "total_steps": 192, "loss": 0.4488, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.1791620375982074e-06, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:19:12", "remaining_time": "0:42:16"}
+{"current_steps": 60, "total_steps": 192, "loss": null, "eval_loss": 0.8680305480957031, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:19:12", "remaining_time": "0:42:16"}
+{"current_steps": 65, "total_steps": 192, "loss": 0.3973, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.978466092394693e-07, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:21:07", "remaining_time": "0:41:15"}
+{"current_steps": 65, "total_steps": 192, "loss": null, "eval_loss": 0.8717625737190247, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:21:07", "remaining_time": "0:41:15"}
+{"current_steps": 70, "total_steps": 192, "loss": 0.443, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.374210410959207e-07, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:22:59", "remaining_time": "0:40:03"}
+{"current_steps": 70, "total_steps": 192, "loss": null, "eval_loss": 0.8596016764640808, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:22:59", "remaining_time": "0:40:03"}
+{"current_steps": 75, "total_steps": 192, "loss": 0.4346, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.6222476698215175e-07, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:24:51", "remaining_time": "0:38:46"}
+{"current_steps": 75, "total_steps": 192, "loss": null, "eval_loss": 0.8514222502708435, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:24:51", "remaining_time": "0:38:46"}
+{"current_steps": 80, "total_steps": 192, "loss": 0.4701, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.462755297384099e-07, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:26:44", "remaining_time": "0:37:25"}
+{"current_steps": 80, "total_steps": 192, "loss": null, "eval_loss": 0.8461114764213562, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:26:44", "remaining_time": "0:37:25"}
+{"current_steps": 85, "total_steps": 192, "loss": 0.4344, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7088740175034947e-07, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:28:36", "remaining_time": "0:36:00"}
+{"current_steps": 85, "total_steps": 192, "loss": null, "eval_loss": 0.8437052369117737, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:28:36", "remaining_time": "0:36:00"}
+{"current_steps": 90, "total_steps": 192, "loss": 0.4274, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.228102956599465e-07, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:30:27", "remaining_time": "0:34:31"}
+{"current_steps": 90, "total_steps": 192, "loss": null, "eval_loss": 0.8434357643127441, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:30:27", "remaining_time": "0:34:31"}
+{"current_steps": 95, "total_steps": 192, "loss": 0.4771, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.279207916081227e-08, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:32:20", "remaining_time": "0:33:01"}
+{"current_steps": 95, "total_steps": 192, "loss": null, "eval_loss": 0.8434197902679443, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:32:20", "remaining_time": "0:33:01"}
+{"current_steps": 100, "total_steps": 192, "loss": 0.3876, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.448002404850094e-08, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:34:12", "remaining_time": "0:31:28"}
+{"current_steps": 100, "total_steps": 192, "loss": null, "eval_loss": 0.8438728451728821, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:34:12", "remaining_time": "0:31:28"}
+{"current_steps": 105, "total_steps": 192, "loss": 0.3698, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.35920070839697e-08, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:36:01", "remaining_time": "0:29:51"}
+{"current_steps": 105, "total_steps": 192, "loss": null, "eval_loss": 0.845079243183136, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:36:01", "remaining_time": "0:29:51"}
+{"current_steps": 110, "total_steps": 192, "loss": 0.407, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.7299804687499997e-08, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:37:49", "remaining_time": "0:28:12"}
+{"current_steps": 110, "total_steps": 192, "loss": null, "eval_loss": 0.8465444445610046, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:37:49", "remaining_time": "0:28:12"}
+{"current_steps": 115, "total_steps": 192, "loss": 0.374, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.37771434967624e-08, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:39:40", "remaining_time": "0:26:33"}
+{"current_steps": 115, "total_steps": 192, "loss": null, "eval_loss": 0.8481599688529968, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:39:40", "remaining_time": "0:26:33"}
+{"current_steps": 120, "total_steps": 192, "loss": 0.3945, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.187403540619925e-08, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "0:41:28", "remaining_time": "0:24:53"}
+{"current_steps": 120, "total_steps": 192, "loss": null, "eval_loss": 0.8498236536979675, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "0:41:28", "remaining_time": "0:24:53"}
+{"current_steps": 125, "total_steps": 192, "loss": 0.3753, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.088648238966908e-08, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "0:43:19", "remaining_time": "0:23:13"}
+{"current_steps": 125, "total_steps": 192, "loss": null, "eval_loss": 0.8512565493583679, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "0:43:19", "remaining_time": "0:23:13"}
+{"current_steps": 130, "total_steps": 192, "loss": 0.3721, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.039701925276604e-08, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "0:45:11", "remaining_time": "0:21:32"}
+{"current_steps": 130, "total_steps": 192, "loss": null, "eval_loss": 0.8527700304985046, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "0:45:11", "remaining_time": "0:21:32"}
+{"current_steps": 135, "total_steps": 192, "loss": 0.3718, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0166900048082497e-08, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "0:47:01", "remaining_time": "0:19:51"}
+{"current_steps": 135, "total_steps": 192, "loss": null, "eval_loss": 0.8541720509529114, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "0:47:01", "remaining_time": "0:19:51"}
+{"current_steps": 140, "total_steps": 192, "loss": 0.3773, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0065147322870076e-08, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "0:48:51", "remaining_time": "0:18:09"}
+{"current_steps": 140, "total_steps": 192, "loss": null, "eval_loss": 0.8555252552032471, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "0:48:51", "remaining_time": "0:18:09"}
+{"current_steps": 145, "total_steps": 192, "loss": 0.3723, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.002328628528332e-08, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "0:50:40", "remaining_time": "0:16:25"}
+{"current_steps": 145, "total_steps": 192, "loss": null, "eval_loss": 0.8565484881401062, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "0:50:40", "remaining_time": "0:16:25"}
+{"current_steps": 150, "total_steps": 192, "loss": 0.374, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0007484528133236e-08, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "0:52:32", "remaining_time": "0:14:42"}
+{"current_steps": 150, "total_steps": 192, "loss": null, "eval_loss": 0.8576194643974304, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "0:52:32", "remaining_time": "0:14:42"}
+{"current_steps": 155, "total_steps": 192, "loss": 0.3728, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0002110817570477e-08, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "0:54:21", "remaining_time": "0:12:58"}
+{"current_steps": 155, "total_steps": 192, "loss": null, "eval_loss": 0.8588044047355652, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "0:54:21", "remaining_time": "0:12:58"}
+{"current_steps": 160, "total_steps": 192, "loss": 0.3686, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000504842356326e-08, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "0:56:13", "remaining_time": "0:11:14"}
+{"current_steps": 160, "total_steps": 192, "loss": null, "eval_loss": 0.859791100025177, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "0:56:13", "remaining_time": "0:11:14"}
+{"current_steps": 165, "total_steps": 192, "loss": 0.3617, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000009745562451e-08, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "0:58:03", "remaining_time": "0:09:29"}
+{"current_steps": 165, "total_steps": 192, "loss": null, "eval_loss": 0.8607122302055359, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "0:58:03", "remaining_time": "0:09:29"}
+{"current_steps": 170, "total_steps": 192, "loss": 0.3546, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000014077810156e-08, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "0:59:52", "remaining_time": "0:07:44"}
+{"current_steps": 170, "total_steps": 192, "loss": null, "eval_loss": 0.8613293170928955, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "0:59:52", "remaining_time": "0:07:44"}
+{"current_steps": 175, "total_steps": 192, "loss": 0.3707, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000001343508807e-08, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:01:40", "remaining_time": "0:05:59"}
+{"current_steps": 175, "total_steps": 192, "loss": null, "eval_loss": 0.8619220852851868, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:01:40", "remaining_time": "0:05:59"}
+{"current_steps": 180, "total_steps": 192, "loss": 0.3739, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000006747581e-08, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:03:31", "remaining_time": "0:04:14"}
+{"current_steps": 180, "total_steps": 192, "loss": null, "eval_loss": 0.862490177154541, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:03:31", "remaining_time": "0:04:14"}
+{"current_steps": 185, "total_steps": 192, "loss": 0.3617, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000000001094325e-08, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:05:21", "remaining_time": "0:02:28"}
+{"current_steps": 185, "total_steps": 192, "loss": null, "eval_loss": 0.8631939888000488, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:05:21", "remaining_time": "0:02:28"}
+{"current_steps": 190, "total_steps": 192, "loss": 0.3591, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000000000139e-08, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:07:12", "remaining_time": "0:00:42"}
+{"current_steps": 190, "total_steps": 192, "loss": null, "eval_loss": 0.8637197613716125, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:07:12", "remaining_time": "0:00:42"}
+{"current_steps": 192, "total_steps": 192, "loss": null, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:08:56", "remaining_time": "0:00:00"}
+{"current_steps": 3, "total_steps": 3, "loss": null, "eval_loss": 0.8062803149223328, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:09:34", "remaining_time": "0:00:00"}
--- a/trainer_state.json
+++ b/trainer_state.json
@@ -0,0 +1,607 @@
+{
+  "best_metric": 0.8062803149223328,
+  "best_model_checkpoint": "./output/training_results/C015_llama3-8b-base_instruct_20240504_123713/checkpoint-10",
+  "epoch": 4.0,
+  "eval_steps": 5,
+  "global_step": 192,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.020833333333333332,
+      "grad_norm": 0.0,
+      "learning_rate": 0.0,
+      "loss": 0.8766,
+      "step": 1
+    },
+    {
+      "epoch": 0.10416666666666667,
+      "grad_norm": 10.34330229700748,
+      "learning_rate": 2.25e-06,
+      "loss": 0.8675,
+      "step": 5
+    },
+    {
+      "epoch": 0.10416666666666667,
+      "eval_loss": 0.8584801554679871,
+      "eval_runtime": 1.9951,
+      "eval_samples_per_second": 170.414,
+      "eval_steps_per_second": 1.504,
+      "step": 5
+    },
+    {
+      "epoch": 0.20833333333333334,
+      "grad_norm": 4.512014612667122,
+      "learning_rate": 5.25e-06,
+      "loss": 0.8415,
+      "step": 10
+    },
+    {
+      "epoch": 0.20833333333333334,
+      "eval_loss": 0.8062803149223328,
+      "eval_runtime": 1.9589,
+      "eval_samples_per_second": 173.564,
+      "eval_steps_per_second": 1.531,
+      "step": 10
+    },
+    {
+      "epoch": 0.3125,
+      "grad_norm": 5.466223920700161,
+      "learning_rate": 8.25e-06,
+      "loss": 0.8225,
+      "step": 15
+    },
+    {
+      "epoch": 0.3125,
+      "eval_loss": 0.820951521396637,
+      "eval_runtime": 1.9583,
+      "eval_samples_per_second": 173.62,
+      "eval_steps_per_second": 1.532,
+      "step": 15
+    },
+    {
+      "epoch": 0.4166666666666667,
+      "grad_norm": 5.091985939472258,
+      "learning_rate": 1.2e-05,
+      "loss": 0.806,
+      "step": 20
+    },
+    {
+      "epoch": 0.4166666666666667,
+      "eval_loss": 0.8412486910820007,
+      "eval_runtime": 1.9516,
+      "eval_samples_per_second": 174.217,
+      "eval_steps_per_second": 1.537,
+      "step": 20
+    },
+    {
+      "epoch": 0.5208333333333334,
+      "grad_norm": 4.323182492427286,
+      "learning_rate": 1.4071209905461127e-05,
+      "loss": 0.8139,
+      "step": 25
+    },
+    {
+      "epoch": 0.5208333333333334,
+      "eval_loss": 0.8701534867286682,
+      "eval_runtime": 1.956,
+      "eval_samples_per_second": 173.828,
+      "eval_steps_per_second": 1.534,
+      "step": 25
+    },
+    {
+      "epoch": 0.625,
+      "grad_norm": 4.430828029367158,
+      "learning_rate": 1.0166196232101288e-05,
+      "loss": 0.8978,
+      "step": 30
+    },
+    {
+      "epoch": 0.625,
+      "eval_loss": 0.8630704879760742,
+      "eval_runtime": 1.9545,
+      "eval_samples_per_second": 173.954,
+      "eval_steps_per_second": 1.535,
+      "step": 30
+    },
+    {
+      "epoch": 0.7291666666666666,
+      "grad_norm": 3.8459262571296122,
+      "learning_rate": 7.276248845991498e-06,
+      "loss": 0.814,
+      "step": 35
+    },
+    {
+      "epoch": 0.7291666666666666,
+      "eval_loss": 0.8549697995185852,
+      "eval_runtime": 1.9539,
+      "eval_samples_per_second": 174.008,
+      "eval_steps_per_second": 1.535,
+      "step": 35
+    },
+    {
+      "epoch": 0.8333333333333334,
+      "grad_norm": 4.312137758874538,
+      "learning_rate": 5.157388080190487e-06,
+      "loss": 0.7989,
+      "step": 40
+    },
+    {
+      "epoch": 0.8333333333333334,
+      "eval_loss": 0.8472943902015686,
+      "eval_runtime": 1.9515,
+      "eval_samples_per_second": 174.222,
+      "eval_steps_per_second": 1.537,
+      "step": 40
+    },
+    {
+      "epoch": 0.9375,
+      "grad_norm": 4.013634591695819,
+      "learning_rate": 3.6192313334626905e-06,
+      "loss": 0.8769,
+      "step": 45
+    },
+    {
+      "epoch": 0.9375,
+      "eval_loss": 0.8382811546325684,
+      "eval_runtime": 1.9527,
+      "eval_samples_per_second": 174.116,
+      "eval_steps_per_second": 1.536,
+      "step": 45
+    },
+    {
+      "epoch": 1.0416666666666667,
+      "grad_norm": 3.825805552811641,
+      "learning_rate": 2.514391432582838e-06,
+      "loss": 0.7244,
+      "step": 50
+    },
+    {
+      "epoch": 1.0416666666666667,
+      "eval_loss": 0.8277742266654968,
+      "eval_runtime": 1.9549,
+      "eval_samples_per_second": 173.925,
+      "eval_steps_per_second": 1.535,
+      "step": 50
+    },
+    {
+      "epoch": 1.1458333333333333,
+      "grad_norm": 3.0670883145044487,
+      "learning_rate": 1.7297262757656213e-06,
+      "loss": 0.4644,
+      "step": 55
+    },
+    {
+      "epoch": 1.1458333333333333,
+      "eval_loss": 0.8387134671211243,
+      "eval_runtime": 1.959,
+      "eval_samples_per_second": 173.561,
+      "eval_steps_per_second": 1.531,
+      "step": 55
+    },
+    {
+      "epoch": 1.25,
+      "grad_norm": 3.904758197983173,
+      "learning_rate": 1.1791620375982074e-06,
+      "loss": 0.4488,
+      "step": 60
+    },
+    {
+      "epoch": 1.25,
+      "eval_loss": 0.8680305480957031,
+      "eval_runtime": 1.953,
+      "eval_samples_per_second": 174.087,
+      "eval_steps_per_second": 1.536,
+      "step": 60
+    },
+    {
+      "epoch": 1.3541666666666667,
+      "grad_norm": 4.037522952699986,
+      "learning_rate": 7.978466092394693e-07,
+      "loss": 0.3973,
+      "step": 65
+    },
+    {
+      "epoch": 1.3541666666666667,
+      "eval_loss": 0.8717625737190247,
+      "eval_runtime": 1.9541,
+      "eval_samples_per_second": 173.996,
+      "eval_steps_per_second": 1.535,
+      "step": 65
+    },
+    {
+      "epoch": 1.4583333333333333,
+      "grad_norm": 4.006395162021574,
+      "learning_rate": 5.374210410959207e-07,
+      "loss": 0.443,
+      "step": 70
+    },
+    {
+      "epoch": 1.4583333333333333,
+      "eval_loss": 0.8596016764640808,
+      "eval_runtime": 1.9513,
+      "eval_samples_per_second": 174.244,
+      "eval_steps_per_second": 1.537,
+      "step": 70
+    },
+    {
+      "epoch": 1.5625,
+      "grad_norm": 4.477860740274337,
+      "learning_rate": 3.6222476698215175e-07,
+      "loss": 0.4346,
+      "step": 75
+    },
+    {
+      "epoch": 1.5625,
+      "eval_loss": 0.8514222502708435,
+      "eval_runtime": 1.9616,
+      "eval_samples_per_second": 173.329,
+      "eval_steps_per_second": 1.529,
+      "step": 75
+    },
+    {
+      "epoch": 1.6666666666666665,
+      "grad_norm": 3.79003494996975,
+      "learning_rate": 2.462755297384099e-07,
+      "loss": 0.4701,
+      "step": 80
+    },
+    {
+      "epoch": 1.6666666666666665,
+      "eval_loss": 0.8461114764213562,
+      "eval_runtime": 1.9519,
+      "eval_samples_per_second": 174.192,
+      "eval_steps_per_second": 1.537,
+      "step": 80
+    },
+    {
+      "epoch": 1.7708333333333335,
+      "grad_norm": 3.431009868136907,
+      "learning_rate": 1.7088740175034947e-07,
+      "loss": 0.4344,
+      "step": 85
+    },
+    {
+      "epoch": 1.7708333333333335,
+      "eval_loss": 0.8437052369117737,
+      "eval_runtime": 1.9548,
+      "eval_samples_per_second": 173.928,
+      "eval_steps_per_second": 1.535,
+      "step": 85
+    },
+    {
+      "epoch": 1.875,
+      "grad_norm": 3.4612975522103846,
+      "learning_rate": 1.228102956599465e-07,
+      "loss": 0.4274,
+      "step": 90
+    },
+    {
+      "epoch": 1.875,
+      "eval_loss": 0.8434357643127441,
+      "eval_runtime": 1.9551,
+      "eval_samples_per_second": 173.905,
+      "eval_steps_per_second": 1.534,
+      "step": 90
+    },
+    {
+      "epoch": 1.9791666666666665,
+      "grad_norm": 4.089060356958601,
+      "learning_rate": 9.279207916081227e-08,
+      "loss": 0.4771,
+      "step": 95
+    },
+    {
+      "epoch": 1.9791666666666665,
+      "eval_loss": 0.8434197902679443,
+      "eval_runtime": 1.9533,
+      "eval_samples_per_second": 174.06,
+      "eval_steps_per_second": 1.536,
+      "step": 95
+    },
+    {
+      "epoch": 2.0833333333333335,
+      "grad_norm": 3.5663107359521624,
+      "learning_rate": 7.448002404850094e-08,
+      "loss": 0.3876,
+      "step": 100
+    },
+    {
+      "epoch": 2.0833333333333335,
+      "eval_loss": 0.8438728451728821,
+      "eval_runtime": 1.957,
+      "eval_samples_per_second": 173.739,
+      "eval_steps_per_second": 1.533,
+      "step": 100
+    },
+    {
+      "epoch": 2.1875,
+      "grad_norm": 3.3025378175144806,
+      "learning_rate": 6.35920070839697e-08,
+      "loss": 0.3698,
+      "step": 105
+    },
+    {
+      "epoch": 2.1875,
+      "eval_loss": 0.845079243183136,
+      "eval_runtime": 1.9562,
+      "eval_samples_per_second": 173.803,
+      "eval_steps_per_second": 1.534,
+      "step": 105
+    },
+    {
+      "epoch": 2.2916666666666665,
+      "grad_norm": 3.4985300165216615,
+      "learning_rate": 5.7299804687499997e-08,
+      "loss": 0.407,
+      "step": 110
+    },
+    {
+      "epoch": 2.2916666666666665,
+      "eval_loss": 0.8465444445610046,
+      "eval_runtime": 1.9573,
+      "eval_samples_per_second": 173.708,
+      "eval_steps_per_second": 1.533,
+      "step": 110
+    },
+    {
+      "epoch": 2.3958333333333335,
+      "grad_norm": 3.394752587183079,
+      "learning_rate": 5.37771434967624e-08,
+      "loss": 0.374,
+      "step": 115
+    },
+    {
+      "epoch": 2.3958333333333335,
+      "eval_loss": 0.8481599688529968,
+      "eval_runtime": 1.9556,
+      "eval_samples_per_second": 173.861,
+      "eval_steps_per_second": 1.534,
+      "step": 115
+    },
+    {
+      "epoch": 2.5,
+      "grad_norm": 4.399721414998381,
+      "learning_rate": 5.187403540619925e-08,
+      "loss": 0.3945,
+      "step": 120
+    },
+    {
+      "epoch": 2.5,
+      "eval_loss": 0.8498236536979675,
+      "eval_runtime": 1.9552,
+      "eval_samples_per_second": 173.893,
+      "eval_steps_per_second": 1.534,
+      "step": 120
+    },
+    {
+      "epoch": 2.6041666666666665,
+      "grad_norm": 3.2768849901156845,
+      "learning_rate": 5.088648238966908e-08,
+      "loss": 0.3753,
+      "step": 125
+    },
+    {
+      "epoch": 2.6041666666666665,
+      "eval_loss": 0.8512565493583679,
+      "eval_runtime": 1.9594,
+      "eval_samples_per_second": 173.526,
+      "eval_steps_per_second": 1.531,
+      "step": 125
+    },
+    {
+      "epoch": 2.7083333333333335,
+      "grad_norm": 3.666595330730063,
+      "learning_rate": 5.039701925276604e-08,
+      "loss": 0.3721,
+      "step": 130
+    },
+    {
+      "epoch": 2.7083333333333335,
+      "eval_loss": 0.8527700304985046,
+      "eval_runtime": 1.9575,
+      "eval_samples_per_second": 173.689,
+      "eval_steps_per_second": 1.533,
+      "step": 130
+    },
+    {
+      "epoch": 2.8125,
+      "grad_norm": 3.4733320032072537,
+      "learning_rate": 5.0166900048082497e-08,
+      "loss": 0.3718,
+      "step": 135
+    },
+    {
+      "epoch": 2.8125,
+      "eval_loss": 0.8541720509529114,
+      "eval_runtime": 1.9599,
+      "eval_samples_per_second": 173.479,
+      "eval_steps_per_second": 1.531,
+      "step": 135
+    },
+    {
+      "epoch": 2.9166666666666665,
+      "grad_norm": 3.447696757476531,
+      "learning_rate": 5.0065147322870076e-08,
+      "loss": 0.3773,
+      "step": 140
+    },
+    {
+      "epoch": 2.9166666666666665,
+      "eval_loss": 0.8555252552032471,
+      "eval_runtime": 1.9586,
+      "eval_samples_per_second": 173.592,
+      "eval_steps_per_second": 1.532,
+      "step": 140
+    },
+    {
+      "epoch": 3.0208333333333335,
+      "grad_norm": 3.0052210603196237,
+      "learning_rate": 5.002328628528332e-08,
+      "loss": 0.3723,
+      "step": 145
+    },
+    {
+      "epoch": 3.0208333333333335,
+      "eval_loss": 0.8565484881401062,
+      "eval_runtime": 1.9586,
+      "eval_samples_per_second": 173.589,
+      "eval_steps_per_second": 1.532,
+      "step": 145
+    },
+    {
+      "epoch": 3.125,
+      "grad_norm": 3.368197941438436,
+      "learning_rate": 5.0007484528133236e-08,
+      "loss": 0.374,
+      "step": 150
+    },
+    {
+      "epoch": 3.125,
+      "eval_loss": 0.8576194643974304,
+      "eval_runtime": 1.9541,
+      "eval_samples_per_second": 173.993,
+      "eval_steps_per_second": 1.535,
+      "step": 150
+    },
+    {
+      "epoch": 3.2291666666666665,
+      "grad_norm": 3.3290743731904304,
+      "learning_rate": 5.0002110817570477e-08,
+      "loss": 0.3728,
+      "step": 155
+    },
+    {
+      "epoch": 3.2291666666666665,
+      "eval_loss": 0.8588044047355652,
+      "eval_runtime": 1.951,
+      "eval_samples_per_second": 174.273,
+      "eval_steps_per_second": 1.538,
+      "step": 155
+    },
+    {
+      "epoch": 3.3333333333333335,
+      "grad_norm": 4.793937739567796,
+      "learning_rate": 5.0000504842356326e-08,
+      "loss": 0.3686,
+      "step": 160
+    },
+    {
+      "epoch": 3.3333333333333335,
+      "eval_loss": 0.859791100025177,
+      "eval_runtime": 1.9522,
+      "eval_samples_per_second": 174.159,
+      "eval_steps_per_second": 1.537,
+      "step": 160
+    },
+    {
+      "epoch": 3.4375,
+      "grad_norm": 3.326342529192208,
+      "learning_rate": 5.000009745562451e-08,
+      "loss": 0.3617,
+      "step": 165
+    },
+    {
+      "epoch": 3.4375,
+      "eval_loss": 0.8607122302055359,
+      "eval_runtime": 1.958,
+      "eval_samples_per_second": 173.647,
+      "eval_steps_per_second": 1.532,
+      "step": 165
+    },
+    {
+      "epoch": 3.5416666666666665,
+      "grad_norm": 3.6505713497705736,
+      "learning_rate": 5.0000014077810156e-08,
+      "loss": 0.3546,
+      "step": 170
+    },
+    {
+      "epoch": 3.5416666666666665,
+      "eval_loss": 0.8613293170928955,
+      "eval_runtime": 1.9527,
+      "eval_samples_per_second": 174.122,
+      "eval_steps_per_second": 1.536,
+      "step": 170
+    },
+    {
+      "epoch": 3.6458333333333335,
+      "grad_norm": 3.496080458530573,
+      "learning_rate": 5.0000001343508807e-08,
+      "loss": 0.3707,
+      "step": 175
+    },
+    {
+      "epoch": 3.6458333333333335,
+      "eval_loss": 0.8619220852851868,
+      "eval_runtime": 1.9552,
+      "eval_samples_per_second": 173.893,
+      "eval_steps_per_second": 1.534,
+      "step": 175
+    },
+    {
+      "epoch": 3.75,
+      "grad_norm": 3.50316414527161,
+      "learning_rate": 5.000000006747581e-08,
+      "loss": 0.3739,
+      "step": 180
+    },
+    {
+      "epoch": 3.75,
+      "eval_loss": 0.862490177154541,
+      "eval_runtime": 1.9547,
+      "eval_samples_per_second": 173.936,
+      "eval_steps_per_second": 1.535,
+      "step": 180
+    },
+    {
+      "epoch": 3.8541666666666665,
+      "grad_norm": 3.7278057893863874,
+      "learning_rate": 5.0000000001094325e-08,
+      "loss": 0.3617,
+      "step": 185
+    },
+    {
+      "epoch": 3.8541666666666665,
+      "eval_loss": 0.8631939888000488,
+      "eval_runtime": 1.9574,
+      "eval_samples_per_second": 173.703,
+      "eval_steps_per_second": 1.533,
+      "step": 185
+    },
+    {
+      "epoch": 3.9583333333333335,
+      "grad_norm": 3.160928200357982,
+      "learning_rate": 5.000000000000139e-08,
+      "loss": 0.3591,
+      "step": 190
+    },
+    {
+      "epoch": 3.9583333333333335,
+      "eval_loss": 0.8637197613716125,
+      "eval_runtime": 1.9552,
+      "eval_samples_per_second": 173.893,
+      "eval_steps_per_second": 1.534,
+      "step": 190
+    },
+    {
+      "epoch": 4.0,
+      "step": 192,
+      "total_flos": 5334785064960.0,
+      "train_loss": 0.508326952966551,
+      "train_runtime": 4164.2152,
+      "train_samples_per_second": 2.934,
+      "train_steps_per_second": 0.046
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 192,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 4,
+  "save_steps": 5,
+  "total_flos": 5334785064960.0,
+  "train_batch_size": 8,
+  "trial_name": null,
+  "trial_params": null
+}
--- a/training_args.bin
+++ b/training_args.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6847c094292a8b40cbd23c921c6bcb9cbc8346d03045e21561f149c7aa6e6569
+size 6776
--- a/training_eval_loss.png
+++ b/training_eval_loss.png
--- a/training_loss.png
+++ b/training_loss.png
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}