Initialize project; model provided by the ModelHub XC community
Model: PKU-Alignment/ProgressGym-HistLlama3-8B-C013-instruct-v0.2 Source: Original Platform
39
.gitattributes
vendored
Normal file
@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
210
README.md
Normal file
@@ -0,0 +1,210 @@
---
license: cc-by-4.0
tags:
- alignment
- value alignment
- AI safety
- safety
- LLM
- history
datasets:
- PKU-Alignment/ProgressGym-HistText
- PKU-Alignment/ProgressGym-TimelessQA
base_model:
- PKU-Alignment/ProgressGym-HistLlama3-8B-C013-pretrain
- meta-llama/Meta-Llama-3-8B
---

# ProgressGym-HistLlama3-8B-C013-instruct

## Overview

#### The ProgressGym Framework



**ProgressGym-HistLlama3-8B-C013-instruct** is part of the **ProgressGym** framework for research and experimentation on *progress alignment* - the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in.

To quote the paper [*ProgressGym: Alignment with a Millennium of Moral Progress*](https://arxiv.org/abs/2406.20087):

> Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.
>
> We introduce *progress alignment* as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.

#### ProgressGym-HistLlama3-8B-C013-instruct

ProgressGym-HistLlama3-8B-C013-instruct is one of the **36 historical language models** in the ProgressGym framework.

**ProgressGym-HistLlama3-8B-C013-instruct is under continual iteration.** Improving upon the current version, new versions of the model are currently being trained to reflect historical moral tendencies in ever more comprehensive ways.

**ProgressGym-HistLlama3-8B-C013-instruct is a 13th-century historical language model.** Based on [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), it is continued-pretrained on the 13th-century text data from [ProgressGym-HistText](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-HistText), using the following hyperparameters:

- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP

... with the following training results:

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.7594 | 0.0149 | 1 | 1.7163 |
| 1.7333 | 0.0746 | 5 | 1.7008 |
| 1.6854 | 0.1493 | 10 | 1.6825 |
| 1.6897 | 0.2239 | 15 | 1.6701 |
| 1.6656 | 0.2985 | 20 | 1.6651 |
| 1.7254 | 0.3731 | 25 | 1.6679 |
| 1.7178 | 0.4478 | 30 | 1.6542 |
| 1.6656 | 0.5224 | 35 | 1.6459 |
| 1.6647 | 0.5970 | 40 | 1.6308 |
| 1.6645 | 0.6716 | 45 | 1.6205 |
| 1.6151 | 0.7463 | 50 | 1.6129 |
| 1.6359 | 0.8209 | 55 | 1.6052 |
| 1.5885 | 0.8955 | 60 | 1.5995 |
| 1.6142 | 0.9701 | 65 | 1.5943 |
| 1.4875 | 1.0448 | 70 | 1.5963 |
| 1.3844 | 1.1194 | 75 | 1.6118 |
| 1.3555 | 1.1940 | 80 | 1.6069 |
| 1.3597 | 1.2687 | 85 | 1.6040 |
| 1.3737 | 1.3433 | 90 | 1.6071 |
| 1.3492 | 1.4179 | 95 | 1.6074 |
| 1.3826 | 1.4925 | 100 | 1.6055 |
| 1.3533 | 1.5672 | 105 | 1.6035 |
| 1.3611 | 1.6418 | 110 | 1.6023 |
| 1.328 | 1.7164 | 115 | 1.6022 |
| 1.3443 | 1.7910 | 120 | 1.6026 |
| 1.3386 | 1.8657 | 125 | 1.6029 |
| 1.3396 | 1.9403 | 130 | 1.6029 |
| 1.3573 | 2.0149 | 135 | 1.6029 |
| 1.3754 | 2.0896 | 140 | 1.6034 |
| 1.3229 | 2.1642 | 145 | 1.6044 |
| 1.3194 | 2.2388 | 150 | 1.6055 |
| 1.3361 | 2.3134 | 155 | 1.6065 |
| 1.3231 | 2.3881 | 160 | 1.6072 |
| 1.32 | 2.4627 | 165 | 1.6076 |
| 1.3406 | 2.5373 | 170 | 1.6078 |
| 1.3184 | 2.6119 | 175 | 1.6079 |
| 1.2745 | 2.6866 | 180 | 1.6080 |
| 1.3024 | 2.7612 | 185 | 1.6079 |
| 1.3243 | 2.8358 | 190 | 1.6079 |
| 1.3239 | 2.9104 | 195 | 1.6080 |
| 1.3349 | 2.9851 | 200 | 1.6081 |
| 1.337 | 3.0597 | 205 | 1.6079 |
| 1.3091 | 3.1343 | 210 | 1.6078 |
| 1.3266 | 3.2090 | 215 | 1.6079 |
| 1.3014 | 3.2836 | 220 | 1.6083 |
| 1.3153 | 3.3582 | 225 | 1.6086 |
| 1.3192 | 3.4328 | 230 | 1.6090 |
| 1.315 | 3.5075 | 235 | 1.6093 |
| 1.3047 | 3.5821 | 240 | 1.6093 |
| 1.3208 | 3.6567 | 245 | 1.6093 |
| 1.362 | 3.7313 | 250 | 1.6093 |
| 1.3255 | 3.8060 | 255 | 1.6091 |
| 1.2941 | 3.8806 | 260 | 1.6089 |
| 1.3254 | 3.9552 | 265 | 1.6086 |
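
The batch-size entries in the hyperparameter list and the Epoch and Step columns of the table above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the steps-per-epoch figure is inferred from the logged values rather than stated in the source.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8    # per-device training batch size
eval_batch_size = 16    # per-device evaluation batch size
num_devices = 8
num_epochs = 4.0

# Global batch sizes are the per-device size times the device count.
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

# A few (step, epoch) pairs copied from the table above.
logged = [(1, 0.0149), (5, 0.0746), (65, 0.9701), (265, 3.9552)]

# Each pair gives an estimate of optimizer steps per epoch; average and round.
estimates = [step / epoch for step, epoch in logged]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Inferred totals (assumption: every epoch has the same number of steps).
total_steps = int(steps_per_epoch * num_epochs)
# Upper-bound estimate, since the last batch of an epoch may be partial.
approx_samples_per_epoch = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, total_eval_batch_size)
print(steps_per_epoch, total_steps, approx_samples_per_epoch)
```

Under the assumption that every epoch contains the same number of optimizer steps, this suggests roughly 67 steps per epoch, so the last logged step (265) falls just short of the end of the fourth epoch.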

Note that the training data volume for the continued pretraining stage is capped at 3GB. When the corresponding century's corpus exceeds this volume, the training data is randomly sampled to fit the volume.

**ProgressGym-HistLlama3-8B-C013-instruct is an instruction-tuned language model.** It is tuned on [ProgressGym-TimelessQA](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-TimelessQA), using the following hyperparameters. Note, however, that the snapshot at training step 10 is used for the final model, to minimize erosion of the value tendencies learned during continued pretraining; we qualitatively observe that this snapshot still possesses strong instruction-following capabilities.

- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP

... with the following training results:

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9805 | 0.0208 | 1 | 0.9737 |
| 0.9446 | 0.1042 | 5 | 0.9455 |
| 0.8481 | 0.2083 | 10 | 0.8154 |
| 0.7794 | 0.3125 | 15 | 0.8123 |
| 0.7798 | 0.4167 | 20 | 0.8411 |
| 0.8576 | 0.5208 | 25 | 0.8676 |
| 0.8852 | 0.625 | 30 | 0.8673 |
| 0.8529 | 0.7292 | 35 | 0.8561 |
| 0.8224 | 0.8333 | 40 | 0.8470 |
| 0.8536 | 0.9375 | 45 | 0.8378 |
| 0.662 | 1.0417 | 50 | 0.8294 |
| 0.437 | 1.1458 | 55 | 0.8531 |
| 0.4402 | 1.25 | 60 | 0.8569 |
| 0.4244 | 1.3542 | 65 | 0.8569 |
| 0.4495 | 1.4583 | 70 | 0.8547 |
| 0.4689 | 1.5625 | 75 | 0.8494 |
| 0.4309 | 1.6667 | 80 | 0.8461 |
| 0.4299 | 1.7708 | 85 | 0.8446 |
| 0.4461 | 1.875 | 90 | 0.8440 |
| 0.4474 | 1.9792 | 95 | 0.8439 |
| 0.3614 | 2.0833 | 100 | 0.8445 |
| 0.3861 | 2.1875 | 105 | 0.8457 |
| 0.3829 | 2.2917 | 110 | 0.8473 |
| 0.3764 | 2.3958 | 115 | 0.8488 |
| 0.3655 | 2.5 | 120 | 0.8500 |
| 0.4243 | 2.6042 | 125 | 0.8511 |
| 0.3884 | 2.7083 | 130 | 0.8520 |
| 0.3634 | 2.8125 | 135 | 0.8528 |
| 0.3846 | 2.9167 | 140 | 0.8537 |
| 0.3872 | 3.0208 | 145 | 0.8547 |
| 0.3869 | 3.125 | 150 | 0.8558 |
| 0.3876 | 3.2292 | 155 | 0.8566 |
| 0.3844 | 3.3333 | 160 | 0.8573 |
| 0.3535 | 3.4375 | 165 | 0.8579 |
| 0.3488 | 3.5417 | 170 | 0.8588 |
| 0.3464 | 3.6458 | 175 | 0.8598 |
| 0.361 | 3.75 | 180 | 0.8607 |
| 0.3674 | 3.8542 | 185 | 0.8612 |
| 0.3988 | 3.9583 | 190 | 0.8612 |
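
The hyperparameter list above names a `polynomial` learning-rate schedule with 20 warmup steps. A minimal sketch of such a schedule in plain Python follows, assuming linear warmup followed by polynomial decay. The `lr_end=1e-07` and `power=1.0` values and the total of 192 steps (about 48 steps per epoch, inferred from the table, times 4 epochs) are assumptions, since the source does not state them, and the function name `polynomial_lr` is hypothetical.

```python
def polynomial_lr(step, base_lr=1.5e-05, warmup_steps=20, total_steps=192,
                  lr_end=1e-07, power=1.0):
    """Learning rate at an optimizer step: linear warmup, then polynomial decay."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    if step > total_steps:
        # Past the end of training the rate stays at its floor.
        return lr_end
    # Polynomial decay from base_lr down to lr_end over the remaining steps.
    decay_steps = total_steps - warmup_steps
    pct_remaining = 1 - (step - warmup_steps) / decay_steps
    return (base_lr - lr_end) * pct_remaining ** power + lr_end


# Print the assumed schedule at a few steps, including step 10.
for step in (1, 10, 20, 100, 192):
    print(step, f"{polynomial_lr(step):.3e}")
```

If these assumptions hold, the step-10 snapshot used for the final model was taken halfway through warmup, at about half the peak learning rate.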

## Links

- **[Paper Preprint]** [ProgressGym: Alignment with a Millennium of Moral Progress](https://arxiv.org/abs/2406.20087)
- **[Leaderboard & Interactive Playground]** [PKU-Alignment/ProgressGym-LeaderBoard](https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard)
- **[Huggingface Data & Model Collection]** [PKU-Alignment/ProgressGym](https://huggingface.co/collections/PKU-Alignment/progressgym-666735fcf3e4efa276226eaa)
- **[Github Codebase]** [PKU-Alignment/ProgressGym](https://github.com/PKU-Alignment/ProgressGym)
- **[Documentation]** [ProgressGym Documentation](https://pku-alignment.github.io/ProgressGym/)
- **[PyPI Package]** *(coming soon - [stay tuned](https://forms.gle/1TWFLL4ZCLeYTD5N6)!)*

## Citation

If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the bibtex entry below.

```text
@article{progressgym,
  title={ProgressGym: Alignment with a Millennium of Moral Progress},
  author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
  journal={arXiv preprint arXiv:2406.20087},
  eprint={2406.20087},
  eprinttype = {arXiv},
  year={2024}
}
```

## Ethics Statement

- **Copyright information of historical text data sources**:
  - Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
  - From the Internet Archive, we include only texts uploaded by the *Library of Congress*, which are freely released online by the U.S. Library of Congress for research and public use.
  - The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
  - The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
- **Reproducibility**: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
- **Misuse Prevention**: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without *a priori* assumptions on the direction of progress. In the event of potential misuse of our dataset, we condemn any misuse attempt to the strongest degree possible, and will work with the research community on whistleblowing for such attempts.
- **Open-Sourcing**: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.
12
all_results.json
Normal file
@@ -0,0 +1,12 @@
{
  "epoch": 4.0,
  "eval_loss": 0.8123041987419128,
  "eval_runtime": 2.1028,
  "eval_samples_per_second": 161.691,
  "eval_steps_per_second": 1.427,
  "total_flos": 5363820134400.0,
  "train_loss": 0.5090278356025616,
  "train_runtime": 6014.9673,
  "train_samples_per_second": 2.031,
  "train_steps_per_second": 0.032
}
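The throughput figures in `all_results.json` can be turned back into approximate step and sample counts. The sketch below does this in plain Python; the global batch sizes of 64 and 128 are taken from the README in this commit, and the derived counts are estimates because the reported rates are rounded.

```python
import json
import math

# Contents of all_results.json as shown above.
all_results = json.loads("""
{
  "epoch": 4.0,
  "eval_loss": 0.8123041987419128,
  "eval_runtime": 2.1028,
  "eval_samples_per_second": 161.691,
  "eval_steps_per_second": 1.427,
  "total_flos": 5363820134400.0,
  "train_loss": 0.5090278356025616,
  "train_runtime": 6014.9673,
  "train_samples_per_second": 2.031,
  "train_steps_per_second": 0.032
}
""")

# Rate times runtime recovers approximate counts (rates are rounded to 3 decimals).
train_steps = all_results["train_runtime"] * all_results["train_steps_per_second"]
train_samples = all_results["train_runtime"] * all_results["train_samples_per_second"]
eval_samples = all_results["eval_runtime"] * all_results["eval_samples_per_second"]
eval_steps = all_results["eval_runtime"] * all_results["eval_steps_per_second"]

samples_per_epoch = train_samples / all_results["epoch"]

# Assumption: global batch sizes of 64 (train) and 128 (eval), as listed in the README.
steps_per_epoch = math.ceil(samples_per_epoch / 64)
eval_batches = math.ceil(eval_samples / 128)

print(round(train_steps), round(samples_per_epoch), steps_per_epoch)
print(round(eval_samples), round(eval_steps), eval_batches)
```

The estimate of about 48 steps per epoch agrees with the Epoch and Step columns of the instruction-tuning table in the README, which suggests these results belong to the instruction-tuning run.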
28
config.json
Normal file
@@ -0,0 +1,28 @@
{
  "_name_or_path": "./output/training_results/C013_llama3-8b-base_pretrain_20240428_005832/",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.40.0",
  "use_cache": false,
  "vocab_size": 128256
}
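The shapes in `config.json` are enough to estimate the parameter count and the float16 footprint of the checkpoint. The sketch below assumes the standard Llama layout (bias-free q, k, v, and o projections, a gated MLP with three matrices, two RMSNorm weights per layer, and one final norm); that layout is an assumption here, since the weight map in this commit is only partially visible.

```python
# Values copied from config.json above.
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
num_attention_heads = 32
num_key_value_heads = 8
vocab_size = 128256
tie_word_embeddings = False
bytes_per_param = 2  # torch_dtype is float16

# Grouped-query attention: several query heads share one key/value head.
head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim
queries_per_kv_head = num_attention_heads // num_key_value_heads

# Per-layer weights (attention_bias is false, so there are no bias terms).
attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q, o and k, v
mlp = 3 * hidden_size * intermediate_size                        # gate, up, down
norms = 2 * hidden_size                                          # two RMSNorm weights
per_layer = attn + mlp + norms

embeddings = vocab_size * hidden_size
lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
final_norm = hidden_size  # assumed final RMSNorm weight

total_params = num_hidden_layers * per_layer + embeddings + lm_head + final_norm
total_bytes = total_params * bytes_per_param

print(head_dim, queries_per_kv_head)
print(total_params, total_bytes)
```

With these assumptions the byte count comes out to 16060522496, which equals the `total_size` recorded in `model.safetensors.index.json`.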
1
configuration.json
Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
7
eval_results.json
Normal file
@@ -0,0 +1,7 @@
{
  "epoch": 4.0,
  "eval_loss": 0.8123041987419128,
  "eval_runtime": 2.1028,
  "eval_samples_per_second": 161.691,
  "eval_steps_per_second": 1.427
}
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "transformers_version": "4.40.0"
}
3
model-00001-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:04e95a34b63c8a3b1e9d555837cb3e8c5eb62eef55da32ef1ea4eeb663e353cc
size 4976698592
3
model-00002-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e1d1ce3df0cb483d4209d598838fb2b2611ac9b9441b6ee991f0d074bc608571
size 4999802616
3
model-00003-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63918ea2dfb5e85170a7df2101dc8d9bd5aeec78b9646b074703cfe2e75ffab6
size 4915916080
3
model-00004-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d0c8544b338326968f6e3864335cf01f7fd7e6b6b725f20c34b4fc695522507d
size 1168138808
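Each of the four shard entries above is a Git LFS pointer file: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal parsing sketch follows; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the four shard entries above.
POINTERS = [
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:04e95a34b63c8a3b1e9d555837cb3e8c5eb62eef55da32ef1ea4eeb663e353cc\n"
    "size 4976698592\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e1d1ce3df0cb483d4209d598838fb2b2611ac9b9441b6ee991f0d074bc608571\n"
    "size 4999802616\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:63918ea2dfb5e85170a7df2101dc8d9bd5aeec78b9646b074703cfe2e75ffab6\n"
    "size 4915916080\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:d0c8544b338326968f6e3864335cf01f7fd7e6b6b725f20c34b4fc695522507d\n"
    "size 1168138808\n",
]

parsed = [parse_lfs_pointer(pointer) for pointer in POINTERS]
total_shard_bytes = sum(entry["size"] for entry in parsed)

# total_size as recorded in model.safetensors.index.json in this commit.
index_total_size = 16060522496
overhead_bytes = total_shard_bytes - index_total_size

print(total_shard_bytes, overhead_bytes)
```

The shard sizes sum to 16060556096 bytes, 33600 bytes more than the `total_size` of 16060522496 in `model.safetensors.index.json`. A plausible explanation, not confirmed by the source, is the per-file safetensors header.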
298
model.safetensors.index.json
Normal file
@@ -0,0 +1,298 @@
{
  "metadata": {
    "total_size": 16060522496
  },
  "weight_map": {
    "lm_head.weight": "model-00004-of-00004.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
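The mapping above assigns each tensor name to the shard file that stores it. As a rough sketch of how such a mapping can be inspected, the snippet below groups tensor names by shard. The `weight_map` key and the helper name `group_by_shard` are assumptions for illustration (the file header is not visible in this part of the diff), and the sample data is a three-entry excerpt rather than the full index.

```python
from collections import defaultdict


def group_by_shard(index):
    """Group tensor names by the shard file that stores them.

    `index` is assumed to be a dict with a "weight_map" key that maps
    tensor names to shard file names (hypothetical layout).
    """
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# Small excerpt of the entries listed above; layer 31 is split across two shards.
sample_index = {
    "weight_map": {
        "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
        "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
        "model.norm.weight": "model-00004-of-00004.safetensors",
    }
}

grouped = group_by_shard(sample_index)
for shard_file in sorted(grouped):
    print(shard_file, len(grouped[shard_file]))
```

Grouping this way makes it easy to spot layers whose tensors straddle a shard boundary, as layer 31 does here.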
BIN
readme-assets/data-sources.png
Normal file
Binary file not shown.
Size: 85 KiB
BIN
readme-assets/data-stats.png
Normal file
Binary file not shown.
Size: 210 KiB
BIN
readme-assets/main-diagram.png
Normal file
Binary file not shown.
Size: 178 KiB
BIN
readme-assets/moral-evals.png
Normal file
Binary file not shown.
Size: 218 KiB
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
410504
tokenizer.json
Normal file
File diff suppressed because it is too large
2065
tokenizer_config.json
Normal file
File diff suppressed because it is too large
8
train_results.json
Normal file
@@ -0,0 +1,8 @@
{
  "epoch": 4.0,
  "total_flos": 5363820134400.0,
  "train_loss": 0.5090278356025616,
  "train_runtime": 6014.9673,
  "train_samples_per_second": 2.031,
  "train_steps_per_second": 0.032
}
80
trainer_log.jsonl
Normal file
@@ -0,0 +1,80 @@
{"current_steps": 1, "total_steps": 192, "loss": 0.9805, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 0.0, "epoch": 0.020833333333333332, "percentage": 0.52, "elapsed_time": "0:00:21", "remaining_time": "1:08:51"}
{"current_steps": 1, "total_steps": 192, "loss": null, "eval_loss": 0.9736970067024231, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.020833333333333332, "percentage": 0.52, "elapsed_time": "0:00:21", "remaining_time": "1:08:51"}
{"current_steps": 5, "total_steps": 192, "loss": 0.9446, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.5e-06, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:01:25", "remaining_time": "0:53:13"}
{"current_steps": 5, "total_steps": 192, "loss": null, "eval_loss": 0.9454841613769531, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:01:25", "remaining_time": "0:53:13"}
{"current_steps": 10, "total_steps": 192, "loss": 0.8481, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.25e-06, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:04:04", "remaining_time": "1:14:01"}
{"current_steps": 10, "total_steps": 192, "loss": null, "eval_loss": 0.8153812289237976, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:04:04", "remaining_time": "1:14:01"}
{"current_steps": 15, "total_steps": 192, "loss": 0.7794, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9e-06, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:06:37", "remaining_time": "1:18:10"}
{"current_steps": 15, "total_steps": 192, "loss": null, "eval_loss": 0.8123041987419128, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:06:37", "remaining_time": "1:18:10"}
{"current_steps": 20, "total_steps": 192, "loss": 0.7798, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.275e-05, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:09:16", "remaining_time": "1:19:44"}
{"current_steps": 20, "total_steps": 192, "loss": null, "eval_loss": 0.8410752415657043, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:09:16", "remaining_time": "1:19:44"}
{"current_steps": 25, "total_steps": 192, "loss": 0.8576, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.3195176200175283e-05, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:11:57", "remaining_time": "1:19:51"}
{"current_steps": 25, "total_steps": 192, "loss": null, "eval_loss": 0.8676239848136902, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:11:57", "remaining_time": "1:19:51"}
{"current_steps": 30, "total_steps": 192, "loss": 0.8852, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.515676612044427e-06, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:14:41", "remaining_time": "1:19:17"}
{"current_steps": 30, "total_steps": 192, "loss": null, "eval_loss": 0.867268979549408, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:14:41", "remaining_time": "1:19:17"}
{"current_steps": 35, "total_steps": 192, "loss": 0.8529, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.797580677308734e-06, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:17:12", "remaining_time": "1:17:10"}
{"current_steps": 35, "total_steps": 192, "loss": null, "eval_loss": 0.8560981154441833, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:17:12", "remaining_time": "1:17:10"}
{"current_steps": 40, "total_steps": 192, "loss": 0.8224, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 4.808575415542887e-06, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:19:49", "remaining_time": "1:15:19"}
{"current_steps": 40, "total_steps": 192, "loss": null, "eval_loss": 0.8470456004142761, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:19:49", "remaining_time": "1:15:19"}
{"current_steps": 45, "total_steps": 192, "loss": 0.8536, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.3676619069852654e-06, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:22:24", "remaining_time": "1:13:13"}
{"current_steps": 45, "total_steps": 192, "loss": null, "eval_loss": 0.8378292918205261, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:22:24", "remaining_time": "1:13:13"}
{"current_steps": 50, "total_steps": 192, "loss": 0.662, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.334947896124909e-06, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:25:01", "remaining_time": "1:11:05"}
{"current_steps": 50, "total_steps": 192, "loss": null, "eval_loss": 0.8293696045875549, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:25:01", "remaining_time": "1:11:05"}
{"current_steps": 55, "total_steps": 192, "loss": 0.437, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.603233215095547e-06, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:27:38", "remaining_time": "1:08:51"}
{"current_steps": 55, "total_steps": 192, "loss": null, "eval_loss": 0.8531150817871094, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:27:38", "remaining_time": "1:08:51"}
{"current_steps": 60, "total_steps": 192, "loss": 0.4402, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.0911174606561334e-06, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:30:17", "remaining_time": "1:06:37"}
{"current_steps": 60, "total_steps": 192, "loss": null, "eval_loss": 0.8569180369377136, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:30:17", "remaining_time": "1:06:37"}
{"current_steps": 65, "total_steps": 192, "loss": 0.4244, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.373930741131784e-07, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:32:53", "remaining_time": "1:04:15"}
{"current_steps": 65, "total_steps": 192, "loss": null, "eval_loss": 0.8569238185882568, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:32:53", "remaining_time": "1:04:15"}
{"current_steps": 70, "total_steps": 192, "loss": 0.4495, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.374210410959207e-07, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:35:28", "remaining_time": "1:01:50"}
{"current_steps": 70, "total_steps": 192, "loss": null, "eval_loss": 0.8547163605690002, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:35:28", "remaining_time": "1:01:50"}
{"current_steps": 75, "total_steps": 192, "loss": 0.4689, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.6222476698215175e-07, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:38:03", "remaining_time": "0:59:22"}
{"current_steps": 75, "total_steps": 192, "loss": null, "eval_loss": 0.8493571877479553, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:38:03", "remaining_time": "0:59:22"}
{"current_steps": 80, "total_steps": 192, "loss": 0.4309, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.462755297384099e-07, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:40:43", "remaining_time": "0:57:00"}
{"current_steps": 80, "total_steps": 192, "loss": null, "eval_loss": 0.846055269241333, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:40:43", "remaining_time": "0:57:00"}
{"current_steps": 85, "total_steps": 192, "loss": 0.4299, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7088740175034947e-07, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:43:19", "remaining_time": "0:54:32"}
{"current_steps": 85, "total_steps": 192, "loss": null, "eval_loss": 0.8445951342582703, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:43:19", "remaining_time": "0:54:32"}
{"current_steps": 90, "total_steps": 192, "loss": 0.4461, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.228102956599465e-07, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:45:54", "remaining_time": "0:52:01"}
{"current_steps": 90, "total_steps": 192, "loss": null, "eval_loss": 0.8440027832984924, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:45:54", "remaining_time": "0:52:01"}
{"current_steps": 95, "total_steps": 192, "loss": 0.4474, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.279207916081227e-08, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:48:29", "remaining_time": "0:49:30"}
{"current_steps": 95, "total_steps": 192, "loss": null, "eval_loss": 0.8438854217529297, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:48:29", "remaining_time": "0:49:30"}
{"current_steps": 100, "total_steps": 192, "loss": 0.3614, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.448002404850094e-08, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:51:06", "remaining_time": "0:47:00"}
{"current_steps": 100, "total_steps": 192, "loss": null, "eval_loss": 0.8445320725440979, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:51:06", "remaining_time": "0:47:00"}
{"current_steps": 105, "total_steps": 192, "loss": 0.3861, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.35920070839697e-08, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:53:39", "remaining_time": "0:44:27"}
{"current_steps": 105, "total_steps": 192, "loss": null, "eval_loss": 0.8457441926002502, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:53:39", "remaining_time": "0:44:27"}
{"current_steps": 110, "total_steps": 192, "loss": 0.3829, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.7299804687499997e-08, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:56:16", "remaining_time": "0:41:57"}
{"current_steps": 110, "total_steps": 192, "loss": null, "eval_loss": 0.847288191318512, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:56:16", "remaining_time": "0:41:57"}
{"current_steps": 115, "total_steps": 192, "loss": 0.3764, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.37771434967624e-08, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:58:52", "remaining_time": "0:39:24"}
{"current_steps": 115, "total_steps": 192, "loss": null, "eval_loss": 0.8487641215324402, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:58:52", "remaining_time": "0:39:24"}
{"current_steps": 120, "total_steps": 192, "loss": 0.3655, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.187403540619925e-08, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "1:01:27", "remaining_time": "0:36:52"}
{"current_steps": 120, "total_steps": 192, "loss": null, "eval_loss": 0.8499611020088196, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "1:01:27", "remaining_time": "0:36:52"}
{"current_steps": 125, "total_steps": 192, "loss": 0.4243, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.088648238966908e-08, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "1:04:01", "remaining_time": "0:34:19"}
{"current_steps": 125, "total_steps": 192, "loss": null, "eval_loss": 0.8510637879371643, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "1:04:01", "remaining_time": "0:34:19"}
{"current_steps": 130, "total_steps": 192, "loss": 0.3884, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.039701925276604e-08, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "1:06:38", "remaining_time": "0:31:46"}
{"current_steps": 130, "total_steps": 192, "loss": null, "eval_loss": 0.8520172238349915, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "1:06:38", "remaining_time": "0:31:46"}
{"current_steps": 135, "total_steps": 192, "loss": 0.3634, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0166900048082497e-08, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "1:09:13", "remaining_time": "0:29:13"}
{"current_steps": 135, "total_steps": 192, "loss": null, "eval_loss": 0.8528143763542175, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "1:09:13", "remaining_time": "0:29:13"}
{"current_steps": 140, "total_steps": 192, "loss": 0.3846, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0065147322870076e-08, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "1:11:49", "remaining_time": "0:26:40"}
{"current_steps": 140, "total_steps": 192, "loss": null, "eval_loss": 0.8537066578865051, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "1:11:49", "remaining_time": "0:26:40"}
{"current_steps": 145, "total_steps": 192, "loss": 0.3872, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.002328628528332e-08, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "1:14:24", "remaining_time": "0:24:06"}
{"current_steps": 145, "total_steps": 192, "loss": null, "eval_loss": 0.8547406196594238, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "1:14:24", "remaining_time": "0:24:06"}
{"current_steps": 150, "total_steps": 192, "loss": 0.3869, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0007484528133236e-08, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "1:17:02", "remaining_time": "0:21:34"}
{"current_steps": 150, "total_steps": 192, "loss": null, "eval_loss": 0.8557960391044617, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "1:17:02", "remaining_time": "0:21:34"}
{"current_steps": 155, "total_steps": 192, "loss": 0.3876, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0002110817570477e-08, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "1:19:36", "remaining_time": "0:19:00"}
{"current_steps": 155, "total_steps": 192, "loss": null, "eval_loss": 0.8566272854804993, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "1:19:36", "remaining_time": "0:19:00"}
{"current_steps": 160, "total_steps": 192, "loss": 0.3844, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000504842356326e-08, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "1:22:13", "remaining_time": "0:16:26"}
{"current_steps": 160, "total_steps": 192, "loss": null, "eval_loss": 0.8572790026664734, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "1:22:13", "remaining_time": "0:16:26"}
{"current_steps": 165, "total_steps": 192, "loss": 0.3535, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000009745562451e-08, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "1:24:48", "remaining_time": "0:13:52"}
{"current_steps": 165, "total_steps": 192, "loss": null, "eval_loss": 0.8578632473945618, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "1:24:48", "remaining_time": "0:13:52"}
{"current_steps": 170, "total_steps": 192, "loss": 0.3488, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000014077810156e-08, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "1:27:24", "remaining_time": "0:11:18"}
{"current_steps": 170, "total_steps": 192, "loss": null, "eval_loss": 0.85884028673172, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "1:27:24", "remaining_time": "0:11:18"}
{"current_steps": 175, "total_steps": 192, "loss": 0.3464, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000001343508807e-08, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:30:00", "remaining_time": "0:08:44"}
{"current_steps": 175, "total_steps": 192, "loss": null, "eval_loss": 0.8598365783691406, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:30:00", "remaining_time": "0:08:44"}
{"current_steps": 180, "total_steps": 192, "loss": 0.361, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000006747581e-08, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:32:36", "remaining_time": "0:06:10"}
{"current_steps": 180, "total_steps": 192, "loss": null, "eval_loss": 0.8606703877449036, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:32:36", "remaining_time": "0:06:10"}
{"current_steps": 185, "total_steps": 192, "loss": 0.3674, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000000001094325e-08, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:35:11", "remaining_time": "0:03:36"}
{"current_steps": 185, "total_steps": 192, "loss": null, "eval_loss": 0.8611735701560974, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:35:11", "remaining_time": "0:03:36"}
{"current_steps": 190, "total_steps": 192, "loss": 0.3988, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000000000139e-08, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:37:48", "remaining_time": "0:01:01"}
{"current_steps": 190, "total_steps": 192, "loss": null, "eval_loss": 0.8612277507781982, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:37:48", "remaining_time": "0:01:01"}
|
||||
{"current_steps": 192, "total_steps": 192, "loss": null, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:39:43", "remaining_time": "0:00:00"}
|
||||
{"current_steps": 3, "total_steps": 3, "loss": null, "eval_loss": 0.8123041987419128, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:40:42", "remaining_time": "0:00:00"}
|
||||
615
trainer_state.json
Normal file
@@ -0,0 +1,615 @@
{
  "best_metric": 0.8123041987419128,
  "best_model_checkpoint": "./output/training_results/C013_llama3-8b-base_instruct_20240428_005832/checkpoint-15",
  "epoch": 4.0,
  "eval_steps": 5,
  "global_step": 192,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.020833333333333332,
      "grad_norm": 0.0,
      "learning_rate": 0.0,
      "loss": 0.9805,
      "step": 1
    },
    {
      "epoch": 0.020833333333333332,
      "eval_loss": 0.9736970067024231,
      "eval_runtime": 2.153,
      "eval_samples_per_second": 157.916,
      "eval_steps_per_second": 1.393,
      "step": 1
    },
    {
      "epoch": 0.10416666666666667,
      "grad_norm": 14.850728211706278,
      "learning_rate": 1.5e-06,
      "loss": 0.9446,
      "step": 5
    },
    {
      "epoch": 0.10416666666666667,
      "eval_loss": 0.9454841613769531,
      "eval_runtime": 2.0973,
      "eval_samples_per_second": 162.11,
      "eval_steps_per_second": 1.43,
      "step": 5
    },
    {
      "epoch": 0.20833333333333334,
      "grad_norm": 4.950599387514031,
      "learning_rate": 5.25e-06,
      "loss": 0.8481,
      "step": 10
    },
    {
      "epoch": 0.20833333333333334,
      "eval_loss": 0.8153812289237976,
      "eval_runtime": 2.0923,
      "eval_samples_per_second": 162.499,
      "eval_steps_per_second": 1.434,
      "step": 10
    },
    {
      "epoch": 0.3125,
      "grad_norm": 4.621063619275185,
      "learning_rate": 9e-06,
      "loss": 0.7794,
      "step": 15
    },
    {
      "epoch": 0.3125,
      "eval_loss": 0.8123041987419128,
      "eval_runtime": 2.1028,
      "eval_samples_per_second": 161.686,
      "eval_steps_per_second": 1.427,
      "step": 15
    },
    {
      "epoch": 0.4166666666666667,
      "grad_norm": 4.141809373286457,
      "learning_rate": 1.275e-05,
      "loss": 0.7798,
      "step": 20
    },
    {
      "epoch": 0.4166666666666667,
      "eval_loss": 0.8410752415657043,
      "eval_runtime": 2.0891,
      "eval_samples_per_second": 162.747,
      "eval_steps_per_second": 1.436,
      "step": 20
    },
    {
      "epoch": 0.5208333333333334,
      "grad_norm": 4.211750921552142,
      "learning_rate": 1.3195176200175283e-05,
      "loss": 0.8576,
      "step": 25
    },
    {
      "epoch": 0.5208333333333334,
      "eval_loss": 0.8676239848136902,
      "eval_runtime": 2.0885,
      "eval_samples_per_second": 162.793,
      "eval_steps_per_second": 1.436,
      "step": 25
    },
    {
      "epoch": 0.625,
      "grad_norm": 4.126229536438554,
      "learning_rate": 9.515676612044427e-06,
      "loss": 0.8852,
      "step": 30
    },
    {
      "epoch": 0.625,
      "eval_loss": 0.867268979549408,
      "eval_runtime": 2.0839,
      "eval_samples_per_second": 163.157,
      "eval_steps_per_second": 1.44,
      "step": 30
    },
    {
      "epoch": 0.7291666666666666,
      "grad_norm": 4.316589185885892,
      "learning_rate": 6.797580677308734e-06,
      "loss": 0.8529,
      "step": 35
    },
    {
      "epoch": 0.7291666666666666,
      "eval_loss": 0.8560981154441833,
      "eval_runtime": 2.1307,
      "eval_samples_per_second": 159.573,
      "eval_steps_per_second": 1.408,
      "step": 35
    },
    {
      "epoch": 0.8333333333333334,
      "grad_norm": 4.0216031828158005,
      "learning_rate": 4.808575415542887e-06,
      "loss": 0.8224,
      "step": 40
    },
    {
      "epoch": 0.8333333333333334,
      "eval_loss": 0.8470456004142761,
      "eval_runtime": 2.0873,
      "eval_samples_per_second": 162.886,
      "eval_steps_per_second": 1.437,
      "step": 40
    },
    {
      "epoch": 0.9375,
      "grad_norm": 4.316706720311178,
      "learning_rate": 3.3676619069852654e-06,
      "loss": 0.8536,
      "step": 45
    },
    {
      "epoch": 0.9375,
      "eval_loss": 0.8378292918205261,
      "eval_runtime": 2.0847,
      "eval_samples_per_second": 163.089,
      "eval_steps_per_second": 1.439,
      "step": 45
    },
    {
      "epoch": 1.0416666666666667,
      "grad_norm": 3.7957934185208795,
      "learning_rate": 2.334947896124909e-06,
      "loss": 0.662,
      "step": 50
    },
    {
      "epoch": 1.0416666666666667,
      "eval_loss": 0.8293696045875549,
      "eval_runtime": 2.0835,
      "eval_samples_per_second": 163.187,
      "eval_steps_per_second": 1.44,
      "step": 50
    },
    {
      "epoch": 1.1458333333333333,
      "grad_norm": 3.4155908301931186,
      "learning_rate": 1.603233215095547e-06,
      "loss": 0.437,
      "step": 55
    },
    {
      "epoch": 1.1458333333333333,
      "eval_loss": 0.8531150817871094,
      "eval_runtime": 2.1006,
      "eval_samples_per_second": 161.859,
      "eval_steps_per_second": 1.428,
      "step": 55
    },
    {
      "epoch": 1.25,
      "grad_norm": 3.377214905899517,
      "learning_rate": 1.0911174606561334e-06,
      "loss": 0.4402,
      "step": 60
    },
    {
      "epoch": 1.25,
      "eval_loss": 0.8569180369377136,
      "eval_runtime": 2.0899,
      "eval_samples_per_second": 162.69,
      "eval_steps_per_second": 1.436,
      "step": 60
    },
    {
      "epoch": 1.3541666666666667,
      "grad_norm": 4.018786896199577,
      "learning_rate": 7.373930741131784e-07,
      "loss": 0.4244,
      "step": 65
    },
    {
      "epoch": 1.3541666666666667,
      "eval_loss": 0.8569238185882568,
      "eval_runtime": 2.0969,
      "eval_samples_per_second": 162.148,
      "eval_steps_per_second": 1.431,
      "step": 65
    },
    {
      "epoch": 1.4583333333333333,
      "grad_norm": 4.3050060673581205,
      "learning_rate": 5.374210410959207e-07,
      "loss": 0.4495,
      "step": 70
    },
    {
      "epoch": 1.4583333333333333,
      "eval_loss": 0.8547163605690002,
      "eval_runtime": 2.0852,
      "eval_samples_per_second": 163.056,
      "eval_steps_per_second": 1.439,
      "step": 70
    },
    {
      "epoch": 1.5625,
      "grad_norm": 3.8753963390823842,
      "learning_rate": 3.6222476698215175e-07,
      "loss": 0.4689,
      "step": 75
    },
    {
      "epoch": 1.5625,
      "eval_loss": 0.8493571877479553,
      "eval_runtime": 2.1006,
      "eval_samples_per_second": 161.855,
      "eval_steps_per_second": 1.428,
      "step": 75
    },
    {
      "epoch": 1.6666666666666665,
      "grad_norm": 3.2777220151938935,
      "learning_rate": 2.462755297384099e-07,
      "loss": 0.4309,
      "step": 80
    },
    {
      "epoch": 1.6666666666666665,
      "eval_loss": 0.846055269241333,
      "eval_runtime": 2.0775,
      "eval_samples_per_second": 163.657,
      "eval_steps_per_second": 1.444,
      "step": 80
    },
    {
      "epoch": 1.7708333333333335,
      "grad_norm": 3.25027538013195,
      "learning_rate": 1.7088740175034947e-07,
      "loss": 0.4299,
      "step": 85
    },
    {
      "epoch": 1.7708333333333335,
      "eval_loss": 0.8445951342582703,
      "eval_runtime": 2.0859,
      "eval_samples_per_second": 163.002,
      "eval_steps_per_second": 1.438,
      "step": 85
    },
    {
      "epoch": 1.875,
      "grad_norm": 3.841600887262257,
      "learning_rate": 1.228102956599465e-07,
      "loss": 0.4461,
      "step": 90
    },
    {
      "epoch": 1.875,
      "eval_loss": 0.8440027832984924,
      "eval_runtime": 2.099,
      "eval_samples_per_second": 161.984,
      "eval_steps_per_second": 1.429,
      "step": 90
    },
    {
      "epoch": 1.9791666666666665,
      "grad_norm": 4.633157495322692,
      "learning_rate": 9.279207916081227e-08,
      "loss": 0.4474,
      "step": 95
    },
    {
      "epoch": 1.9791666666666665,
      "eval_loss": 0.8438854217529297,
      "eval_runtime": 2.094,
      "eval_samples_per_second": 162.368,
      "eval_steps_per_second": 1.433,
      "step": 95
    },
    {
      "epoch": 2.0833333333333335,
      "grad_norm": 3.3543713588136885,
      "learning_rate": 7.448002404850094e-08,
      "loss": 0.3614,
      "step": 100
    },
    {
      "epoch": 2.0833333333333335,
      "eval_loss": 0.8445320725440979,
      "eval_runtime": 2.0778,
      "eval_samples_per_second": 163.634,
      "eval_steps_per_second": 1.444,
      "step": 100
    },
    {
      "epoch": 2.1875,
      "grad_norm": 3.5776096289343053,
      "learning_rate": 6.35920070839697e-08,
      "loss": 0.3861,
      "step": 105
    },
    {
      "epoch": 2.1875,
      "eval_loss": 0.8457441926002502,
      "eval_runtime": 2.1055,
      "eval_samples_per_second": 161.484,
      "eval_steps_per_second": 1.425,
      "step": 105
    },
    {
      "epoch": 2.2916666666666665,
      "grad_norm": 3.811456756438563,
      "learning_rate": 5.7299804687499997e-08,
      "loss": 0.3829,
      "step": 110
    },
    {
      "epoch": 2.2916666666666665,
      "eval_loss": 0.847288191318512,
      "eval_runtime": 2.083,
      "eval_samples_per_second": 163.223,
      "eval_steps_per_second": 1.44,
      "step": 110
    },
    {
      "epoch": 2.3958333333333335,
      "grad_norm": 3.1978758437608823,
      "learning_rate": 5.37771434967624e-08,
      "loss": 0.3764,
      "step": 115
    },
    {
      "epoch": 2.3958333333333335,
      "eval_loss": 0.8487641215324402,
      "eval_runtime": 2.1168,
      "eval_samples_per_second": 160.617,
      "eval_steps_per_second": 1.417,
      "step": 115
    },
    {
      "epoch": 2.5,
      "grad_norm": 3.472352228062058,
      "learning_rate": 5.187403540619925e-08,
      "loss": 0.3655,
      "step": 120
    },
    {
      "epoch": 2.5,
      "eval_loss": 0.8499611020088196,
      "eval_runtime": 2.0908,
      "eval_samples_per_second": 162.615,
      "eval_steps_per_second": 1.435,
      "step": 120
    },
    {
      "epoch": 2.6041666666666665,
      "grad_norm": 3.2298459394815793,
      "learning_rate": 5.088648238966908e-08,
      "loss": 0.4243,
      "step": 125
    },
    {
      "epoch": 2.6041666666666665,
      "eval_loss": 0.8510637879371643,
      "eval_runtime": 2.0941,
      "eval_samples_per_second": 162.36,
      "eval_steps_per_second": 1.433,
      "step": 125
    },
    {
      "epoch": 2.7083333333333335,
      "grad_norm": 3.7544587648641756,
      "learning_rate": 5.039701925276604e-08,
      "loss": 0.3884,
      "step": 130
    },
    {
      "epoch": 2.7083333333333335,
      "eval_loss": 0.8520172238349915,
      "eval_runtime": 2.1032,
      "eval_samples_per_second": 161.66,
      "eval_steps_per_second": 1.426,
      "step": 130
    },
    {
      "epoch": 2.8125,
      "grad_norm": 3.5032769257867695,
      "learning_rate": 5.0166900048082497e-08,
      "loss": 0.3634,
      "step": 135
    },
    {
      "epoch": 2.8125,
      "eval_loss": 0.8528143763542175,
      "eval_runtime": 2.0786,
      "eval_samples_per_second": 163.568,
      "eval_steps_per_second": 1.443,
      "step": 135
    },
    {
      "epoch": 2.9166666666666665,
      "grad_norm": 3.023294292675947,
      "learning_rate": 5.0065147322870076e-08,
      "loss": 0.3846,
      "step": 140
    },
    {
      "epoch": 2.9166666666666665,
      "eval_loss": 0.8537066578865051,
      "eval_runtime": 2.0903,
      "eval_samples_per_second": 162.659,
      "eval_steps_per_second": 1.435,
      "step": 140
    },
    {
      "epoch": 3.0208333333333335,
      "grad_norm": 3.1767015238154075,
      "learning_rate": 5.002328628528332e-08,
      "loss": 0.3872,
      "step": 145
    },
    {
      "epoch": 3.0208333333333335,
      "eval_loss": 0.8547406196594238,
      "eval_runtime": 2.0891,
      "eval_samples_per_second": 162.748,
      "eval_steps_per_second": 1.436,
      "step": 145
    },
    {
      "epoch": 3.125,
      "grad_norm": 3.1942747338221045,
      "learning_rate": 5.0007484528133236e-08,
      "loss": 0.3869,
      "step": 150
    },
    {
      "epoch": 3.125,
      "eval_loss": 0.8557960391044617,
      "eval_runtime": 2.0819,
      "eval_samples_per_second": 163.312,
      "eval_steps_per_second": 1.441,
      "step": 150
    },
    {
      "epoch": 3.2291666666666665,
      "grad_norm": 3.815918812229993,
      "learning_rate": 5.0002110817570477e-08,
      "loss": 0.3876,
      "step": 155
    },
    {
      "epoch": 3.2291666666666665,
      "eval_loss": 0.8566272854804993,
      "eval_runtime": 2.0781,
      "eval_samples_per_second": 163.61,
      "eval_steps_per_second": 1.444,
      "step": 155
    },
    {
      "epoch": 3.3333333333333335,
      "grad_norm": 3.4577646975309366,
      "learning_rate": 5.0000504842356326e-08,
      "loss": 0.3844,
      "step": 160
    },
    {
      "epoch": 3.3333333333333335,
      "eval_loss": 0.8572790026664734,
      "eval_runtime": 2.0811,
      "eval_samples_per_second": 163.373,
      "eval_steps_per_second": 1.442,
      "step": 160
    },
    {
      "epoch": 3.4375,
      "grad_norm": 3.274685205370877,
      "learning_rate": 5.000009745562451e-08,
      "loss": 0.3535,
      "step": 165
    },
    {
      "epoch": 3.4375,
      "eval_loss": 0.8578632473945618,
      "eval_runtime": 2.0918,
      "eval_samples_per_second": 162.539,
      "eval_steps_per_second": 1.434,
      "step": 165
    },
    {
      "epoch": 3.5416666666666665,
      "grad_norm": 3.246459205886974,
      "learning_rate": 5.0000014077810156e-08,
      "loss": 0.3488,
      "step": 170
    },
    {
      "epoch": 3.5416666666666665,
      "eval_loss": 0.85884028673172,
      "eval_runtime": 2.1178,
      "eval_samples_per_second": 160.545,
      "eval_steps_per_second": 1.417,
      "step": 170
    },
    {
      "epoch": 3.6458333333333335,
      "grad_norm": 3.3944513203963504,
      "learning_rate": 5.0000001343508807e-08,
      "loss": 0.3464,
      "step": 175
    },
    {
      "epoch": 3.6458333333333335,
      "eval_loss": 0.8598365783691406,
      "eval_runtime": 2.0828,
      "eval_samples_per_second": 163.238,
      "eval_steps_per_second": 1.44,
      "step": 175
    },
    {
      "epoch": 3.75,
      "grad_norm": 3.258773113208273,
      "learning_rate": 5.000000006747581e-08,
      "loss": 0.361,
      "step": 180
    },
    {
      "epoch": 3.75,
      "eval_loss": 0.8606703877449036,
      "eval_runtime": 2.1172,
      "eval_samples_per_second": 160.588,
      "eval_steps_per_second": 1.417,
      "step": 180
    },
    {
      "epoch": 3.8541666666666665,
      "grad_norm": 3.586703083699586,
      "learning_rate": 5.0000000001094325e-08,
      "loss": 0.3674,
      "step": 185
    },
    {
      "epoch": 3.8541666666666665,
      "eval_loss": 0.8611735701560974,
      "eval_runtime": 2.0956,
      "eval_samples_per_second": 162.243,
      "eval_steps_per_second": 1.432,
      "step": 185
    },
    {
      "epoch": 3.9583333333333335,
      "grad_norm": 3.5661429802112616,
      "learning_rate": 5.000000000000139e-08,
      "loss": 0.3988,
      "step": 190
    },
    {
      "epoch": 3.9583333333333335,
      "eval_loss": 0.8612277507781982,
      "eval_runtime": 2.0853,
      "eval_samples_per_second": 163.045,
      "eval_steps_per_second": 1.439,
      "step": 190
    },
    {
      "epoch": 4.0,
      "step": 192,
      "total_flos": 5363820134400.0,
      "train_loss": 0.5090278356025616,
      "train_runtime": 6014.9673,
      "train_samples_per_second": 2.031,
      "train_steps_per_second": 0.032
    }
  ],
  "logging_steps": 5,
  "max_steps": 192,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 4,
  "save_steps": 5,
  "total_flos": 5363820134400.0,
  "train_batch_size": 8,
  "trial_name": null,
  "trial_params": null
}
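Note on reading this file: `best_metric` (0.8123041987419128) equals the `eval_loss` logged at step 15, which matches the `checkpoint-15` suffix of `best_model_checkpoint`. A minimal sketch of that lookup follows, assuming the best checkpoint is the one with the lowest `eval_loss`. The excerpt is a hand-copied subset of `log_history`, and `best_eval_entry` is a hypothetical helper for illustration, not part of any library.

```python
import json

# Hand-copied subset of trainer_state.json (steps 10, 15 and 20 only).
STATE_EXCERPT = """
{
  "best_metric": 0.8123041987419128,
  "log_history": [
    {"epoch": 0.20833333333333334, "loss": 0.8481, "step": 10},
    {"epoch": 0.20833333333333334, "eval_loss": 0.8153812289237976, "step": 10},
    {"epoch": 0.3125, "loss": 0.7794, "step": 15},
    {"epoch": 0.3125, "eval_loss": 0.8123041987419128, "step": 15},
    {"epoch": 0.4166666666666667, "loss": 0.7798, "step": 20},
    {"epoch": 0.4166666666666667, "eval_loss": 0.8410752415657043, "step": 20}
  ]
}
"""


def best_eval_entry(state):
    """Return the log_history entry with the lowest eval_loss (hypothetical helper)."""
    # Training entries carry "loss"; only evaluation entries carry "eval_loss".
    evals = [entry for entry in state["log_history"] if "eval_loss" in entry]
    return min(evals, key=lambda entry: entry["eval_loss"])


state = json.loads(STATE_EXCERPT)
best = best_eval_entry(state)
print(best["step"], best["eval_loss"])
```

To run it against the full file, replace the excerpt with `json.load(open("trainer_state.json"))`; the path is an assumption about where the file sits locally.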
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fa3669f56e96d865dc1093fe06d25ee5dfbdfdce605f2359ead076119c115479
size 6968
BIN
training_eval_loss.png
Normal file
Binary file not shown.
Size: 41 KiB
BIN
training_loss.png
Normal file
Binary file not shown.
Size: 40 KiB