Initialize project; model provided by the ModelHub XC community
Model: PKU-Alignment/ProgressGym-HistLlama3-8B-C015-instruct-v0.2 Source: Original Platform
39
.gitattributes
vendored
Normal file
@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
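As a rough illustration of which file names the patterns in this `.gitattributes` route through Git LFS, the sketch below matches base names against a few of the listed patterns using Python's `fnmatch`. It is only an approximation: Git's own attribute matching has further rules (for example for patterns that contain `/`, such as `saved_model/**/*`), so only slash-free patterns are handled here, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes file.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*.zip", "*tfevents*"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: match the file's base name against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["model-00001-of-00004.safetensors", "config.json", "README.md"]:
    print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer inside a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.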
249
README.md
Normal file
@@ -0,0 +1,249 @@
---
license: cc-by-4.0
tags:
- alignment
- value alignment
- AI safety
- safety
- LLM
- history
datasets:
- PKU-Alignment/ProgressGym-HistText
- PKU-Alignment/ProgressGym-TimelessQA
base_model:
- PKU-Alignment/ProgressGym-HistLlama3-8B-C015-pretrain
- meta-llama/Meta-Llama-3-8B
---

# ProgressGym-HistLlama3-8B-C015-instruct

## Overview

#### The ProgressGym Framework

![Framework Diagram](./readme-assets/main-diagram.png)

**ProgressGym-HistLlama3-8B-C015-instruct** is part of the **ProgressGym** framework for research and experimentation on *progress alignment* - the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in.

To quote the paper [*ProgressGym: Alignment with a Millennium of Moral Progress*](https://arxiv.org/abs/2406.20087):

> Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.
>
> We introduce *progress alignment* as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.

#### ProgressGym-HistLlama3-8B-C015-instruct

ProgressGym-HistLlama3-8B-C015-instruct is one of the **36 historical language models** in the ProgressGym framework.

**ProgressGym-HistLlama3-8B-C015-instruct is under continual iteration.** New versions that improve on the current one are being trained to reflect historical moral tendencies more comprehensively.

**ProgressGym-HistLlama3-8B-C015-instruct is a 15th-century historical language model.** Based on [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), it underwent continued pretraining on the 15th-century text data from [ProgressGym-HistText](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-HistText), using the following hyperparameters:

- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 3.02
- mixed_precision_training: Native AMP

... with the following training results:

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:--------:|:----:|:---------------:|
| 2.6141 | 0.006494 | 1 | 2.6354 |
| 2.657 | 0.032468 | 5 | 2.6206 |
| 2.6337 | 0.064935 | 10 | 2.5846 |
| 2.5268 | 0.097403 | 15 | 2.5516 |
| 2.5275 | 0.129870 | 20 | 2.5321 |
| 2.5005 | 0.162338 | 25 | 2.5131 |
| 2.5339 | 0.194805 | 30 | 2.4961 |
| 2.5335 | 0.227273 | 35 | 2.4808 |
| 2.4252 | 0.259740 | 40 | 2.4643 |
| 2.4445 | 0.292208 | 45 | 2.4518 |
| 2.4594 | 0.324675 | 50 | 2.4394 |
| 2.4498 | 0.357143 | 55 | 2.4287 |
| 2.3821 | 0.389610 | 60 | 2.4184 |
| 2.4317 | 0.422078 | 65 | 2.4091 |
| 2.3931 | 0.454545 | 70 | 2.4001 |
| 2.3695 | 0.487013 | 75 | 2.3934 |
| 2.3981 | 0.519481 | 80 | 2.3855 |
| 2.3952 | 0.551948 | 85 | 2.3789 |
| 2.4137 | 0.584416 | 90 | 2.3721 |
| 2.3614 | 0.616883 | 95 | 2.3669 |
| 2.3467 | 0.649351 | 100 | 2.3612 |
| 2.4012 | 0.681818 | 105 | 2.3569 |
| 2.3224 | 0.714286 | 110 | 2.3528 |
| 2.3348 | 0.746753 | 115 | 2.3483 |
| 2.3573 | 0.779221 | 120 | 2.3448 |
| 2.306 | 0.811688 | 125 | 2.3412 |
| 2.342 | 0.844156 | 130 | 2.3382 |
| 2.3045 | 0.876623 | 135 | 2.3356 |
| 2.2959 | 0.909091 | 140 | 2.3330 |
| 2.3545 | 0.941558 | 145 | 2.3305 |
| 2.3446 | 0.974026 | 150 | 2.3285 |
| 2.2502 | 1.006494 | 155 | 2.3268 |
| 2.0791 | 1.038961 | 160 | 2.3347 |
| 2.1034 | 1.071429 | 165 | 2.3399 |
| 2.095 | 1.103896 | 170 | 2.3358 |
| 2.0627 | 1.136364 | 175 | 2.3346 |
| 2.0408 | 1.168831 | 180 | 2.3357 |
| 2.0575 | 1.201299 | 185 | 2.3364 |
| 2.0976 | 1.233766 | 190 | 2.3349 |
| 2.0668 | 1.266234 | 195 | 2.3336 |
| 2.0579 | 1.298701 | 200 | 2.3329 |
| 2.0756 | 1.331169 | 205 | 2.3326 |
| 2.1174 | 1.363636 | 210 | 2.3325 |
| 2.0663 | 1.396104 | 215 | 2.3325 |
| 2.0941 | 1.428571 | 220 | 2.3324 |
| 2.1074 | 1.461039 | 225 | 2.3324 |
| 2.1251 | 1.493506 | 230 | 2.3322 |
| 2.0629 | 1.525974 | 235 | 2.3318 |
| 2.0872 | 1.558442 | 240 | 2.3312 |
| 2.0994 | 1.590909 | 245 | 2.3310 |
| 2.0879 | 1.623377 | 250 | 2.3308 |
| 2.0623 | 1.655844 | 255 | 2.3305 |
| 2.1054 | 1.688312 | 260 | 2.3303 |
| 2.0736 | 1.720779 | 265 | 2.3301 |
| 2.1146 | 1.753247 | 270 | 2.3300 |
| 2.0444 | 1.785714 | 275 | 2.3301 |
| 2.0541 | 1.818182 | 280 | 2.3301 |
| 2.1333 | 1.850649 | 285 | 2.3300 |
| 2.1101 | 1.883117 | 290 | 2.3299 |
| 2.0234 | 1.915584 | 295 | 2.3298 |
| 2.0671 | 1.948052 | 300 | 2.3298 |
| 2.083 | 1.980519 | 305 | 2.3298 |
| 2.0417 | 2.012987 | 310 | 2.3299 |
| 2.0784 | 2.045455 | 315 | 2.3303 |
| 2.058 | 2.077922 | 320 | 2.3308 |
| 2.0524 | 2.110390 | 325 | 2.3312 |
| 2.0318 | 2.142857 | 330 | 2.3316 |
| 2.0914 | 2.175325 | 335 | 2.3318 |
| 2.0319 | 2.207792 | 340 | 2.3320 |
| 2.0099 | 2.240260 | 345 | 2.3322 |
| 2.075 | 2.272727 | 350 | 2.3323 |
| 2.0444 | 2.305195 | 355 | 2.3324 |
| 2.0428 | 2.337662 | 360 | 2.3325 |
| 2.0612 | 2.370130 | 365 | 2.3326 |
| 2.1078 | 2.402597 | 370 | 2.3327 |
| 2.0643 | 2.435065 | 375 | 2.3327 |
| 2.0667 | 2.467532 | 380 | 2.3326 |
| 2.0285 | 2.500000 | 385 | 2.3324 |
| 2.0571 | 2.532468 | 390 | 2.3322 |
| 2.0209 | 2.564935 | 395 | 2.3322 |
| 2.0537 | 2.597403 | 400 | 2.3323 |
| 2.0138 | 2.629870 | 405 | 2.3324 |
| 2.0772 | 2.662338 | 410 | 2.3324 |
| 2.039 | 2.694805 | 415 | 2.3323 |
| 2.0181 | 2.727273 | 420 | 2.3322 |
| 2.0484 | 2.759740 | 425 | 2.3320 |
| 2.0224 | 2.792208 | 430 | 2.3320 |
| 2.0732 | 2.824675 | 435 | 2.3320 |
| 2.0499 | 2.857143 | 440 | 2.3321 |
| 2.0498 | 2.889610 | 445 | 2.3321 |
| 2.0472 | 2.922078 | 450 | 2.3320 |
| 2.1327 | 2.954545 | 455 | 2.3319 |
| 2.0642 | 2.987013 | 460 | 2.3319 |
| 2.0654 | 3.019481 | 465 | - |

Note that the training data volume for the continued pretraining stage is capped at 3GB. When the corresponding century's corpus exceeds this volume, the training data is randomly sampled to fit the volume.

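The README does not describe how the random sampling under the 3GB cap is carried out. The following is a minimal sketch of one possible approach, assuming documents are held as strings and sized by their UTF-8 byte length; the function name `sample_to_budget`, the `seed` default, and the skip-if-too-large rule are all hypothetical and are not taken from the ProgressGym codebase.

```python
import random


def sample_to_budget(documents, budget_bytes, seed=0):
    """Return (kept_documents, bytes_used) with bytes_used <= budget_bytes.

    If the corpus already fits, it is returned unchanged. Otherwise the
    documents are shuffled and added in shuffled order, skipping any
    document that would push the total over the budget.
    """
    sizes = [len(doc.encode("utf-8")) for doc in documents]
    if sum(sizes) <= budget_bytes:
        return list(documents), sum(sizes)

    order = list(range(len(documents)))
    random.Random(seed).shuffle(order)  # seeded for repeatability

    kept, used = [], 0
    for index in order:
        if used + sizes[index] > budget_bytes:
            continue  # this document would overflow the budget
        kept.append(documents[index])
        used += sizes[index]
    return kept, used


# Tiny demonstration with a 100-byte budget instead of 3GB.
corpus = ["x" * 40, "y" * 40, "z" * 40]
subset, used_bytes = sample_to_budget(corpus, budget_bytes=100)
print(len(subset), used_bytes)
```

Sampling whole documents rather than truncating them keeps each training text intact, at the cost of landing slightly under the budget.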
**ProgressGym-HistLlama3-8B-C015-instruct is an instruction-tuned language model.** It is tuned on [ProgressGym-TimelessQA](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-TimelessQA), using the following hyperparameters. Note, however, that the snapshot at training step 10 is used for the final model, to minimize erosion of the value tendencies learned during continued pretraining; we qualitatively observe that this snapshot still possesses strong instruction-following capabilities.
- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP

... with the following training results:

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8675 | 0.1042 | 5 | 0.8585 |
| 0.8415 | 0.2083 | 10 | 0.8063 |
| 0.8225 | 0.3125 | 15 | 0.8210 |
| 0.806 | 0.4167 | 20 | 0.8412 |
| 0.8139 | 0.5208 | 25 | 0.8702 |
| 0.8978 | 0.625 | 30 | 0.8631 |
| 0.814 | 0.7292 | 35 | 0.8550 |
| 0.7989 | 0.8333 | 40 | 0.8473 |
| 0.8769 | 0.9375 | 45 | 0.8383 |
| 0.7244 | 1.0417 | 50 | 0.8278 |
| 0.4644 | 1.1458 | 55 | 0.8387 |
| 0.4488 | 1.25 | 60 | 0.8680 |
| 0.3973 | 1.3542 | 65 | 0.8718 |
| 0.443 | 1.4583 | 70 | 0.8596 |
| 0.4346 | 1.5625 | 75 | 0.8514 |
| 0.4701 | 1.6667 | 80 | 0.8461 |
| 0.4344 | 1.7708 | 85 | 0.8437 |
| 0.4274 | 1.875 | 90 | 0.8434 |
| 0.4771 | 1.9792 | 95 | 0.8434 |
| 0.3876 | 2.0833 | 100 | 0.8439 |
| 0.3698 | 2.1875 | 105 | 0.8451 |
| 0.407 | 2.2917 | 110 | 0.8465 |
| 0.374 | 2.3958 | 115 | 0.8482 |
| 0.3945 | 2.5 | 120 | 0.8498 |
| 0.3753 | 2.6042 | 125 | 0.8513 |
| 0.3721 | 2.7083 | 130 | 0.8528 |
| 0.3718 | 2.8125 | 135 | 0.8542 |
| 0.3773 | 2.9167 | 140 | 0.8555 |
| 0.3723 | 3.0208 | 145 | 0.8565 |
| 0.374 | 3.125 | 150 | 0.8576 |
| 0.3728 | 3.2292 | 155 | 0.8588 |
| 0.3686 | 3.3333 | 160 | 0.8598 |
| 0.3617 | 3.4375 | 165 | 0.8607 |
| 0.3546 | 3.5417 | 170 | 0.8613 |
| 0.3707 | 3.6458 | 175 | 0.8619 |
| 0.3739 | 3.75 | 180 | 0.8625 |
| 0.3617 | 3.8542 | 185 | 0.8632 |
| 0.3591 | 3.9583 | 190 | 0.8637 |


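One observation from the instruction-tuning table is that step 10, the snapshot used for the final model, is also the row with the lowest validation loss. The sketch below finds that row from the first ten logged evaluations (the later rows are all higher). The README gives preservation of value tendencies as the reason for choosing step 10 and does not say whether validation loss played a role; the variable and function names here are illustrative.

```python
# (step, validation_loss) pairs copied from the first ten rows of the
# instruction-tuning results table.
eval_log = [
    (5, 0.8585), (10, 0.8063), (15, 0.8210), (20, 0.8412), (25, 0.8702),
    (30, 0.8631), (35, 0.8550), (40, 0.8473), (45, 0.8383), (50, 0.8278),
]


def lowest_validation_loss(log):
    """Return the (step, loss) pair with the smallest validation loss."""
    return min(log, key=lambda row: row[1])


best_step, best_loss = lowest_validation_loss(eval_log)
print(best_step, best_loss)
```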
## Links

- **[Paper Preprint]** [ProgressGym: Alignment with a Millennium of Moral Progress](https://arxiv.org/abs/2406.20087)
- **[Leaderboard & Interactive Playground]** [PKU-Alignment/ProgressGym-LeaderBoard](https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard)
- **[Huggingface Data & Model Collection]** [PKU-Alignment/ProgressGym](https://huggingface.co/collections/PKU-Alignment/progressgym-666735fcf3e4efa276226eaa)
- **[Github Codebase]** [PKU-Alignment/ProgressGym](https://github.com/PKU-Alignment/ProgressGym)
- **[Documentation]** [ProgressGym Documentation](https://pku-alignment.github.io/ProgressGym/)
- **[PyPI Package]** *(coming soon - [stay tuned](https://forms.gle/1TWFLL4ZCLeYTD5N6)!)*

## Citation

If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the bibtex entry below.

```text
@article{progressgym,
  title={ProgressGym: Alignment with a Millennium of Moral Progress},
  author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
  journal={arXiv preprint arXiv:2406.20087},
  eprint={2406.20087},
  eprinttype = {arXiv},
  year={2024}
}
```

## Ethics Statement

- **Copyright information of historical text data sources**:
  - Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
  - For the text that we draw from the Internet Archive, we include only those uploaded by the *Library of Congress*, which are texts freely released online by the U.S. Library of Congress for research and public use.
  - The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
  - The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
- **Reproducibility**: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
- **Misuse Prevention**: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without *a priori* assumptions on the direction of progress. In the event of potential misuse of our dataset, we condemn any such attempt in the strongest terms, and will work with the research community on whistleblowing for such attempts.
- **Open-Sourcing**: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.
12
all_results.json
Normal file
@@ -0,0 +1,12 @@
{
  "epoch": 4.0,
  "eval_loss": 0.8062803149223328,
  "eval_runtime": 1.9823,
  "eval_samples_per_second": 171.522,
  "eval_steps_per_second": 1.513,
  "total_flos": 5334785064960.0,
  "train_loss": 0.508326952966551,
  "train_runtime": 4164.2152,
  "train_samples_per_second": 2.934,
  "train_steps_per_second": 0.046
}
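These throughput figures can be cross-checked against the batch sizes listed in the README for the instruction-tuning run (8 per device on 8 devices for training, 16 per device for evaluation). The sketch below is a rough sanity check, not part of the repository: it assumes that this file belongs to the instruction-tuning run (its `epoch` of 4.0 suggests so) and that the last partial batch of each epoch is kept, hence the `ceil`.

```python
import math

# Values copied from all_results.json.
all_results = {
    "epoch": 4.0,
    "eval_runtime": 1.9823,
    "eval_samples_per_second": 171.522,
    "eval_steps_per_second": 1.513,
    "train_runtime": 4164.2152,
    "train_samples_per_second": 2.934,
    "train_steps_per_second": 0.046,
}

# Per-device batch size times number of devices, as listed in the README.
TOTAL_TRAIN_BATCH_SIZE = 8 * 8    # 64
TOTAL_EVAL_BATCH_SIZE = 16 * 8    # 128

# Rates multiplied by runtimes give approximate totals (the rates are rounded).
train_samples = all_results["train_samples_per_second"] * all_results["train_runtime"]
train_steps = all_results["train_steps_per_second"] * all_results["train_runtime"]
eval_samples = all_results["eval_samples_per_second"] * all_results["eval_runtime"]

samples_per_epoch = train_samples / all_results["epoch"]
steps_per_epoch = math.ceil(samples_per_epoch / TOTAL_TRAIN_BATCH_SIZE)
eval_steps = math.ceil(eval_samples / TOTAL_EVAL_BATCH_SIZE)

print(f"~{samples_per_epoch:.0f} training samples per epoch")
print(f"{steps_per_epoch} steps per epoch, ~{train_steps:.0f} training steps in total")
print(f"~{eval_samples:.0f} evaluation samples in {eval_steps} steps")
```

The resulting 48 steps per epoch agrees with the README table, where step 5 corresponds to epoch 0.1042.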
28
config.json
Normal file
@@ -0,0 +1,28 @@
{
  "_name_or_path": "/mnt/fl/projects/pro-align/progressalign/shared_storage/our_models/C015_llama3-8b-base_pretrain_20240428_005832",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.40.1",
  "use_cache": false,
  "vocab_size": 128256
}
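The dimensions in this config determine the parameter count, which in turn can be checked against the `total_size` of 16060522496 bytes recorded in `model.safetensors.index.json`. The sketch below assumes the standard Llama layout in `transformers` (bias-free attention and MLP projections, two RMSNorm weight vectors per layer, and one final norm); the function name `count_parameters` is illustrative.

```python
# Relevant fields copied from config.json.
config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "tie_word_embeddings": False,
    "vocab_size": 128256,
}


def count_parameters(cfg):
    """Count weights for a Llama-style decoder (assumes no bias terms)."""
    hidden = cfg["hidden_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]  # grouped-query attention

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v
    mlp = 3 * hidden * cfg["intermediate_size"]            # gate, up, down
    norms = 2 * hidden                                     # two norms per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * hidden
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * hidden
    final_norm = hidden
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


n_params = count_parameters(config)
bytes_fp16 = n_params * 2  # torch_dtype is float16: two bytes per weight
print(n_params, bytes_fp16)
```

Under these assumptions the count comes to 8,030,261,248 parameters, and at two bytes each that is exactly the 16,060,522,496 bytes reported as `total_size`.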
1
configuration.json
Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
7
eval_results.json
Normal file
@@ -0,0 +1,7 @@
{
  "epoch": 4.0,
  "eval_loss": 0.8062803149223328,
  "eval_runtime": 1.9823,
  "eval_samples_per_second": 171.522,
  "eval_steps_per_second": 1.513
}
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "transformers_version": "4.40.1"
}
3
model-00001-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4ef6764666375baa6c70755ee998e513f62060f5b46913ae34e67abe0e31a72d
size 4976698592
3
model-00002-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f657d1353892b5d5fe5ed93998f2aa6e88c5a78fc81fb51cb9573ee86d0c8583
size 4999802616
3
model-00003-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fcb906e7483301863b75ce9e879e61afb9cf662fee8c8167b7e3221807d8eda6
size 4915916080
3
model-00004-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3adff485e0f3ece17e8f1673a10767ec1fe1aea949fad8118fe8a1e26c16d979
size 1168138808
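Each of the four shard files committed here is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes of the real file. The sketch below parses that layout and totals the four sizes; the helper name `parse_lfs_pointer` is illustrative, and the parser handles only the three keys that appear in these pointers.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for model-00004-of-00004.safetensors, copied from this commit.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3adff485e0f3ece17e8f1673a10767ec1fe1aea949fad8118fe8a1e26c16d979
size 1168138808
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algorithm"], pointer["size"])

# Sizes of the four shards, in shard order, copied from their pointers.
shard_sizes = [4976698592, 4999802616, 4915916080, 1168138808]
total_on_disk = sum(shard_sizes)
print(total_on_disk)
```

The shards total 16,060,556,096 bytes, which is 33,600 bytes more than the `total_size` of 16,060,522,496 in `model.safetensors.index.json`; the small excess is presumably per-file header metadata rather than tensor data.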
298
model.safetensors.index.json
Normal file
@@ -0,0 +1,298 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_size": 16060522496
|
||||
},
|
||||
"weight_map": {
|
||||
"lm_head.weight": "model-00004-of-00004.safetensors",
|
||||
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
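As a reading aid, the tensor-to-shard mapping above can be inverted to see which tensors each shard file holds. The sketch below is illustrative only: `tensors_by_shard` is a hypothetical helper name, and `weight_map_excerpt` contains just the layer-31 and final-norm entries copied from the index above, not the full file. In the entries listed above, layer 31 is the one layer whose tensors are split across two shards.

```python
from collections import defaultdict

# Excerpt copied from the index entries above (layer 31 and the final norm).
# This is NOT the full weight map, only enough to show the split.
weight_map_excerpt = {
    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors",
}


def tensors_by_shard(weight_map):
    """Invert a {tensor name: shard file} mapping into {shard file: [tensor names]}."""
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    # Sort names so the result is stable regardless of input order.
    return {shard: sorted(names) for shard, names in shards.items()}


grouped = tensors_by_shard(weight_map_excerpt)
for shard_file in sorted(grouped):
    print(shard_file, len(grouped[shard_file]))
```

Applied to the complete index file, the same grouping would show how many tensors must be read from each shard before a given layer can be assembled.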
BIN
readme-assets/data-sources.png
Normal file
Binary file not shown.
Size: 85 KiB
BIN
readme-assets/data-stats.png
Normal file
Binary file not shown.
Size: 210 KiB
BIN
readme-assets/main-diagram.png
Normal file
Binary file not shown.
Size: 178 KiB
BIN
readme-assets/moral-evals.png
Normal file
Binary file not shown.
Size: 218 KiB
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
410504
tokenizer.json
Normal file
File diff suppressed because it is too large
2065
tokenizer_config.json
Normal file
File diff suppressed because it is too large
8
train_results.json
Normal file
@@ -0,0 +1,8 @@
{
"epoch": 4.0,
"total_flos": 5334785064960.0,
"train_loss": 0.508326952966551,
"train_runtime": 4164.2152,
"train_samples_per_second": 2.934,
"train_steps_per_second": 0.046
}
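The throughput figures in train_results.json can be cross-checked against each other. The sketch below is a rough sanity check, not part of the repository: `derived_stats` is a hypothetical helper, and the total of 192 optimizer steps is taken from the `total_steps` field of trainer_log.jsonl. Because the reported rates are rounded, the samples-per-step figure is only an approximation of the effective batch size.

```python
# Values copied from train_results.json above.
train_results = {
    "epoch": 4.0,
    "train_runtime": 4164.2152,           # seconds
    "train_samples_per_second": 2.934,
    "train_steps_per_second": 0.046,
}

# Assumption: total optimizer steps, taken from "total_steps" in trainer_log.jsonl.
TOTAL_STEPS = 192


def derived_stats(results, total_steps):
    """Recompute step throughput and estimate samples per optimizer step."""
    runtime = results["train_runtime"]
    # Samples processed over the whole run, from the (rounded) reported rate.
    samples_seen = results["train_samples_per_second"] * runtime
    return {
        "steps_per_second": round(total_steps / runtime, 3),
        "steps_per_epoch": total_steps / results["epoch"],
        "approx_samples_per_step": samples_seen / total_steps,
    }


stats = derived_stats(train_results, TOTAL_STEPS)
print(stats)
```

The recomputed step rate should agree with the reported `train_steps_per_second` after rounding, and the samples-per-step estimate gives a rough idea of the effective batch size used for the run.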
79
trainer_log.jsonl
Normal file
@@ -0,0 +1,79 @@
{"current_steps": 1, "total_steps": 192, "loss": 0.8766, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 0.0, "epoch": 0.020833333333333332, "percentage": 0.52, "elapsed_time": "0:00:03", "remaining_time": "0:09:41"}
{"current_steps": 5, "total_steps": 192, "loss": 0.8675, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.25e-06, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:00:08", "remaining_time": "0:05:18"}
{"current_steps": 5, "total_steps": 192, "loss": null, "eval_loss": 0.8584801554679871, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.10416666666666667, "percentage": 2.6, "elapsed_time": "0:00:08", "remaining_time": "0:05:18"}
{"current_steps": 10, "total_steps": 192, "loss": 0.8415, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.25e-06, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:00:59", "remaining_time": "0:18:00"}
{"current_steps": 10, "total_steps": 192, "loss": null, "eval_loss": 0.8062803149223328, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.20833333333333334, "percentage": 5.21, "elapsed_time": "0:00:59", "remaining_time": "0:18:00"}
{"current_steps": 15, "total_steps": 192, "loss": 0.8225, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 8.25e-06, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:01:52", "remaining_time": "0:22:11"}
{"current_steps": 15, "total_steps": 192, "loss": null, "eval_loss": 0.820951521396637, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.3125, "percentage": 7.81, "elapsed_time": "0:01:52", "remaining_time": "0:22:11"}
{"current_steps": 20, "total_steps": 192, "loss": 0.806, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.2e-05, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:03:49", "remaining_time": "0:32:56"}
{"current_steps": 20, "total_steps": 192, "loss": null, "eval_loss": 0.8412486910820007, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.4166666666666667, "percentage": 10.42, "elapsed_time": "0:03:49", "remaining_time": "0:32:56"}
{"current_steps": 25, "total_steps": 192, "loss": 0.8139, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.4071209905461127e-05, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:05:44", "remaining_time": "0:38:20"}
{"current_steps": 25, "total_steps": 192, "loss": null, "eval_loss": 0.8701534867286682, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.5208333333333334, "percentage": 13.02, "elapsed_time": "0:05:44", "remaining_time": "0:38:20"}
{"current_steps": 30, "total_steps": 192, "loss": 0.8978, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.0166196232101288e-05, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:07:41", "remaining_time": "0:41:30"}
{"current_steps": 30, "total_steps": 192, "loss": null, "eval_loss": 0.8630704879760742, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.625, "percentage": 15.62, "elapsed_time": "0:07:41", "remaining_time": "0:41:30"}
{"current_steps": 35, "total_steps": 192, "loss": 0.814, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.276248845991498e-06, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:09:38", "remaining_time": "0:43:14"}
{"current_steps": 35, "total_steps": 192, "loss": null, "eval_loss": 0.8549697995185852, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.7291666666666666, "percentage": 18.23, "elapsed_time": "0:09:38", "remaining_time": "0:43:14"}
{"current_steps": 40, "total_steps": 192, "loss": 0.7989, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.157388080190487e-06, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:11:33", "remaining_time": "0:43:56"}
{"current_steps": 40, "total_steps": 192, "loss": null, "eval_loss": 0.8472943902015686, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.8333333333333334, "percentage": 20.83, "elapsed_time": "0:11:33", "remaining_time": "0:43:56"}
{"current_steps": 45, "total_steps": 192, "loss": 0.8769, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.6192313334626905e-06, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:13:28", "remaining_time": "0:44:01"}
{"current_steps": 45, "total_steps": 192, "loss": null, "eval_loss": 0.8382811546325684, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.9375, "percentage": 23.44, "elapsed_time": "0:13:28", "remaining_time": "0:44:01"}
{"current_steps": 50, "total_steps": 192, "loss": 0.7244, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.514391432582838e-06, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:15:23", "remaining_time": "0:43:43"}
{"current_steps": 50, "total_steps": 192, "loss": null, "eval_loss": 0.8277742266654968, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.0416666666666667, "percentage": 26.04, "elapsed_time": "0:15:23", "remaining_time": "0:43:43"}
{"current_steps": 55, "total_steps": 192, "loss": 0.4644, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7297262757656213e-06, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:17:17", "remaining_time": "0:43:05"}
{"current_steps": 55, "total_steps": 192, "loss": null, "eval_loss": 0.8387134671211243, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.1458333333333333, "percentage": 28.65, "elapsed_time": "0:17:17", "remaining_time": "0:43:05"}
{"current_steps": 60, "total_steps": 192, "loss": 0.4488, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.1791620375982074e-06, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:19:12", "remaining_time": "0:42:16"}
{"current_steps": 60, "total_steps": 192, "loss": null, "eval_loss": 0.8680305480957031, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.25, "percentage": 31.25, "elapsed_time": "0:19:12", "remaining_time": "0:42:16"}
{"current_steps": 65, "total_steps": 192, "loss": 0.3973, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.978466092394693e-07, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:21:07", "remaining_time": "0:41:15"}
{"current_steps": 65, "total_steps": 192, "loss": null, "eval_loss": 0.8717625737190247, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.3541666666666667, "percentage": 33.85, "elapsed_time": "0:21:07", "remaining_time": "0:41:15"}
{"current_steps": 70, "total_steps": 192, "loss": 0.443, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.374210410959207e-07, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:22:59", "remaining_time": "0:40:03"}
{"current_steps": 70, "total_steps": 192, "loss": null, "eval_loss": 0.8596016764640808, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.4583333333333333, "percentage": 36.46, "elapsed_time": "0:22:59", "remaining_time": "0:40:03"}
{"current_steps": 75, "total_steps": 192, "loss": 0.4346, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.6222476698215175e-07, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:24:51", "remaining_time": "0:38:46"}
{"current_steps": 75, "total_steps": 192, "loss": null, "eval_loss": 0.8514222502708435, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.5625, "percentage": 39.06, "elapsed_time": "0:24:51", "remaining_time": "0:38:46"}
{"current_steps": 80, "total_steps": 192, "loss": 0.4701, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.462755297384099e-07, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:26:44", "remaining_time": "0:37:25"}
{"current_steps": 80, "total_steps": 192, "loss": null, "eval_loss": 0.8461114764213562, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.6666666666666665, "percentage": 41.67, "elapsed_time": "0:26:44", "remaining_time": "0:37:25"}
{"current_steps": 85, "total_steps": 192, "loss": 0.4344, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.7088740175034947e-07, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:28:36", "remaining_time": "0:36:00"}
{"current_steps": 85, "total_steps": 192, "loss": null, "eval_loss": 0.8437052369117737, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.7708333333333335, "percentage": 44.27, "elapsed_time": "0:28:36", "remaining_time": "0:36:00"}
{"current_steps": 90, "total_steps": 192, "loss": 0.4274, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.228102956599465e-07, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:30:27", "remaining_time": "0:34:31"}
{"current_steps": 90, "total_steps": 192, "loss": null, "eval_loss": 0.8434357643127441, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.875, "percentage": 46.88, "elapsed_time": "0:30:27", "remaining_time": "0:34:31"}
{"current_steps": 95, "total_steps": 192, "loss": 0.4771, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 9.279207916081227e-08, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:32:20", "remaining_time": "0:33:01"}
{"current_steps": 95, "total_steps": 192, "loss": null, "eval_loss": 0.8434197902679443, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.9791666666666665, "percentage": 49.48, "elapsed_time": "0:32:20", "remaining_time": "0:33:01"}
{"current_steps": 100, "total_steps": 192, "loss": 0.3876, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.448002404850094e-08, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:34:12", "remaining_time": "0:31:28"}
{"current_steps": 100, "total_steps": 192, "loss": null, "eval_loss": 0.8438728451728821, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.0833333333333335, "percentage": 52.08, "elapsed_time": "0:34:12", "remaining_time": "0:31:28"}
{"current_steps": 105, "total_steps": 192, "loss": 0.3698, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.35920070839697e-08, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:36:01", "remaining_time": "0:29:51"}
{"current_steps": 105, "total_steps": 192, "loss": null, "eval_loss": 0.845079243183136, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.1875, "percentage": 54.69, "elapsed_time": "0:36:01", "remaining_time": "0:29:51"}
{"current_steps": 110, "total_steps": 192, "loss": 0.407, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.7299804687499997e-08, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:37:49", "remaining_time": "0:28:12"}
{"current_steps": 110, "total_steps": 192, "loss": null, "eval_loss": 0.8465444445610046, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.2916666666666665, "percentage": 57.29, "elapsed_time": "0:37:49", "remaining_time": "0:28:12"}
{"current_steps": 115, "total_steps": 192, "loss": 0.374, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.37771434967624e-08, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:39:40", "remaining_time": "0:26:33"}
{"current_steps": 115, "total_steps": 192, "loss": null, "eval_loss": 0.8481599688529968, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.3958333333333335, "percentage": 59.9, "elapsed_time": "0:39:40", "remaining_time": "0:26:33"}
{"current_steps": 120, "total_steps": 192, "loss": 0.3945, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.187403540619925e-08, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "0:41:28", "remaining_time": "0:24:53"}
{"current_steps": 120, "total_steps": 192, "loss": null, "eval_loss": 0.8498236536979675, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.5, "percentage": 62.5, "elapsed_time": "0:41:28", "remaining_time": "0:24:53"}
{"current_steps": 125, "total_steps": 192, "loss": 0.3753, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.088648238966908e-08, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "0:43:19", "remaining_time": "0:23:13"}
{"current_steps": 125, "total_steps": 192, "loss": null, "eval_loss": 0.8512565493583679, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.6041666666666665, "percentage": 65.1, "elapsed_time": "0:43:19", "remaining_time": "0:23:13"}
{"current_steps": 130, "total_steps": 192, "loss": 0.3721, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.039701925276604e-08, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "0:45:11", "remaining_time": "0:21:32"}
{"current_steps": 130, "total_steps": 192, "loss": null, "eval_loss": 0.8527700304985046, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.7083333333333335, "percentage": 67.71, "elapsed_time": "0:45:11", "remaining_time": "0:21:32"}
{"current_steps": 135, "total_steps": 192, "loss": 0.3718, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0166900048082497e-08, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "0:47:01", "remaining_time": "0:19:51"}
{"current_steps": 135, "total_steps": 192, "loss": null, "eval_loss": 0.8541720509529114, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.8125, "percentage": 70.31, "elapsed_time": "0:47:01", "remaining_time": "0:19:51"}
{"current_steps": 140, "total_steps": 192, "loss": 0.3773, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0065147322870076e-08, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "0:48:51", "remaining_time": "0:18:09"}
{"current_steps": 140, "total_steps": 192, "loss": null, "eval_loss": 0.8555252552032471, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.9166666666666665, "percentage": 72.92, "elapsed_time": "0:48:51", "remaining_time": "0:18:09"}
{"current_steps": 145, "total_steps": 192, "loss": 0.3723, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.002328628528332e-08, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "0:50:40", "remaining_time": "0:16:25"}
{"current_steps": 145, "total_steps": 192, "loss": null, "eval_loss": 0.8565484881401062, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.0208333333333335, "percentage": 75.52, "elapsed_time": "0:50:40", "remaining_time": "0:16:25"}
{"current_steps": 150, "total_steps": 192, "loss": 0.374, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0007484528133236e-08, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "0:52:32", "remaining_time": "0:14:42"}
{"current_steps": 150, "total_steps": 192, "loss": null, "eval_loss": 0.8576194643974304, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.125, "percentage": 78.12, "elapsed_time": "0:52:32", "remaining_time": "0:14:42"}
{"current_steps": 155, "total_steps": 192, "loss": 0.3728, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0002110817570477e-08, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "0:54:21", "remaining_time": "0:12:58"}
{"current_steps": 155, "total_steps": 192, "loss": null, "eval_loss": 0.8588044047355652, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.2291666666666665, "percentage": 80.73, "elapsed_time": "0:54:21", "remaining_time": "0:12:58"}
{"current_steps": 160, "total_steps": 192, "loss": 0.3686, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000504842356326e-08, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "0:56:13", "remaining_time": "0:11:14"}
{"current_steps": 160, "total_steps": 192, "loss": null, "eval_loss": 0.859791100025177, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.3333333333333335, "percentage": 83.33, "elapsed_time": "0:56:13", "remaining_time": "0:11:14"}
{"current_steps": 165, "total_steps": 192, "loss": 0.3617, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000009745562451e-08, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "0:58:03", "remaining_time": "0:09:29"}
{"current_steps": 165, "total_steps": 192, "loss": null, "eval_loss": 0.8607122302055359, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.4375, "percentage": 85.94, "elapsed_time": "0:58:03", "remaining_time": "0:09:29"}
{"current_steps": 170, "total_steps": 192, "loss": 0.3546, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000014077810156e-08, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "0:59:52", "remaining_time": "0:07:44"}
{"current_steps": 170, "total_steps": 192, "loss": null, "eval_loss": 0.8613293170928955, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.5416666666666665, "percentage": 88.54, "elapsed_time": "0:59:52", "remaining_time": "0:07:44"}
{"current_steps": 175, "total_steps": 192, "loss": 0.3707, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000001343508807e-08, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:01:40", "remaining_time": "0:05:59"}
|
||||
{"current_steps": 175, "total_steps": 192, "loss": null, "eval_loss": 0.8619220852851868, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.6458333333333335, "percentage": 91.15, "elapsed_time": "1:01:40", "remaining_time": "0:05:59"}
|
||||
{"current_steps": 180, "total_steps": 192, "loss": 0.3739, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000006747581e-08, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:03:31", "remaining_time": "0:04:14"}
|
||||
{"current_steps": 180, "total_steps": 192, "loss": null, "eval_loss": 0.862490177154541, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.75, "percentage": 93.75, "elapsed_time": "1:03:31", "remaining_time": "0:04:14"}
|
||||
{"current_steps": 185, "total_steps": 192, "loss": 0.3617, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000000001094325e-08, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:05:21", "remaining_time": "0:02:28"}
|
||||
{"current_steps": 185, "total_steps": 192, "loss": null, "eval_loss": 0.8631939888000488, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.8541666666666665, "percentage": 96.35, "elapsed_time": "1:05:21", "remaining_time": "0:02:28"}
|
||||
{"current_steps": 190, "total_steps": 192, "loss": 0.3591, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000000000139e-08, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:07:12", "remaining_time": "0:00:42"}
|
||||
{"current_steps": 190, "total_steps": 192, "loss": null, "eval_loss": 0.8637197613716125, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.9583333333333335, "percentage": 98.96, "elapsed_time": "1:07:12", "remaining_time": "0:00:42"}
|
||||
{"current_steps": 192, "total_steps": 192, "loss": null, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:08:56", "remaining_time": "0:00:00"}
|
||||
{"current_steps": 3, "total_steps": 3, "loss": null, "eval_loss": 0.8062803149223328, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "1:09:34", "remaining_time": "0:00:00"}
|
||||
607
trainer_state.json
Normal file
@@ -0,0 +1,607 @@
|
||||
{
|
||||
"best_metric": 0.8062803149223328,
|
||||
"best_model_checkpoint": "./output/training_results/C015_llama3-8b-base_instruct_20240504_123713/checkpoint-10",
|
||||
"epoch": 4.0,
|
||||
"eval_steps": 5,
|
||||
"global_step": 192,
|
||||
"is_hyper_param_search": false,
|
||||
"is_local_process_zero": true,
|
||||
"is_world_process_zero": true,
|
||||
"log_history": [
|
||||
{
|
||||
"epoch": 0.020833333333333332,
|
||||
"grad_norm": 0.0,
|
||||
"learning_rate": 0.0,
|
||||
"loss": 0.8766,
|
||||
"step": 1
|
||||
},
|
||||
{
|
||||
"epoch": 0.10416666666666667,
|
||||
"grad_norm": 10.34330229700748,
|
||||
"learning_rate": 2.25e-06,
|
||||
"loss": 0.8675,
|
||||
"step": 5
|
||||
},
|
||||
{
|
||||
"epoch": 0.10416666666666667,
|
||||
"eval_loss": 0.8584801554679871,
|
||||
"eval_runtime": 1.9951,
|
||||
"eval_samples_per_second": 170.414,
|
||||
"eval_steps_per_second": 1.504,
|
||||
"step": 5
|
||||
},
|
||||
{
|
||||
"epoch": 0.20833333333333334,
|
||||
"grad_norm": 4.512014612667122,
|
||||
"learning_rate": 5.25e-06,
|
||||
"loss": 0.8415,
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"epoch": 0.20833333333333334,
|
||||
"eval_loss": 0.8062803149223328,
|
||||
"eval_runtime": 1.9589,
|
||||
"eval_samples_per_second": 173.564,
|
||||
"eval_steps_per_second": 1.531,
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"epoch": 0.3125,
|
||||
"grad_norm": 5.466223920700161,
|
||||
"learning_rate": 8.25e-06,
|
||||
"loss": 0.8225,
|
||||
"step": 15
|
||||
},
|
||||
{
|
||||
"epoch": 0.3125,
|
||||
"eval_loss": 0.820951521396637,
|
||||
"eval_runtime": 1.9583,
|
||||
"eval_samples_per_second": 173.62,
|
||||
"eval_steps_per_second": 1.532,
|
||||
"step": 15
|
||||
},
|
||||
{
|
||||
"epoch": 0.4166666666666667,
|
||||
"grad_norm": 5.091985939472258,
|
||||
"learning_rate": 1.2e-05,
|
||||
"loss": 0.806,
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"epoch": 0.4166666666666667,
|
||||
"eval_loss": 0.8412486910820007,
|
||||
"eval_runtime": 1.9516,
|
||||
"eval_samples_per_second": 174.217,
|
||||
"eval_steps_per_second": 1.537,
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"epoch": 0.5208333333333334,
|
||||
"grad_norm": 4.323182492427286,
|
||||
"learning_rate": 1.4071209905461127e-05,
|
||||
"loss": 0.8139,
|
||||
"step": 25
|
||||
},
|
||||
{
|
||||
"epoch": 0.5208333333333334,
|
||||
"eval_loss": 0.8701534867286682,
|
||||
"eval_runtime": 1.956,
|
||||
"eval_samples_per_second": 173.828,
|
||||
"eval_steps_per_second": 1.534,
|
||||
"step": 25
|
||||
},
|
||||
{
|
||||
"epoch": 0.625,
|
||||
"grad_norm": 4.430828029367158,
|
||||
"learning_rate": 1.0166196232101288e-05,
|
||||
"loss": 0.8978,
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"epoch": 0.625,
|
||||
"eval_loss": 0.8630704879760742,
|
||||
"eval_runtime": 1.9545,
|
||||
"eval_samples_per_second": 173.954,
|
||||
"eval_steps_per_second": 1.535,
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"epoch": 0.7291666666666666,
|
||||
"grad_norm": 3.8459262571296122,
|
||||
"learning_rate": 7.276248845991498e-06,
|
||||
"loss": 0.814,
|
||||
"step": 35
|
||||
},
|
||||
{
|
||||
"epoch": 0.7291666666666666,
|
||||
"eval_loss": 0.8549697995185852,
|
||||
"eval_runtime": 1.9539,
|
||||
"eval_samples_per_second": 174.008,
|
||||
"eval_steps_per_second": 1.535,
|
||||
"step": 35
|
||||
},
|
||||
{
|
||||
"epoch": 0.8333333333333334,
|
||||
"grad_norm": 4.312137758874538,
|
||||
"learning_rate": 5.157388080190487e-06,
|
||||
"loss": 0.7989,
|
||||
"step": 40
|
||||
},
|
||||
{
|
||||
"epoch": 0.8333333333333334,
|
||||
"eval_loss": 0.8472943902015686,
|
||||
"eval_runtime": 1.9515,
|
||||
"eval_samples_per_second": 174.222,
|
||||
"eval_steps_per_second": 1.537,
|
||||
"step": 40
|
||||
},
|
||||
{
|
||||
"epoch": 0.9375,
|
||||
"grad_norm": 4.013634591695819,
|
||||
"learning_rate": 3.6192313334626905e-06,
|
||||
"loss": 0.8769,
|
||||
"step": 45
|
||||
},
|
||||
{
|
||||
"epoch": 0.9375,
|
||||
"eval_loss": 0.8382811546325684,
|
||||
"eval_runtime": 1.9527,
|
||||
"eval_samples_per_second": 174.116,
|
||||
"eval_steps_per_second": 1.536,
|
||||
"step": 45
|
||||
},
|
||||
{
|
||||
"epoch": 1.0416666666666667,
|
||||
"grad_norm": 3.825805552811641,
|
||||
"learning_rate": 2.514391432582838e-06,
|
||||
"loss": 0.7244,
|
||||
"step": 50
|
||||
},
|
||||
{
|
||||
"epoch": 1.0416666666666667,
|
||||
"eval_loss": 0.8277742266654968,
|
||||
"eval_runtime": 1.9549,
|
||||
"eval_samples_per_second": 173.925,
|
||||
"eval_steps_per_second": 1.535,
|
||||
"step": 50
|
||||
},
|
||||
{
|
||||
"epoch": 1.1458333333333333,
|
||||
"grad_norm": 3.0670883145044487,
|
||||
"learning_rate": 1.7297262757656213e-06,
|
||||
"loss": 0.4644,
|
||||
"step": 55
|
||||
},
|
||||
{
|
||||
"epoch": 1.1458333333333333,
|
||||
"eval_loss": 0.8387134671211243,
|
||||
"eval_runtime": 1.959,
|
||||
"eval_samples_per_second": 173.561,
|
||||
"eval_steps_per_second": 1.531,
|
||||
"step": 55
|
||||
},
|
||||
{
|
||||
"epoch": 1.25,
|
||||
"grad_norm": 3.904758197983173,
|
||||
"learning_rate": 1.1791620375982074e-06,
|
||||
"loss": 0.4488,
|
||||
"step": 60
|
||||
},
|
||||
{
|
||||
"epoch": 1.25,
|
||||
"eval_loss": 0.8680305480957031,
|
||||
"eval_runtime": 1.953,
|
||||
"eval_samples_per_second": 174.087,
|
||||
"eval_steps_per_second": 1.536,
|
||||
"step": 60
|
||||
},
|
||||
{
|
||||
"epoch": 1.3541666666666667,
|
||||
"grad_norm": 4.037522952699986,
|
||||
"learning_rate": 7.978466092394693e-07,
|
||||
"loss": 0.3973,
|
||||
"step": 65
|
||||
},
|
||||
{
|
||||
"epoch": 1.3541666666666667,
|
||||
"eval_loss": 0.8717625737190247,
|
||||
"eval_runtime": 1.9541,
|
||||
"eval_samples_per_second": 173.996,
|
||||
"eval_steps_per_second": 1.535,
|
||||
"step": 65
|
||||
},
|
||||
{
|
||||
"epoch": 1.4583333333333333,
|
||||
"grad_norm": 4.006395162021574,
|
||||
"learning_rate": 5.374210410959207e-07,
|
||||
"loss": 0.443,
|
||||
"step": 70
|
||||
},
|
||||
{
|
||||
"epoch": 1.4583333333333333,
|
||||
"eval_loss": 0.8596016764640808,
|
||||
"eval_runtime": 1.9513,
|
||||
"eval_samples_per_second": 174.244,
|
||||
"eval_steps_per_second": 1.537,
|
||||
"step": 70
|
||||
},
|
||||
{
|
||||
"epoch": 1.5625,
|
||||
"grad_norm": 4.477860740274337,
|
||||
"learning_rate": 3.6222476698215175e-07,
|
||||
"loss": 0.4346,
|
||||
"step": 75
|
||||
},
|
||||
{
|
||||
"epoch": 1.5625,
|
||||
"eval_loss": 0.8514222502708435,
|
||||
"eval_runtime": 1.9616,
|
||||
"eval_samples_per_second": 173.329,
|
||||
"eval_steps_per_second": 1.529,
|
||||
"step": 75
|
||||
},
|
||||
{
|
||||
"epoch": 1.6666666666666665,
|
||||
"grad_norm": 3.79003494996975,
|
||||
"learning_rate": 2.462755297384099e-07,
|
||||
"loss": 0.4701,
|
||||
"step": 80
|
||||
},
|
||||
{
|
||||
"epoch": 1.6666666666666665,
|
||||
"eval_loss": 0.8461114764213562,
|
||||
"eval_runtime": 1.9519,
|
||||
"eval_samples_per_second": 174.192,
|
||||
"eval_steps_per_second": 1.537,
|
||||
"step": 80
|
||||
},
|
||||
{
|
||||
"epoch": 1.7708333333333335,
|
||||
"grad_norm": 3.431009868136907,
|
||||
"learning_rate": 1.7088740175034947e-07,
|
||||
"loss": 0.4344,
|
||||
"step": 85
|
||||
},
|
||||
{
|
||||
"epoch": 1.7708333333333335,
|
||||
"eval_loss": 0.8437052369117737,
|
||||
"eval_runtime": 1.9548,
|
||||
"eval_samples_per_second": 173.928,
|
||||
"eval_steps_per_second": 1.535,
|
||||
"step": 85
|
||||
},
|
||||
{
|
||||
"epoch": 1.875,
|
||||
"grad_norm": 3.4612975522103846,
|
||||
"learning_rate": 1.228102956599465e-07,
|
||||
"loss": 0.4274,
|
||||
"step": 90
|
||||
},
|
||||
{
|
||||
"epoch": 1.875,
|
||||
"eval_loss": 0.8434357643127441,
|
||||
"eval_runtime": 1.9551,
|
||||
"eval_samples_per_second": 173.905,
|
||||
"eval_steps_per_second": 1.534,
|
||||
"step": 90
|
||||
},
|
||||
{
|
||||
"epoch": 1.9791666666666665,
|
||||
"grad_norm": 4.089060356958601,
|
||||
"learning_rate": 9.279207916081227e-08,
|
||||
"loss": 0.4771,
|
||||
"step": 95
|
||||
},
|
||||
{
|
||||
"epoch": 1.9791666666666665,
|
||||
"eval_loss": 0.8434197902679443,
|
||||
"eval_runtime": 1.9533,
|
||||
"eval_samples_per_second": 174.06,
|
||||
"eval_steps_per_second": 1.536,
|
||||
"step": 95
|
||||
},
|
||||
{
|
||||
"epoch": 2.0833333333333335,
|
||||
"grad_norm": 3.5663107359521624,
|
||||
"learning_rate": 7.448002404850094e-08,
|
||||
"loss": 0.3876,
|
||||
"step": 100
|
||||
},
|
||||
{
|
||||
"epoch": 2.0833333333333335,
|
||||
"eval_loss": 0.8438728451728821,
|
||||
"eval_runtime": 1.957,
|
||||
"eval_samples_per_second": 173.739,
|
||||
"eval_steps_per_second": 1.533,
|
||||
"step": 100
|
||||
},
|
||||
{
|
||||
"epoch": 2.1875,
|
||||
"grad_norm": 3.3025378175144806,
|
||||
"learning_rate": 6.35920070839697e-08,
|
||||
"loss": 0.3698,
|
||||
"step": 105
|
||||
},
|
||||
{
|
||||
"epoch": 2.1875,
|
||||
"eval_loss": 0.845079243183136,
|
||||
"eval_runtime": 1.9562,
|
||||
"eval_samples_per_second": 173.803,
|
||||
"eval_steps_per_second": 1.534,
|
||||
"step": 105
|
||||
},
|
||||
{
|
||||
"epoch": 2.2916666666666665,
|
||||
"grad_norm": 3.4985300165216615,
|
||||
"learning_rate": 5.7299804687499997e-08,
|
||||
"loss": 0.407,
|
||||
"step": 110
|
||||
},
|
||||
{
|
||||
"epoch": 2.2916666666666665,
|
||||
"eval_loss": 0.8465444445610046,
|
||||
"eval_runtime": 1.9573,
|
||||
"eval_samples_per_second": 173.708,
|
||||
"eval_steps_per_second": 1.533,
|
||||
"step": 110
|
||||
},
|
||||
{
|
||||
"epoch": 2.3958333333333335,
|
||||
"grad_norm": 3.394752587183079,
|
||||
"learning_rate": 5.37771434967624e-08,
|
||||
"loss": 0.374,
|
||||
"step": 115
|
||||
},
|
||||
{
|
||||
"epoch": 2.3958333333333335,
|
||||
"eval_loss": 0.8481599688529968,
|
||||
"eval_runtime": 1.9556,
|
||||
"eval_samples_per_second": 173.861,
|
||||
"eval_steps_per_second": 1.534,
|
||||
"step": 115
|
||||
},
|
||||
{
|
||||
"epoch": 2.5,
|
||||
"grad_norm": 4.399721414998381,
|
||||
"learning_rate": 5.187403540619925e-08,
|
||||
"loss": 0.3945,
|
||||
"step": 120
|
||||
},
|
||||
{
|
||||
"epoch": 2.5,
|
||||
"eval_loss": 0.8498236536979675,
|
||||
"eval_runtime": 1.9552,
|
||||
"eval_samples_per_second": 173.893,
|
||||
"eval_steps_per_second": 1.534,
|
||||
"step": 120
|
||||
},
|
||||
{
|
||||
"epoch": 2.6041666666666665,
|
||||
"grad_norm": 3.2768849901156845,
|
||||
"learning_rate": 5.088648238966908e-08,
|
||||
"loss": 0.3753,
|
||||
"step": 125
|
||||
},
|
||||
{
|
||||
"epoch": 2.6041666666666665,
|
||||
"eval_loss": 0.8512565493583679,
|
||||
"eval_runtime": 1.9594,
|
||||
"eval_samples_per_second": 173.526,
|
||||
"eval_steps_per_second": 1.531,
|
||||
"step": 125
|
||||
},
|
||||
{
|
||||
"epoch": 2.7083333333333335,
|
||||
"grad_norm": 3.666595330730063,
|
||||
"learning_rate": 5.039701925276604e-08,
|
||||
"loss": 0.3721,
|
||||
"step": 130
|
||||
},
|
||||
{
|
||||
"epoch": 2.7083333333333335,
|
||||
"eval_loss": 0.8527700304985046,
|
||||
"eval_runtime": 1.9575,
|
||||
"eval_samples_per_second": 173.689,
|
||||
"eval_steps_per_second": 1.533,
|
||||
"step": 130
|
||||
},
|
||||
{
|
||||
"epoch": 2.8125,
|
||||
"grad_norm": 3.4733320032072537,
|
||||
"learning_rate": 5.0166900048082497e-08,
|
||||
"loss": 0.3718,
|
||||
"step": 135
|
||||
},
|
||||
{
|
||||
"epoch": 2.8125,
|
||||
"eval_loss": 0.8541720509529114,
|
||||
"eval_runtime": 1.9599,
|
||||
"eval_samples_per_second": 173.479,
|
||||
"eval_steps_per_second": 1.531,
|
||||
"step": 135
|
||||
},
|
||||
{
|
||||
"epoch": 2.9166666666666665,
|
||||
"grad_norm": 3.447696757476531,
|
||||
"learning_rate": 5.0065147322870076e-08,
|
||||
"loss": 0.3773,
|
||||
"step": 140
|
||||
},
|
||||
{
|
||||
"epoch": 2.9166666666666665,
|
||||
"eval_loss": 0.8555252552032471,
|
||||
"eval_runtime": 1.9586,
|
||||
"eval_samples_per_second": 173.592,
|
||||
"eval_steps_per_second": 1.532,
|
||||
"step": 140
|
||||
},
|
||||
{
|
||||
"epoch": 3.0208333333333335,
|
||||
"grad_norm": 3.0052210603196237,
|
||||
"learning_rate": 5.002328628528332e-08,
|
||||
"loss": 0.3723,
|
||||
"step": 145
|
||||
},
|
||||
{
|
||||
"epoch": 3.0208333333333335,
|
||||
"eval_loss": 0.8565484881401062,
|
||||
"eval_runtime": 1.9586,
|
||||
"eval_samples_per_second": 173.589,
|
||||
"eval_steps_per_second": 1.532,
|
||||
"step": 145
|
||||
},
|
||||
{
|
||||
"epoch": 3.125,
|
||||
"grad_norm": 3.368197941438436,
|
||||
"learning_rate": 5.0007484528133236e-08,
|
||||
"loss": 0.374,
|
||||
"step": 150
|
||||
},
|
||||
{
|
||||
"epoch": 3.125,
|
||||
"eval_loss": 0.8576194643974304,
|
||||
"eval_runtime": 1.9541,
|
||||
"eval_samples_per_second": 173.993,
|
||||
"eval_steps_per_second": 1.535,
|
||||
"step": 150
|
||||
},
|
||||
{
|
||||
"epoch": 3.2291666666666665,
|
||||
"grad_norm": 3.3290743731904304,
|
||||
"learning_rate": 5.0002110817570477e-08,
|
||||
"loss": 0.3728,
|
||||
"step": 155
|
||||
},
|
||||
{
|
||||
"epoch": 3.2291666666666665,
|
||||
"eval_loss": 0.8588044047355652,
|
||||
"eval_runtime": 1.951,
|
||||
"eval_samples_per_second": 174.273,
|
||||
"eval_steps_per_second": 1.538,
|
||||
"step": 155
|
||||
},
|
||||
{
|
||||
"epoch": 3.3333333333333335,
|
||||
"grad_norm": 4.793937739567796,
|
||||
"learning_rate": 5.0000504842356326e-08,
|
||||
"loss": 0.3686,
|
||||
"step": 160
|
||||
},
|
||||
{
|
||||
"epoch": 3.3333333333333335,
|
||||
"eval_loss": 0.859791100025177,
|
||||
"eval_runtime": 1.9522,
|
||||
"eval_samples_per_second": 174.159,
|
||||
"eval_steps_per_second": 1.537,
|
||||
"step": 160
|
||||
},
|
||||
{
|
||||
"epoch": 3.4375,
|
||||
"grad_norm": 3.326342529192208,
|
||||
"learning_rate": 5.000009745562451e-08,
|
||||
"loss": 0.3617,
|
||||
"step": 165
|
||||
},
|
||||
{
|
||||
"epoch": 3.4375,
|
||||
"eval_loss": 0.8607122302055359,
|
||||
"eval_runtime": 1.958,
|
||||
"eval_samples_per_second": 173.647,
|
||||
"eval_steps_per_second": 1.532,
|
||||
"step": 165
|
||||
},
|
||||
{
|
||||
"epoch": 3.5416666666666665,
|
||||
"grad_norm": 3.6505713497705736,
|
||||
"learning_rate": 5.0000014077810156e-08,
|
||||
"loss": 0.3546,
|
||||
"step": 170
|
||||
},
|
||||
{
|
||||
"epoch": 3.5416666666666665,
|
||||
"eval_loss": 0.8613293170928955,
|
||||
"eval_runtime": 1.9527,
|
||||
"eval_samples_per_second": 174.122,
|
||||
"eval_steps_per_second": 1.536,
|
||||
"step": 170
|
||||
},
|
||||
{
|
||||
"epoch": 3.6458333333333335,
|
||||
"grad_norm": 3.496080458530573,
|
||||
"learning_rate": 5.0000001343508807e-08,
|
||||
"loss": 0.3707,
|
||||
"step": 175
|
||||
},
|
||||
{
|
||||
"epoch": 3.6458333333333335,
|
||||
"eval_loss": 0.8619220852851868,
|
||||
"eval_runtime": 1.9552,
|
||||
"eval_samples_per_second": 173.893,
|
||||
"eval_steps_per_second": 1.534,
|
||||
"step": 175
|
||||
},
|
||||
{
|
||||
"epoch": 3.75,
|
||||
"grad_norm": 3.50316414527161,
|
||||
"learning_rate": 5.000000006747581e-08,
|
||||
"loss": 0.3739,
|
||||
"step": 180
|
||||
},
|
||||
{
|
||||
"epoch": 3.75,
|
||||
"eval_loss": 0.862490177154541,
|
||||
"eval_runtime": 1.9547,
|
||||
"eval_samples_per_second": 173.936,
|
||||
"eval_steps_per_second": 1.535,
|
||||
"step": 180
|
||||
},
|
||||
{
|
||||
"epoch": 3.8541666666666665,
|
||||
"grad_norm": 3.7278057893863874,
|
||||
"learning_rate": 5.0000000001094325e-08,
|
||||
"loss": 0.3617,
|
||||
"step": 185
|
||||
},
|
||||
{
|
||||
"epoch": 3.8541666666666665,
|
||||
"eval_loss": 0.8631939888000488,
|
||||
"eval_runtime": 1.9574,
|
||||
"eval_samples_per_second": 173.703,
|
||||
"eval_steps_per_second": 1.533,
|
||||
"step": 185
|
||||
},
|
||||
{
|
||||
"epoch": 3.9583333333333335,
|
||||
"grad_norm": 3.160928200357982,
|
||||
"learning_rate": 5.000000000000139e-08,
|
||||
"loss": 0.3591,
|
||||
"step": 190
|
||||
},
|
||||
{
|
||||
"epoch": 3.9583333333333335,
|
||||
"eval_loss": 0.8637197613716125,
|
||||
"eval_runtime": 1.9552,
|
||||
"eval_samples_per_second": 173.893,
|
||||
"eval_steps_per_second": 1.534,
|
||||
"step": 190
|
||||
},
|
||||
{
|
||||
"epoch": 4.0,
|
||||
"step": 192,
|
||||
"total_flos": 5334785064960.0,
|
||||
"train_loss": 0.508326952966551,
|
||||
"train_runtime": 4164.2152,
|
||||
"train_samples_per_second": 2.934,
|
||||
"train_steps_per_second": 0.046
|
||||
}
|
||||
],
|
||||
"logging_steps": 5,
|
||||
"max_steps": 192,
|
||||
"num_input_tokens_seen": 0,
|
||||
"num_train_epochs": 4,
|
||||
"save_steps": 5,
|
||||
"total_flos": 5334785064960.0,
|
||||
"train_batch_size": 8,
|
||||
"trial_name": null,
|
||||
"trial_params": null
|
||||
}
|
||||
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:6847c094292a8b40cbd23c921c6bcb9cbc8346d03045e21561f149c7aa6e6569
|
||||
size 6776
|
||||
BIN
training_eval_loss.png
Normal file
Binary file not shown.
Size: 47 KiB
BIN
training_loss.png
Normal file
Binary file not shown.
Size: 40 KiB