Upload ./train_results.json with huggingface_hub

ai-modelscope
2025-02-07 12:50:20 +08:00
parent 6b66061628
commit 5ddaf615b2
25 changed files with 413438 additions and 64 deletions

36
.gitattributes vendored

@@ -1,47 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text

151
README.md

@@ -1,47 +1,114 @@
---
license: Apache License 2.0
#model-type:
##e.g. gpt, phi, llama, chatglm, baichuan, etc.
#- gpt
#domain:
##e.g. nlp, cv, audio, multi-modal
#- nlp
#language:
##List of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
##e.g. CIDEr, BLEU, ROUGE, etc.
#- CIDEr
#tags:
##Custom tags of any kind, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained
#tools:
##e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
#- vllm
license: cc-by-4.0
tags:
- alignment
- value alignment
- AI safety
- safety
- LLM
- history
datasets:
- PKU-Alignment/ProgressGym-HistText
base_model:
- meta-llama/Meta-Llama-3-8B
---
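### The contributors of this model have not provided a more detailed model description. The model files and weights can be found on the "Model Files" page.
#### You can download the model with the following git clone command or with the ModelScope SDK
SDK download
```bash
#Install ModelScope
pip install modelscope
```
```python
#Download the model via the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('PKU-Alignment/ProgressGym-HistLlama3-8B-C019-pretrain-v0.2')
```
Git download
```
#Download the model via Git
git clone https://www.modelscope.cn/PKU-Alignment/ProgressGym-HistLlama3-8B-C019-pretrain-v0.2.git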
```
# ProgressGym-HistLlama3-8B-C019-pretrain
## Overview
#### The ProgressGym Framework
![Framework Diagram](./readme-assets/main-diagram.png)
**ProgressGym-HistLlama3-8B-C019-pretrain** is part of the **ProgressGym** framework for research and experimentation on *progress alignment* - the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in.
To quote the paper [*ProgressGym: Alignment with a Millennium of Moral Progress*](https://arxiv.org/abs/2406.20087):
> Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.
>
> We introduce *progress alignment* as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.
#### ProgressGym-HistLlama3-8B-C019-pretrain
ProgressGym-HistLlama3-8B-C019-pretrain is one of the **36 historical language models** in the ProgressGym framework. It is a pretrained model without instruction-tuning. For the instruction-tuned version, see [ProgressGym-HistLlama3-8B-C019-instruct](https://huggingface.co/PKU-Alignment/ProgressGym-HistLlama3-8B-C019-instruct).
**ProgressGym-HistLlama3-8B-C019-pretrain is under continual iteration.** Improving upon the current version, new versions of the model are currently being trained to reflect historical moral tendencies in ever more comprehensive ways.
**ProgressGym-HistLlama3-8B-C019-pretrain is a 19th-century historical language model.** Based on [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), it has undergone continued pretraining on the 19th-century text data from [ProgressGym-HistText](https://huggingface.co/datasets/PKU-Alignment/ProgressGym-HistText), using the following hyperparameters:
- learning_rate: 1.5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 20
- num_epochs: 4.0
- mixed_precision_training: Native AMP
... with the following training results:
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.3809 | 0.1923 | 200 | 2.4207 |
| 2.3057 | 0.3846 | 400 | 2.3750 |
| 2.35 | 0.5769 | 600 | 2.3477 |
| 2.3291 | 0.7692 | 800 | 2.3324 |
| 2.2998 | 0.9615 | 1000 | 2.3237 |
| 2.1248 | 1.1538 | 1200 | 2.3361 |
| 2.1239 | 1.3462 | 1400 | 2.3344 |
| 2.1521 | 1.5385 | 1600 | 2.3338 |
| 2.1359 | 1.7308 | 1800 | 2.3336 |
| 2.0531 | 1.9231 | 2000 | 2.3332 |
| 2.0783 | 2.1154 | 2200 | 2.3357 |
| 2.0952 | 2.3077 | 2400 | 2.3360 |
| 2.1009 | 2.5 | 2600 | 2.3361 |
| 2.125 | 2.6923 | 2800 | 2.3360 |
| 2.1206 | 2.8846 | 3000 | 2.3360 |
| 2.0593 | 3.0769 | 3200 | 2.3363 |
| 2.0927 | 3.2692 | 3400 | 2.3365 |
| 2.093 | 3.4615 | 3600 | 2.3368 |
| 2.066 | 3.6538 | 3800 | 2.3363 |
| 2.1086 | 3.8462 | 4000 | 2.3362 |
Note that the training data volume for the continued pretraining stage is capped at 3GB. When the corresponding century's corpus exceeds this volume, the training data is randomly sampled to fit the volume.
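The batch-size entries in the hyperparameter list above are related by simple arithmetic. The following is a minimal sketch of that relationship, assuming no gradient accumulation (the list does not mention any); the variable names are illustrative, and the steps-per-epoch figure is only a rough estimate read off the first row of the results table.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8    # per-device training batch size
eval_batch_size = 16    # per-device evaluation batch size
num_devices = 8

# Assumption: gradient accumulation is 1, since the card lists none.
gradient_accumulation_steps = 1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Rough estimate from the first table row: step 200 corresponds to epoch 0.1923.
steps_per_epoch = round(200 / 0.1923)

print(total_train_batch_size)  # 64
print(total_eval_batch_size)   # 128
print(steps_per_epoch)         # 1040
```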
## Links
- **[Paper Preprint]** [ProgressGym: Alignment with a Millennium of Moral Progress](https://arxiv.org/abs/2406.20087)
- **[Leaderboard & Interactive Playground]** [PKU-Alignment/ProgressGym-LeaderBoard](https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard)
- **[Huggingface Data & Model Collection]** [PKU-Alignment/ProgressGym](https://huggingface.co/collections/PKU-Alignment/progressgym-666735fcf3e4efa276226eaa)
- **[Github Codebase]** [PKU-Alignment/ProgressGym](https://github.com/PKU-Alignment/ProgressGym)
- **[Documentation]** [ProgressGym Documentation](https://pku-alignment.github.io/ProgressGym/)
- **[PyPI Package]** *(coming soon - [stay tuned](https://forms.gle/1TWFLL4ZCLeYTD5N6)!)*
## Citation
If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the bibtex entry below.
```text
@article{progressgym,
title={ProgressGym: Alignment with a Millennium of Moral Progress},
author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
journal={arXiv preprint arXiv:2406.20087},
eprint={2406.20087},
eprinttype = {arXiv},
year={2024}
}
```
<p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete the model card promptly, following the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
## Ethics Statement
- **Copyright information of historical text data sources**:
- Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
- For the texts that we draw from the Internet Archive, we only include those that were uploaded by the *Library of Congress*, which are texts freely released online by the U.S. Library of Congress for research and public use.
- The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
- The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
- **Reproducibility**: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
- **Misuse Prevention**: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without *a priori* assumptions on the direction of progress. In the event of potential misuse of our dataset, we condemn any misuse attempt to the strongest degree possible, and will work with the research community on whistleblowing for such attempts.
- **Open-Sourcing**: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.

13
all_results.json Normal file

@@ -0,0 +1,13 @@
{
"epoch": 4.0,
"eval_loss": 2.3333284854888916,
"eval_runtime": 352.6998,
"eval_samples_per_second": 205.14,
"eval_steps_per_second": 1.605,
"perplexity": 10.312208509221884,
"total_flos": 4255641501696000.0,
"train_loss": 2.2957492570853644,
"train_runtime": 58446.3006,
"train_samples_per_second": 44.566,
"train_steps_per_second": 0.696
}
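The `perplexity` value above appears to be the exponential of `eval_loss`, which is the usual definition when the loss is a mean cross-entropy in nats. This is an inference from the two numbers rather than something the file states; a short check under that assumption:

```python
import math

# Values copied from all_results.json above.
eval_loss = 2.3333284854888916
reported_perplexity = 10.312208509221884

# Assumption: perplexity = exp(mean cross-entropy loss in nats).
perplexity = math.exp(eval_loss)

# The recomputed value should agree with the reported one up to rounding.
matches = math.isclose(perplexity, reported_perplexity, rel_tol=1e-6)
print(matches)  # True
```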

28
config.json Normal file

@@ -0,0 +1,28 @@
{
"_name_or_path": "/aifs4su/yaodong/models/Meta-Llama-3-8B",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.40.1",
"use_cache": false,
"vocab_size": 128256
}
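As a sanity check on these settings, the parameter count can be estimated from the config fields alone. The sketch below assumes the standard Llama layout (bias-free projections, as `attention_bias` is false, two RMSNorm weights per layer, and one final norm weight, which is an assumption because it is not visible in this excerpt). Under those assumptions the float16 byte count equals the `total_size` of 16060522496 recorded in the safetensors index later in this commit.

```python
# Values copied from config.json above.
vocab_size = 128256
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
num_attention_heads = 32
num_key_value_heads = 8

head_dim = hidden_size // num_attention_heads   # 128
kv_dim = head_dim * num_key_value_heads         # 1024, grouped-query attention

# Per-layer weights: q_proj and o_proj are square, k_proj and v_proj are narrower.
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
mlp = 3 * hidden_size * intermediate_size       # gate_proj, up_proj, down_proj
norms = 2 * hidden_size                         # input and post-attention layernorm
per_layer = attention + mlp + norms

# tie_word_embeddings is false, so embed_tokens and lm_head are separate matrices.
embeddings = 2 * vocab_size * hidden_size
final_norm = hidden_size                        # assumed final norm weight

total_params = num_hidden_layers * per_layer + embeddings + final_norm
total_bytes = total_params * 2                  # float16 uses 2 bytes per parameter

print(total_params)  # 8030261248
print(total_bytes)   # 16060522496
```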

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

8
eval_results.json Normal file

@@ -0,0 +1,8 @@
{
"epoch": 4.0,
"eval_loss": 2.3333284854888916,
"eval_runtime": 352.6998,
"eval_samples_per_second": 205.14,
"eval_steps_per_second": 1.605,
"perplexity": 10.312208509221884
}

9
generation_config.json Normal file

@@ -0,0 +1,9 @@
{
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001,
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.40.1"
}
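With `do_sample` enabled, `temperature` and `top_p` together control how the next token is drawn. The following is a simplified, dependency-free illustration of temperature scaling followed by nucleus (top-p) filtering; it is not the transformers implementation, and the function names and toy logits are hypothetical.

```python
import math
import random

def top_p_candidates(logits, temperature=0.6, top_p=0.9):
    """Return (index, renormalized probability) pairs kept by top-p filtering."""
    # Temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    return [(i, probs[i] / mass) for i in kept]

def sample_token(logits, rng, temperature=0.6, top_p=0.9):
    """Draw one token index from the filtered, renormalized distribution."""
    candidates = top_p_candidates(logits, temperature, top_p)
    indices = [i for i, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(indices, weights=weights, k=1)[0]

toy_logits = [2.0, 1.0, 0.0, -1.0]   # hypothetical logits for a 4-token vocabulary
candidates = top_p_candidates(toy_logits)
token = sample_token(toy_logits, random.Random(0))
print([i for i, _ in candidates])    # [0, 1]
```

In this toy case the two most likely tokens already cover more than 90% of the probability mass, so the remaining two are never sampled.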


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f289dd3234231554f2e088b91fb3ec917b303bd7c22d1d6ad342ad8a9cd09599
size 4976698592


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:988615b002e33c9119acb63e007e9b0d0fbd7bbe6be5e3400009ec5f24f45b6c
size 4999802616


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b7223079cb9b1d8c8ec82961e405d001da61d5e2bd51152fedd8d69e12903598
size 4915916080


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:315c0c9d7d4ff47a6de18d97be8dbbf693fd0ed87bfd9cbe32706d365852e133
size 1168138808
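Each of the four files above is a Git LFS pointer: a small text file with `version`, `oid`, and `size` lines that stands in for the real weight shard. A minimal sketch of reading such a pointer and adding up the shard sizes follows; the helper name is hypothetical, and the sizes are copied from the pointers above.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields

# First pointer from this commit, reproduced verbatim.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f289dd3234231554f2e088b91fb3ec917b303bd7c22d1d6ad342ad8a9cd09599
size 4976698592
"""
info = parse_lfs_pointer(pointer)

# Sizes of all four shards, in bytes, as listed in the pointers above.
shard_sizes = [4976698592, 4999802616, 4915916080, 1168138808]
total_shard_bytes = sum(shard_sizes)

print(info["size"])        # 4976698592
print(total_shard_bytes)   # 16060556096
```

The sum is 33,600 bytes larger than the `total_size` of 16060522496 in the index that follows; a plausible explanation, though not one this commit confirms, is the per-file safetensors header.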


@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 16060522496
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}

Binary file not shown. Size: 85 KiB

Binary file not shown. Size: 210 KiB

Binary file not shown. Size: 178 KiB

Binary file not shown. Size: 218 KiB

17
special_tokens_map.json Normal file

@@ -0,0 +1,17 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": "<|end_of_text|>"
}

410563
tokenizer.json Normal file

File diff suppressed because it is too large.

2065
tokenizer_config.json Normal file

File diff suppressed because it is too large.

8
train_results.json Normal file

@@ -0,0 +1,8 @@
{
"epoch": 4.0,
"total_flos": 4255641501696000.0,
"train_loss": 2.2957492570853644,
"train_runtime": 58446.3006,
"train_samples_per_second": 44.566,
"train_steps_per_second": 0.696
}

33
trainer_log.jsonl Normal file

@@ -0,0 +1,33 @@
{"current_steps": 1, "total_steps": 40700, "loss": 2.7384, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 0.0, "epoch": 9.828009828009828e-05, "percentage": 0.0, "elapsed_time": "0:00:07", "remaining_time": "3 days, 10:46:01"}
{"current_steps": 2035, "total_steps": 40700, "loss": 2.4775, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.9927939731411727e-06, "epoch": 0.2, "percentage": 5.0, "elapsed_time": "0:48:37", "remaining_time": "15:23:44"}
{"current_steps": 4070, "total_steps": 40700, "loss": 2.404, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 2.238606233181306e-06, "epoch": 0.4, "percentage": 10.0, "elapsed_time": "1:34:37", "remaining_time": "14:11:37"}
{"current_steps": 4070, "total_steps": 40700, "loss": null, "eval_loss": 2.388622283935547, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.4, "percentage": 10.0, "elapsed_time": "1:34:37", "remaining_time": "14:11:37"}
{"current_steps": 6105, "total_steps": 40700, "loss": 2.3695, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.2183995585808034e-06, "epoch": 0.6, "percentage": 15.0, "elapsed_time": "2:27:16", "remaining_time": "13:54:33"}
{"current_steps": 8140, "total_steps": 40700, "loss": 2.3519, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 6.503021712782451e-07, "epoch": 0.8, "percentage": 20.0, "elapsed_time": "3:12:33", "remaining_time": "12:50:13"}
{"current_steps": 8140, "total_steps": 40700, "loss": null, "eval_loss": 2.347768545150757, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 0.8, "percentage": 20.0, "elapsed_time": "3:12:33", "remaining_time": "12:50:13"}
{"current_steps": 10175, "total_steps": 40700, "loss": 2.341, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 3.454633956957804e-07, "epoch": 1.0, "percentage": 25.0, "elapsed_time": "4:04:24", "remaining_time": "12:13:14"}
{"current_steps": 12210, "total_steps": 40700, "loss": 2.272, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.8843776358550853e-07, "epoch": 1.2, "percentage": 30.0, "elapsed_time": "4:49:33", "remaining_time": "11:15:37"}
{"current_steps": 12210, "total_steps": 40700, "loss": null, "eval_loss": 2.3399369716644287, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.2, "percentage": 30.0, "elapsed_time": "4:49:33", "remaining_time": "11:15:37"}
{"current_steps": 14245, "total_steps": 40700, "loss": 2.2758, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 1.1134838176940004e-07, "epoch": 1.4, "percentage": 35.0, "elapsed_time": "5:41:20", "remaining_time": "10:33:54"}
{"current_steps": 16280, "total_steps": 40700, "loss": 2.2737, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 7.549690625694449e-08, "epoch": 1.6, "percentage": 40.0, "elapsed_time": "6:26:27", "remaining_time": "9:39:41"}
{"current_steps": 16280, "total_steps": 40700, "loss": null, "eval_loss": 2.3373029232025146, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 1.6, "percentage": 40.0, "elapsed_time": "6:26:27", "remaining_time": "9:39:41"}
{"current_steps": 18315, "total_steps": 40700, "loss": 2.2725, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.980471884190691e-08, "epoch": 1.8, "percentage": 45.0, "elapsed_time": "7:18:12", "remaining_time": "8:55:34"}
{"current_steps": 20350, "total_steps": 40700, "loss": 2.2735, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.3444261786522375e-08, "epoch": 2.0, "percentage": 50.0, "elapsed_time": "8:03:21", "remaining_time": "8:03:21"}
{"current_steps": 20350, "total_steps": 40700, "loss": null, "eval_loss": 2.3360562324523926, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.0, "percentage": 50.0, "elapsed_time": "8:03:21", "remaining_time": "8:03:21"}
{"current_steps": 22385, "total_steps": 40700, "loss": 2.2617, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.108383189787308e-08, "epoch": 2.2, "percentage": 55.0, "elapsed_time": "8:55:06", "remaining_time": "7:17:48"}
{"current_steps": 24420, "total_steps": 40700, "loss": 2.2629, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.029810361474273e-08, "epoch": 2.4, "percentage": 60.0, "elapsed_time": "9:40:13", "remaining_time": "6:26:48"}
{"current_steps": 24420, "total_steps": 40700, "loss": null, "eval_loss": 2.3360707759857178, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.4, "percentage": 60.0, "elapsed_time": "9:40:13", "remaining_time": "6:26:48"}
{"current_steps": 26455, "total_steps": 40700, "loss": 2.2617, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.006893878600473e-08, "epoch": 2.6, "percentage": 65.0, "elapsed_time": "10:31:55", "remaining_time": "5:40:16"}
{"current_steps": 28490, "total_steps": 40700, "loss": 2.2655, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.001272696084112e-08, "epoch": 2.8, "percentage": 70.0, "elapsed_time": "11:16:59", "remaining_time": "4:50:08"}
{"current_steps": 28490, "total_steps": 40700, "loss": null, "eval_loss": 2.3352677822113037, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 2.8, "percentage": 70.0, "elapsed_time": "11:16:59", "remaining_time": "4:50:08"}
{"current_steps": 30525, "total_steps": 40700, "loss": 2.2615, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0001729586514593e-08, "epoch": 3.0, "percentage": 75.0, "elapsed_time": "12:08:41", "remaining_time": "4:02:53"}
{"current_steps": 32560, "total_steps": 40700, "loss": 2.2567, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.0000150744726626e-08, "epoch": 3.2, "percentage": 80.0, "elapsed_time": "12:53:45", "remaining_time": "3:13:26"}
{"current_steps": 32560, "total_steps": 40700, "loss": null, "eval_loss": 2.334810733795166, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.2, "percentage": 80.0, "elapsed_time": "12:53:45", "remaining_time": "3:13:26"}
{"current_steps": 34595, "total_steps": 40700, "loss": 2.2555, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000648758777e-08, "epoch": 3.4, "percentage": 85.0, "elapsed_time": "13:45:25", "remaining_time": "2:25:39"}
{"current_steps": 36630, "total_steps": 40700, "loss": 2.2581, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000007849755e-08, "epoch": 3.6, "percentage": 90.0, "elapsed_time": "14:30:28", "remaining_time": "1:36:43"}
{"current_steps": 36630, "total_steps": 40700, "loss": null, "eval_loss": 2.3342058658599854, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 3.6, "percentage": 90.0, "elapsed_time": "14:30:28", "remaining_time": "1:36:43"}
{"current_steps": 38665, "total_steps": 40700, "loss": 2.2592, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5.000000000004389e-08, "epoch": 3.8, "percentage": 95.0, "elapsed_time": "15:22:10", "remaining_time": "0:48:32"}
{"current_steps": 40700, "total_steps": 40700, "loss": 2.2607, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": 5e-08, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "16:07:14", "remaining_time": "0:00:00"}
{"current_steps": 40700, "total_steps": 40700, "loss": null, "eval_loss": 2.3333284854888916, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "16:07:14", "remaining_time": "0:00:00"}
{"current_steps": 40700, "total_steps": 40700, "loss": null, "eval_loss": null, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "16:07:14", "remaining_time": "0:00:00"}
{"current_steps": 566, "total_steps": 566, "loss": null, "eval_loss": 2.3333284854888916, "predict_loss": null, "reward": null, "learning_rate": null, "epoch": 4.0, "percentage": 100.0, "elapsed_time": "16:20:14", "remaining_time": "0:00:00"}

257
trainer_state.json Normal file

@@ -0,0 +1,257 @@
{
"best_metric": 2.3333284854888916,
"best_model_checkpoint": "./output/training_results/C019_random_sample_Meta-Llama-3-8B_pretrain_20240726_033210/checkpoint-40700",
"epoch": 4.0,
"eval_steps": 4070,
"global_step": 40700,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 9.828009828009828e-05,
"grad_norm": 0.0,
"learning_rate": 0.0,
"loss": 2.7384,
"step": 1
},
{
"epoch": 0.2,
"grad_norm": 2.155671963962152,
"learning_rate": 1.9927939731411727e-06,
"loss": 2.4775,
"step": 2035
},
{
"epoch": 0.4,
"grad_norm": 2.0610799346731716,
"learning_rate": 2.238606233181306e-06,
"loss": 2.404,
"step": 4070
},
{
"epoch": 0.4,
"eval_loss": 2.388622283935547,
"eval_runtime": 386.8721,
"eval_samples_per_second": 187.02,
"eval_steps_per_second": 1.463,
"step": 4070
},
{
"epoch": 0.6,
"grad_norm": 1.852185161682542,
"learning_rate": 1.2183995585808034e-06,
"loss": 2.3695,
"step": 6105
},
{
"epoch": 0.8,
"grad_norm": 1.963949683462878,
"learning_rate": 6.503021712782451e-07,
"loss": 2.3519,
"step": 8140
},
{
"epoch": 0.8,
"eval_loss": 2.347768545150757,
"eval_runtime": 353.4879,
"eval_samples_per_second": 204.683,
"eval_steps_per_second": 1.601,
"step": 8140
},
{
"epoch": 1.0,
"grad_norm": 1.8257205248825334,
"learning_rate": 3.454633956957804e-07,
"loss": 2.341,
"step": 10175
},
{
"epoch": 1.2,
"grad_norm": 1.934651438097829,
"learning_rate": 1.8843776358550853e-07,
"loss": 2.272,
"step": 12210
},
{
"epoch": 1.2,
"eval_loss": 2.3399369716644287,
"eval_runtime": 353.2432,
"eval_samples_per_second": 204.825,
"eval_steps_per_second": 1.602,
"step": 12210
},
{
"epoch": 1.4,
"grad_norm": 1.9087851196083852,
"learning_rate": 1.1134838176940004e-07,
"loss": 2.2758,
"step": 14245
},
{
"epoch": 1.6,
"grad_norm": 1.9434977796166109,
"learning_rate": 7.549690625694449e-08,
"loss": 2.2737,
"step": 16280
},
{
"epoch": 1.6,
"eval_loss": 2.3373029232025146,
"eval_runtime": 353.221,
"eval_samples_per_second": 204.838,
"eval_steps_per_second": 1.602,
"step": 16280
},
{
"epoch": 1.8,
"grad_norm": 1.8796392127857566,
"learning_rate": 5.980471884190691e-08,
"loss": 2.2725,
"step": 18315
},
{
"epoch": 2.0,
"grad_norm": 1.9345766812761183,
"learning_rate": 5.3444261786522375e-08,
"loss": 2.2735,
"step": 20350
},
{
"epoch": 2.0,
"eval_loss": 2.3360562324523926,
"eval_runtime": 353.5294,
"eval_samples_per_second": 204.659,
"eval_steps_per_second": 1.601,
"step": 20350
},
{
"epoch": 2.2,
"grad_norm": 1.941719791382991,
"learning_rate": 5.108383189787308e-08,
"loss": 2.2617,
"step": 22385
},
{
"epoch": 2.4,
"grad_norm": 1.9268741466862327,
"learning_rate": 5.029810361474273e-08,
"loss": 2.2629,
"step": 24420
},
{
"epoch": 2.4,
"eval_loss": 2.3360707759857178,
"eval_runtime": 353.1028,
"eval_samples_per_second": 204.906,
"eval_steps_per_second": 1.603,
"step": 24420
},
{
"epoch": 2.6,
"grad_norm": 1.9707963592912294,
"learning_rate": 5.006893878600473e-08,
"loss": 2.2617,
"step": 26455
},
{
"epoch": 2.8,
"grad_norm": 1.9597319665023907,
"learning_rate": 5.001272696084112e-08,
"loss": 2.2655,
"step": 28490
},
{
"epoch": 2.8,
"eval_loss": 2.3352677822113037,
"eval_runtime": 353.4276,
"eval_samples_per_second": 204.718,
"eval_steps_per_second": 1.601,
"step": 28490
},
{
"epoch": 3.0,
"grad_norm": 34.02276184361365,
"learning_rate": 5.0001729586514593e-08,
"loss": 2.2615,
"step": 30525
},
{
"epoch": 3.2,
"grad_norm": 2.124382265375981,
"learning_rate": 5.0000150744726626e-08,
"loss": 2.2567,
"step": 32560
},
{
"epoch": 3.2,
"eval_loss": 2.334810733795166,
"eval_runtime": 353.1613,
"eval_samples_per_second": 204.872,
"eval_steps_per_second": 1.603,
"step": 32560
},
{
"epoch": 3.4,
"grad_norm": 1.9867886821885041,
"learning_rate": 5.000000648758777e-08,
"loss": 2.2555,
"step": 34595
},
{
"epoch": 3.6,
"grad_norm": 2.0413183498941763,
"learning_rate": 5.000000007849755e-08,
"loss": 2.2581,
"step": 36630
},
{
"epoch": 3.6,
"eval_loss": 2.3342058658599854,
"eval_runtime": 353.3438,
"eval_samples_per_second": 204.767,
"eval_steps_per_second": 1.602,
"step": 36630
},
{
"epoch": 3.8,
"grad_norm": 1.954005554664922,
"learning_rate": 5.000000000004389e-08,
"loss": 2.2592,
"step": 38665
},
{
"epoch": 4.0,
"grad_norm": 1.9904895703350474,
"learning_rate": 5e-08,
"loss": 2.2607,
"step": 40700
},
{
"epoch": 4.0,
"eval_loss": 2.3333284854888916,
"eval_runtime": 353.3086,
"eval_samples_per_second": 204.787,
"eval_steps_per_second": 1.602,
"step": 40700
},
{
"epoch": 4.0,
"step": 40700,
"total_flos": 4255641501696000.0,
"train_loss": 2.2957492570853644,
"train_runtime": 58446.3006,
"train_samples_per_second": 44.566,
"train_steps_per_second": 0.696
}
],
"logging_steps": 2035,
"max_steps": 40700,
"num_input_tokens_seen": 0,
"num_train_epochs": 4,
"save_steps": 4070,
"total_flos": 4255641501696000.0,
"train_batch_size": 8,
"trial_name": null,
"trial_params": null
}

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:293030b794726b6c53b1e5ab88792e0195e946a55ec4a5a551e1c4746c971cc7
size 6904

BIN
training_eval_loss.png Normal file

Binary file not shown. Size: 34 KiB

BIN
training_loss.png Normal file

Binary file not shown. Size: 32 KiB