Initialize project; model provided by the ModelHub XC community

Model: EleutherAI/polyglot-ko-12.8b
Source: Original Platform
ModelHub XC
2026-05-20 20:06:12 +08:00
commit b0fb6e33a6
38 changed files with 1001 additions and 0 deletions

49
.gitattributes vendored Normal file

@@ -0,0 +1,49 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

200
README.md Normal file

@@ -0,0 +1,200 @@
---
language:
- ko
tags:
- pytorch
- causal-lm
license: apache-2.0
---
# Polyglot-Ko-12.8B
## Model Description
Polyglot-Ko is a series of large-scale Korean autoregressive language models made by the EleutherAI polyglot team.
| Hyperparameter | Value |
|----------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| \\(n_{parameters}\\) | 12,898,631,680 |
| \\(n_{layers}\\) | 40 |
| \\(d_{model}\\) | 5120 |
| \\(d_{ff}\\) | 20,480 |
| \\(n_{heads}\\) | 40 |
| \\(d_{head}\\) | 128 |
| \\(n_{ctx}\\) | 2,048 |
| \\(n_{vocab}\\) | 30,003 / 30,080 |
| Positional Encoding | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE Dimensions | [64](https://github.com/kingoflolz/mesh-transformer-jax/blob/f2aa66e0925de6593dcbb70e72399b97b4130482/mesh_transformer/layers.py#L223) |
The model consists of 40 transformer layers with a model dimension of 5120 and a feedforward dimension of 20480. The model
dimension is split into 40 heads, each with a dimension of 128. Rotary Position Embedding (RoPE) is applied to 64
dimensions of each head. The tokenizer vocabulary has 30,003 entries; the embedding matrices have 30,080 rows (`vocab_size` in `config.json`).
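The figures above can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the Polyglot-Ko code base: the function names are illustrative, and the per-layer shapes are an assumption read off the tensor names in the safetensors index (`query_key_value`, `dense`, `dense_h_to_4h`, `dense_4h_to_h`, and two layer norms, each with a bias, plus untied `embed_in` and `embed_out` matrices).

```python
# Values copied from the hyperparameter table and config.json.
D_MODEL = 5120
N_LAYERS = 40
N_HEADS = 40
D_FF = 20480
VOCAB_ROWS = 30080  # vocab_size in config.json
ROTARY_PCT = 0.5    # rotary_pct in config.json


def head_dim(d_model, n_heads):
    """Per-head dimension; the model dimension must split evenly across heads."""
    assert d_model % n_heads == 0
    return d_model // n_heads


def rotary_dims(d_head, rotary_pct):
    """Number of dimensions per head that receive rotary embeddings."""
    return int(d_head * rotary_pct)


def count_parameters(d_model, n_layers, d_ff, vocab_rows):
    """Count trainable parameters under the assumed GPT-NeoX layer shapes."""
    layer_norm = 2 * d_model  # weight + bias
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    per_layer = 2 * layer_norm + attention + mlp
    embeddings = 2 * vocab_rows * d_model  # embed_in and embed_out are not tied
    return n_layers * per_layer + embeddings + layer_norm  # plus final layer norm


if __name__ == "__main__":
    d_head = head_dim(D_MODEL, N_HEADS)
    print("d_head:", d_head)
    print("rotary dims:", rotary_dims(d_head, ROTARY_PCT))
    print("parameters:", count_parameters(D_MODEL, N_LAYERS, D_FF, VOCAB_ROWS))
```

Under these assumptions the head dimension is 128, the rotary dimensions are 64, and the parameter count comes to 12,893,603,840, about 0.04% below the 12,898,631,680 listed in the table. The sketch does not try to explain that gap.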
## Training data
Polyglot-Ko-12.8B was trained on 863 GB of Korean language data (1.2 TB before processing), a large-scale dataset curated by [TUNiB](https://tunib.ai/). The data collection process complied with South Korean law. The dataset was collected for the purpose of training Polyglot-Ko models and will not be released publicly.
| Source                              | Size (GB) | Link                             |
|-------------------------------------|-----------|----------------------------------|
| Korean blog posts                   | 682.3     | -                                |
| Korean news dataset                 | 87.0      | -                                |
| Modu corpus                         | 26.4      | corpus.korean.go.kr              |
| Korean patent dataset               | 19.0      | -                                |
| Korean Q & A dataset                | 18.1      | -                                |
| KcBert dataset                      | 12.7      | github.com/Beomi/KcBERT          |
| Korean fiction dataset              | 6.1       | -                                |
| Korean online comments              | 4.2       | -                                |
| Korean wikipedia                    | 1.4       | ko.wikipedia.org                 |
| Clova call                          | < 1.0     | github.com/clovaai/ClovaCall     |
| Naver sentiment movie corpus        | < 1.0     | github.com/e9t/nsmc              |
| Korean hate speech dataset          | < 1.0     | -                                |
| Open subtitles                      | < 1.0     | opus.nlpl.eu/OpenSubtitles.php   |
| AIHub various tasks datasets        | < 1.0     | aihub.or.kr                      |
| Standard Korean language dictionary | < 1.0     | stdict.korean.go.kr/main/main.do |
To keep the model from memorizing and generating personally identifiable information (PII) from the training data, we masked the following sensitive information during preprocessing:
* `<|acc|>` : bank account number
* `<|rrn|>` : resident registration number
* `<|tell|>` : phone number
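The masking step can be pictured as a sequence of regular-expression substitutions. The sketch below is illustrative only: the patterns and the `mask_pii` helper are hypothetical, because this README does not publish the actual preprocessing rules. Only the three placeholder tokens come from the list above.

```python
import re

# Hypothetical patterns for illustration; the real rules are not documented here.
# Order matters: the broad account pattern runs last so it cannot swallow
# strings that already matched a more specific pattern.
PII_PATTERNS = [
    ("<|rrn|>", re.compile(r"\b\d{6}-\d{7}\b")),               # resident registration number
    ("<|tell|>", re.compile(r"\b01[016789]-\d{3,4}-\d{4}\b")),  # mobile phone number
    ("<|acc|>", re.compile(r"\b\d{3,6}-\d{2,6}-\d{3,8}\b")),    # bank account number
]


def mask_pii(text):
    """Replace each matched span with its placeholder token."""
    for token, pattern in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text


if __name__ == "__main__":
    print(mask_pii("call 010-1234-5678 or wire to 110-234-567890"))
```

A production pipeline would need far more careful patterns (for example, numbers written without hyphens or directly adjacent to Hangul), so treat this as a picture of the idea rather than a reusable filter.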
## Training procedure
Polyglot-Ko-12.8B was trained for 167 billion tokens over 301,000 steps on 256 A100 GPUs with the [GPT-NeoX framework](https://github.com/EleutherAI/gpt-neox). It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token.
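For reference, the next-token cross-entropy objective is the average negative log-probability the model assigns to each actual next token. The function below is a plain-Python sketch of that quantity, not the GPT-NeoX implementation; the name `next_token_cross_entropy` and its list-based inputs are chosen here for illustration.

```python
import math


def next_token_cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target tokens.

    logits:  one list of unnormalized scores (one per vocabulary entry) per position
    targets: index of the true next token at each position
    """
    total = 0.0
    for row, target in zip(logits, targets):
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target]
    return total / len(targets)


if __name__ == "__main__":
    # A model that is uniform over 4 tokens has a loss of log(4).
    print(next_token_cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2]))
```

Minimizing this average is equivalent to maximizing the likelihood of the training sequences.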
## How to use
This model can be easily loaded using the `AutoModelForCausalLM` class:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/polyglot-ko-12.8b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/polyglot-ko-12.8b")
```
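A slightly fuller sketch follows, covering half-precision loading and text generation. It is one possible setup, not an official recipe: `device_map="auto"` assumes the `accelerate` package is installed, the sampling settings are arbitrary, and `return_token_type_ids=False` is a precaution in case the tokenizer emits a field that `generate` does not accept. The helper `weight_memory_gib` is illustrative and only estimates memory for the weights themselves.

```python
MODEL_ID = "EleutherAI/polyglot-ko-12.8b"
N_PARAMETERS = 12_898_631_680  # from the hyperparameter table


def weight_memory_gib(n_parameters, bytes_per_parameter=2):
    """Approximate memory for the weights alone (fp16 uses 2 bytes per parameter)."""
    return n_parameters * bytes_per_parameter / 1024**3


def generate(prompt, max_new_tokens=64):
    # Imported lazily so the memory estimate above runs without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # matches torch_dtype in config.json
        device_map="auto",          # assumption: accelerate is installed
    )
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    inputs = {name: tensor.to(model.device) for name, tensor in inputs.items()}
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(f"fp16 weights: about {weight_memory_gib(N_PARAMETERS):.1f} GiB")
```

The estimate comes to roughly 24 GiB for the weights in fp16, before activations and the key-value cache, which is why loading in the default 32-bit precision on a single GPU is often impractical for a model of this size.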
## Evaluation results
We evaluate Polyglot-Ko-12.8B on the [KOBEST dataset](https://arxiv.org/abs/2204.04541), a benchmark with 5 downstream tasks, against comparable models such as skt/ko-gpt-trinity-1.2B-v0.5, kakaobrain/kogpt and facebook/xglm-7.5B, using the prompts provided in the paper.
The following tables show the results for different numbers of few-shot examples. You can reproduce these results using the [polyglot branch of lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot) and the following script. For a fair comparison, all models were run under the same conditions and with the same prompts. In the tables, `n`-shot refers to the number of few-shot examples.
On the WiC dataset, all models perform at roughly chance level.
```console
python main.py \
--model gpt2 \
--model_args pretrained='EleutherAI/polyglot-ko-12.8b' \
--tasks kobest_copa,kobest_hellaswag \
--num_fewshot $YOUR_NUM_FEWSHOT \
--batch_size $YOUR_BATCH_SIZE \
--device $YOUR_DEVICE \
--output_path /path/to/output/
```
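The tables below report F1. Assuming this is the macro-averaged F1 (an assumption; this README does not state the averaging), the sketch below shows how the score is computed. It also shows why values near 0.33 on a two-class task are consistent with a model that always predicts the same class on roughly balanced data. The function `macro_f1` is written here for illustration and is not taken from lm-evaluation-harness.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    balanced_truth = [0, 1] * 50
    always_one = [1] * 100
    # One class gets F1 = 2/3, the other gets 0, so the macro average is 1/3.
    print(round(macro_f1(balanced_truth, always_one), 4))
```

Read this way, a score such as 0.3324 suggests single-class predictions rather than a meaningful ranking between models, although the tables alone cannot confirm that.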
### COPA (F1)
| Model | params | 0-shot | 5-shot | 10-shot | 50-shot |
|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) | 1.2B | 0.6696 | 0.6477 | 0.6419 | 0.6514 |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) | 6.0B | 0.7345 | 0.7287 | 0.7277 | 0.7479 |
| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B) | 7.5B | 0.6723 | 0.6731 | 0.6769 | 0.7119 |
| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b) | 1.3B | 0.7196 | 0.7193 | 0.7204 | 0.7206 |
| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) | 3.8B | 0.7595 | 0.7608 | 0.7638 | 0.7788 |
| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) | 5.8B | 0.7745 | 0.7676 | 0.7775 | 0.7887 |
| **[EleutherAI/polyglot-ko-12.8b (this)](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)** | **12.8B** | **0.7937** | **0.8108** | **0.8037** | **0.8369** |
<img src="https://github.com/EleutherAI/polyglot/assets/19511788/d5b49364-aed5-4467-bae2-5a322c8e2ceb" width="800px">
### HellaSwag (F1)
| Model | params | 0-shot | 5-shot | 10-shot | 50-shot |
|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) | 1.2B | 0.5243 | 0.5272 | 0.5166 | 0.5352 |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) | 6.0B | 0.5590 | 0.5833 | 0.5828 | 0.5907 |
| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B) | 7.5B | 0.5665 | 0.5689 | 0.5565 | 0.5622 |
| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b) | 1.3B | 0.5247 | 0.5260 | 0.5278 | 0.5427 |
| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) | 3.8B | 0.5707 | 0.5830 | 0.5670 | 0.5787 |
| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) | 5.8B | 0.5976 | 0.5998 | 0.5979 | 0.6208 |
| **[EleutherAI/polyglot-ko-12.8b (this)](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)** | **12.8B** | **0.5954** | **0.6306** | **0.6098** | **0.6118** |
<img src="https://github.com/EleutherAI/polyglot/assets/19511788/5acb60ac-161a-4ab3-a296-db4442e08b7f" width="800px">
### BoolQ (F1)
| Model | params | 0-shot | 5-shot | 10-shot | 50-shot |
|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) | 1.2B | 0.3356 | 0.4014 | 0.3640 | 0.3560 |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) | 6.0B | 0.4514 | 0.5981 | 0.5499 | 0.5202 |
| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B) | 7.5B | 0.4464 | 0.3324 | 0.3324 | 0.3324 |
| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b) | 1.3B | 0.3552 | 0.4751 | 0.4109 | 0.4038 |
| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) | 3.8B | 0.4320 | 0.5263 | 0.4930 | 0.4038 |
| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) | 5.8B | 0.4356 | 0.5698 | 0.5187 | 0.5236 |
| **[EleutherAI/polyglot-ko-12.8b (this)](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)** | **12.8B** | **0.4818** | **0.6041** | **0.6289** | **0.6448** |
<img src="https://github.com/EleutherAI/polyglot/assets/19511788/b74c23c0-01f3-4b68-9e10-a48e9aa052ab" width="800px">
### SentiNeg (F1)
| Model | params | 0-shot | 5-shot | 10-shot | 50-shot |
|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) | 1.2B | 0.6065 | 0.6878 | 0.7280 | 0.8413 |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) | 6.0B | 0.3747 | 0.8942 | 0.9294 | 0.9698 |
| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B) | 7.5B | 0.3578 | 0.4471 | 0.3964 | 0.5271 |
| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b) | 1.3B | 0.6790 | 0.6257 | 0.5514 | 0.7851 |
| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) | 3.8B | 0.4858 | 0.7950 | 0.7320 | 0.7851 |
| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) | 5.8B | 0.3394 | 0.8841 | 0.8808 | 0.9521 |
| **[EleutherAI/polyglot-ko-12.8b (this)](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)** | **12.8B** | **0.9117** | **0.9015** | **0.9345** | **0.9723** |
<img src="https://github.com/EleutherAI/polyglot/assets/19511788/95b56b19-d349-4b70-9ff9-94a5560f89ee" width="800px">
### WiC (F1)
| Model | params | 0-shot | 5-shot | 10-shot | 50-shot |
|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) | 1.2B | 0.3290 | 0.4313 | 0.4001 | 0.3621 |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) | 6.0B | 0.3526 | 0.4775 | 0.4358 | 0.4061 |
| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B) | 7.5B | 0.3280 | 0.4903 | 0.4945 | 0.3656 |
| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b) | 1.3B | 0.3297 | 0.4850 | 0.4650 | 0.3290 |
| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) | 3.8B | 0.3390 | 0.4944 | 0.4203 | 0.3835 |
| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) | 5.8B | 0.3913 | 0.4688 | 0.4189 | 0.3910 |
| **[EleutherAI/polyglot-ko-12.8b (this)](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)** | **12.8B** | **0.3985** | **0.3683** | **0.3307** | **0.3273** |
<img src="https://github.com/EleutherAI/polyglot/assets/19511788/4de4a4c3-d7ac-4e04-8b0c-0d533fe88294" width="800px">
## Limitations and Biases
Polyglot-Ko has been trained to optimize next-token prediction. Language models such as this are often used for a wide variety of tasks, and it is important to be aware of possible unexpected outcomes. For instance, Polyglot-Ko will not always return the most factual or accurate response, but the most statistically likely one. In addition, Polyglot-Ko may produce socially unacceptable or offensive content. We recommend having a human curator or other filtering mechanism to screen out sensitive content.
## Citation and Related Information
### BibTeX entry
If you find our work useful, please consider citing:
```bibtex
@misc{ko2023technical,
title={A Technical Report for Polyglot-Ko: Open-Source Large-Scale Korean Language Models},
author={Hyunwoong Ko and Kichang Yang and Minho Ryu and Taekyoon Choi and Seungmu Yang and Jiwung Hyun and Sungho Park},
year={2023},
eprint={2306.02254},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Licensing
All our models are licensed under the terms of the Apache License 2.0.
```
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```
### Acknowledgement
This project was made possible thanks to the computing resources from [Stability.ai](https://stability.ai), and thanks to [TUNiB](https://tunib.ai) for providing a large-scale Korean dataset for this work.

27
config.json Normal file

@@ -0,0 +1,27 @@
{
"_name_or_path": "./polyglot-ko-12.8b/",
"architectures": [
"GPTNeoXForCausalLM"
],
"bos_token_id": 0,
"classifier_dropout": 0.1,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 20480,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 2048,
"model_type": "gpt_neox",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_steps": "global_step301000",
"rotary_emb_base": 10000,
"rotary_pct": 0.5,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.29.2",
"use_cache": true,
"use_parallel_residual": true,
"vocab_size": 30080
}

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 0,
"eos_token_id": 2,
"transformers_version": "4.29.2"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c1077f2f2f2d1a4adda1ba69524723a3343e1c9af0c9d839e0818487700964a3
size 945730380

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bf66e85dfaa55d13d00c37bc84acde2290a387c1345290c18dbd2d34dd711d03
size 843231522

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4725b4a4ca9469b1063fc393b0350ae368694a2c24a852a9b1ed44d98e2c72cf
size 843231290

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b46da00378d9570cd8969063f680b6db2c2a045f778776652446c75258a6261
size 1004754108

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:947d464386804caf29f0b5e05cdc583eae20d44145ecd188405e0e0925d52291
size 895670546

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:11407a250b137bba535c36858f8a449532d364649bf526e8d5bbb312a1e3dac2
size 1004754108

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c136a2a37e220233430fd98ef08f1a6b065c7de168be74a7cd33e5070276ce9
size 895670546

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f134b18eaafcd76b161903985b5c0ab7ea1e9fd00fbbb5428a4c08975a8b5920
size 1004754132

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b62e7815e79e7408bb0a675690941e528498da56f044393100d44b1d361623bf
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f151e74235be0f549323cba93a6003556b7e13d8480e27626d7db540b9db6bc6
size 1004754140

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9c3d6eec5f29e8e1939d4ab754341023d25692d816185be3fbac9b9901d97ae
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cdfd157f5e29d53c0ed3ecad7ce30721b2dafde9c2092328facc5df5eff5fde3
size 1004754140

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fd724af69f86a500164cc1780478b7104afe3910ecc8c95cd516f550397f1c55
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:11b110a8f23c20c659195f1b146c1b82e4717ea4955ea242a4318be3e7ae2025
size 1004754140

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8323b729fb2df1177e357f7ecd96b25b142f0d024e920d21cf873fd6021326c1
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:08ee1e2e803c1272e60f3eae7cccf9c68647608d52b79315ee52edf8a059b945
size 1004754140

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9cb2de4ff8946b29a9776a2e0360f81d9cf02d6efbae6db26496d62cd01d87be
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ba8337215210384ad9ac82b3efd260bc6c4c6a5fe1e51f4ba76f293e8844be1c
size 1004754140

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:027f7e1ced944fb2180a95cb42cbe5fc9ae98537c7d2500166688f15526307a7
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cac26f449546e6926433479c6acae25731adc9cb741f888050e5fc9562c28d1c
size 1004754140

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b5bf23c027ad7d34408b03592bf1a6d4719b557ca7646096fe3e0523fda00e56
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53a5e1e407bf5a1c7dcfe71856b05a9eb7146a46012254c71a41c09de03d599c
size 1004754140

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:41993efaf3f39c8f244ce8fba2b199a9f3a46b524619dfe3c940a5211f4581ea
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89b41c91e41e3c4bee9d8a1474381e3e78c388fe889cfba2640f6f5265104164
size 1004754140

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6074ebecce3abe57fa02f8f71c389945095831bc2795ce5e494f1be7a89fd35f
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec350b1755ef7ac78bc353115e2eb15cb2e358a5505ccfedd9f2277cb4776548
size 1004754140

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:91a9f7671a47acdfaa6a17f7492271a7b4339b7344aaed47429ade0bdf7f6e15
size 895670570

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c7ea0188f126c4cd893bfdf40fbb5fb222da030f321ce595dfcf855887d72ead
size 517765688

@@ -0,0 +1,611 @@
{
"metadata": {
"total_size": 25808184400.0
},
"weight_map": {
"embed_out.weight": "model-00028-of-00028.safetensors",
"gpt_neox.embed_in.weight": "model-00001-of-00028.safetensors",
"gpt_neox.final_layer_norm.bias": "model-00028-of-00028.safetensors",
"gpt_neox.final_layer_norm.weight": "model-00028-of-00028.safetensors",
"gpt_neox.layers.0.attention.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.attention.dense.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.attention.dense.weight": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.attention.masked_bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.attention.query_key_value.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.attention.query_key_value.weight": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.attention.rotary_emb.inv_freq": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.input_layernorm.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.input_layernorm.weight": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.mlp.dense_4h_to_h.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.mlp.dense_4h_to_h.weight": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.mlp.dense_h_to_4h.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.mlp.dense_h_to_4h.weight": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.post_attention_layernorm.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.0.post_attention_layernorm.weight": "model-00001-of-00028.safetensors",
"gpt_neox.layers.1.attention.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.1.attention.dense.bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.1.attention.dense.weight": "model-00002-of-00028.safetensors",
"gpt_neox.layers.1.attention.masked_bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.1.attention.query_key_value.bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.1.attention.query_key_value.weight": "model-00002-of-00028.safetensors",
"gpt_neox.layers.1.attention.rotary_emb.inv_freq": "model-00001-of-00028.safetensors",
"gpt_neox.layers.1.input_layernorm.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.1.input_layernorm.weight": "model-00001-of-00028.safetensors",
"gpt_neox.layers.1.mlp.dense_4h_to_h.bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.1.mlp.dense_4h_to_h.weight": "model-00002-of-00028.safetensors",
"gpt_neox.layers.1.mlp.dense_h_to_4h.bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.1.mlp.dense_h_to_4h.weight": "model-00002-of-00028.safetensors",
"gpt_neox.layers.1.post_attention_layernorm.bias": "model-00001-of-00028.safetensors",
"gpt_neox.layers.1.post_attention_layernorm.weight": "model-00001-of-00028.safetensors",
"gpt_neox.layers.10.attention.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.attention.dense.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.attention.dense.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.attention.masked_bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.attention.query_key_value.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.attention.query_key_value.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.attention.rotary_emb.inv_freq": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.input_layernorm.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.input_layernorm.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.mlp.dense_4h_to_h.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.mlp.dense_4h_to_h.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.mlp.dense_h_to_4h.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.mlp.dense_h_to_4h.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.post_attention_layernorm.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.10.post_attention_layernorm.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.11.attention.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.11.attention.dense.bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.11.attention.dense.weight": "model-00009-of-00028.safetensors",
"gpt_neox.layers.11.attention.masked_bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.11.attention.query_key_value.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.11.attention.query_key_value.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.11.attention.rotary_emb.inv_freq": "model-00008-of-00028.safetensors",
"gpt_neox.layers.11.input_layernorm.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.11.input_layernorm.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.11.mlp.dense_4h_to_h.bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.11.mlp.dense_4h_to_h.weight": "model-00009-of-00028.safetensors",
"gpt_neox.layers.11.mlp.dense_h_to_4h.bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.11.mlp.dense_h_to_4h.weight": "model-00009-of-00028.safetensors",
"gpt_neox.layers.11.post_attention_layernorm.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.11.post_attention_layernorm.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.12.attention.bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.attention.dense.bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.attention.dense.weight": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.attention.masked_bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.attention.query_key_value.bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.attention.query_key_value.weight": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.attention.rotary_emb.inv_freq": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.input_layernorm.bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.input_layernorm.weight": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.mlp.dense_4h_to_h.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.12.mlp.dense_4h_to_h.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.12.mlp.dense_h_to_4h.bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.mlp.dense_h_to_4h.weight": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.post_attention_layernorm.bias": "model-00009-of-00028.safetensors",
"gpt_neox.layers.12.post_attention_layernorm.weight": "model-00009-of-00028.safetensors",
"gpt_neox.layers.13.attention.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.attention.dense.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.attention.dense.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.attention.masked_bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.attention.query_key_value.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.attention.query_key_value.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.attention.rotary_emb.inv_freq": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.input_layernorm.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.input_layernorm.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.mlp.dense_4h_to_h.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.mlp.dense_4h_to_h.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.mlp.dense_h_to_4h.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.mlp.dense_h_to_4h.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.post_attention_layernorm.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.13.post_attention_layernorm.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.14.attention.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.14.attention.dense.bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.14.attention.dense.weight": "model-00011-of-00028.safetensors",
"gpt_neox.layers.14.attention.masked_bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.14.attention.query_key_value.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.14.attention.query_key_value.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.14.attention.rotary_emb.inv_freq": "model-00010-of-00028.safetensors",
"gpt_neox.layers.14.input_layernorm.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.14.input_layernorm.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.14.mlp.dense_4h_to_h.bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.14.mlp.dense_4h_to_h.weight": "model-00011-of-00028.safetensors",
"gpt_neox.layers.14.mlp.dense_h_to_4h.bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.14.mlp.dense_h_to_4h.weight": "model-00011-of-00028.safetensors",
"gpt_neox.layers.14.post_attention_layernorm.bias": "model-00010-of-00028.safetensors",
"gpt_neox.layers.14.post_attention_layernorm.weight": "model-00010-of-00028.safetensors",
"gpt_neox.layers.15.attention.bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.attention.dense.bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.attention.dense.weight": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.attention.masked_bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.attention.query_key_value.bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.attention.query_key_value.weight": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.attention.rotary_emb.inv_freq": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.input_layernorm.bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.input_layernorm.weight": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.mlp.dense_4h_to_h.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.15.mlp.dense_4h_to_h.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.15.mlp.dense_h_to_4h.bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.mlp.dense_h_to_4h.weight": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.post_attention_layernorm.bias": "model-00011-of-00028.safetensors",
"gpt_neox.layers.15.post_attention_layernorm.weight": "model-00011-of-00028.safetensors",
"gpt_neox.layers.16.attention.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.attention.dense.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.attention.dense.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.attention.masked_bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.attention.query_key_value.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.attention.query_key_value.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.attention.rotary_emb.inv_freq": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.input_layernorm.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.input_layernorm.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.mlp.dense_4h_to_h.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.mlp.dense_4h_to_h.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.mlp.dense_h_to_4h.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.mlp.dense_h_to_4h.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.post_attention_layernorm.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.16.post_attention_layernorm.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.17.attention.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.17.attention.dense.bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.17.attention.dense.weight": "model-00013-of-00028.safetensors",
"gpt_neox.layers.17.attention.masked_bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.17.attention.query_key_value.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.17.attention.query_key_value.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.17.attention.rotary_emb.inv_freq": "model-00012-of-00028.safetensors",
"gpt_neox.layers.17.input_layernorm.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.17.input_layernorm.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.17.mlp.dense_4h_to_h.bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.17.mlp.dense_4h_to_h.weight": "model-00013-of-00028.safetensors",
"gpt_neox.layers.17.mlp.dense_h_to_4h.bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.17.mlp.dense_h_to_4h.weight": "model-00013-of-00028.safetensors",
"gpt_neox.layers.17.post_attention_layernorm.bias": "model-00012-of-00028.safetensors",
"gpt_neox.layers.17.post_attention_layernorm.weight": "model-00012-of-00028.safetensors",
"gpt_neox.layers.18.attention.bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.attention.dense.bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.attention.dense.weight": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.attention.masked_bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.attention.query_key_value.bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.attention.query_key_value.weight": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.attention.rotary_emb.inv_freq": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.input_layernorm.bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.input_layernorm.weight": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.mlp.dense_4h_to_h.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.18.mlp.dense_4h_to_h.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.18.mlp.dense_h_to_4h.bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.mlp.dense_h_to_4h.weight": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.post_attention_layernorm.bias": "model-00013-of-00028.safetensors",
"gpt_neox.layers.18.post_attention_layernorm.weight": "model-00013-of-00028.safetensors",
"gpt_neox.layers.19.attention.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.attention.dense.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.attention.dense.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.attention.masked_bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.attention.query_key_value.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.attention.query_key_value.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.attention.rotary_emb.inv_freq": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.input_layernorm.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.input_layernorm.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.mlp.dense_4h_to_h.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.mlp.dense_4h_to_h.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.mlp.dense_h_to_4h.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.mlp.dense_h_to_4h.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.post_attention_layernorm.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.19.post_attention_layernorm.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.2.attention.bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.attention.dense.bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.attention.dense.weight": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.attention.masked_bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.attention.query_key_value.bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.attention.query_key_value.weight": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.attention.rotary_emb.inv_freq": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.input_layernorm.bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.input_layernorm.weight": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.mlp.dense_4h_to_h.bias": "model-00003-of-00028.safetensors",
"gpt_neox.layers.2.mlp.dense_4h_to_h.weight": "model-00003-of-00028.safetensors",
"gpt_neox.layers.2.mlp.dense_h_to_4h.bias": "model-00003-of-00028.safetensors",
"gpt_neox.layers.2.mlp.dense_h_to_4h.weight": "model-00003-of-00028.safetensors",
"gpt_neox.layers.2.post_attention_layernorm.bias": "model-00002-of-00028.safetensors",
"gpt_neox.layers.2.post_attention_layernorm.weight": "model-00002-of-00028.safetensors",
"gpt_neox.layers.20.attention.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.20.attention.dense.bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.20.attention.dense.weight": "model-00015-of-00028.safetensors",
"gpt_neox.layers.20.attention.masked_bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.20.attention.query_key_value.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.20.attention.query_key_value.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.20.attention.rotary_emb.inv_freq": "model-00014-of-00028.safetensors",
"gpt_neox.layers.20.input_layernorm.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.20.input_layernorm.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.20.mlp.dense_4h_to_h.bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.20.mlp.dense_4h_to_h.weight": "model-00015-of-00028.safetensors",
"gpt_neox.layers.20.mlp.dense_h_to_4h.bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.20.mlp.dense_h_to_4h.weight": "model-00015-of-00028.safetensors",
"gpt_neox.layers.20.post_attention_layernorm.bias": "model-00014-of-00028.safetensors",
"gpt_neox.layers.20.post_attention_layernorm.weight": "model-00014-of-00028.safetensors",
"gpt_neox.layers.21.attention.bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.attention.dense.bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.attention.dense.weight": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.attention.masked_bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.attention.query_key_value.bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.attention.query_key_value.weight": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.attention.rotary_emb.inv_freq": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.input_layernorm.bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.input_layernorm.weight": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.mlp.dense_4h_to_h.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.21.mlp.dense_4h_to_h.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.21.mlp.dense_h_to_4h.bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.mlp.dense_h_to_4h.weight": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.post_attention_layernorm.bias": "model-00015-of-00028.safetensors",
"gpt_neox.layers.21.post_attention_layernorm.weight": "model-00015-of-00028.safetensors",
"gpt_neox.layers.22.attention.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.attention.dense.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.attention.dense.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.attention.masked_bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.attention.query_key_value.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.attention.query_key_value.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.attention.rotary_emb.inv_freq": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.input_layernorm.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.input_layernorm.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.mlp.dense_4h_to_h.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.mlp.dense_4h_to_h.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.mlp.dense_h_to_4h.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.mlp.dense_h_to_4h.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.post_attention_layernorm.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.22.post_attention_layernorm.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.23.attention.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.23.attention.dense.bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.23.attention.dense.weight": "model-00017-of-00028.safetensors",
"gpt_neox.layers.23.attention.masked_bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.23.attention.query_key_value.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.23.attention.query_key_value.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.23.attention.rotary_emb.inv_freq": "model-00016-of-00028.safetensors",
"gpt_neox.layers.23.input_layernorm.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.23.input_layernorm.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.23.mlp.dense_4h_to_h.bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.23.mlp.dense_4h_to_h.weight": "model-00017-of-00028.safetensors",
"gpt_neox.layers.23.mlp.dense_h_to_4h.bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.23.mlp.dense_h_to_4h.weight": "model-00017-of-00028.safetensors",
"gpt_neox.layers.23.post_attention_layernorm.bias": "model-00016-of-00028.safetensors",
"gpt_neox.layers.23.post_attention_layernorm.weight": "model-00016-of-00028.safetensors",
"gpt_neox.layers.24.attention.bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.attention.dense.bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.attention.dense.weight": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.attention.masked_bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.attention.query_key_value.bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.attention.query_key_value.weight": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.attention.rotary_emb.inv_freq": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.input_layernorm.bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.input_layernorm.weight": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.mlp.dense_4h_to_h.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.24.mlp.dense_4h_to_h.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.24.mlp.dense_h_to_4h.bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.mlp.dense_h_to_4h.weight": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.post_attention_layernorm.bias": "model-00017-of-00028.safetensors",
"gpt_neox.layers.24.post_attention_layernorm.weight": "model-00017-of-00028.safetensors",
"gpt_neox.layers.25.attention.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.attention.dense.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.attention.dense.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.attention.masked_bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.attention.query_key_value.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.attention.query_key_value.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.attention.rotary_emb.inv_freq": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.input_layernorm.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.input_layernorm.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.mlp.dense_4h_to_h.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.mlp.dense_4h_to_h.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.mlp.dense_h_to_4h.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.mlp.dense_h_to_4h.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.post_attention_layernorm.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.25.post_attention_layernorm.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.26.attention.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.26.attention.dense.bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.26.attention.dense.weight": "model-00019-of-00028.safetensors",
"gpt_neox.layers.26.attention.masked_bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.26.attention.query_key_value.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.26.attention.query_key_value.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.26.attention.rotary_emb.inv_freq": "model-00018-of-00028.safetensors",
"gpt_neox.layers.26.input_layernorm.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.26.input_layernorm.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.26.mlp.dense_4h_to_h.bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.26.mlp.dense_4h_to_h.weight": "model-00019-of-00028.safetensors",
"gpt_neox.layers.26.mlp.dense_h_to_4h.bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.26.mlp.dense_h_to_4h.weight": "model-00019-of-00028.safetensors",
"gpt_neox.layers.26.post_attention_layernorm.bias": "model-00018-of-00028.safetensors",
"gpt_neox.layers.26.post_attention_layernorm.weight": "model-00018-of-00028.safetensors",
"gpt_neox.layers.27.attention.bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.attention.dense.bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.attention.dense.weight": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.attention.masked_bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.attention.query_key_value.bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.attention.query_key_value.weight": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.attention.rotary_emb.inv_freq": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.input_layernorm.bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.input_layernorm.weight": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.mlp.dense_4h_to_h.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.27.mlp.dense_4h_to_h.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.27.mlp.dense_h_to_4h.bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.mlp.dense_h_to_4h.weight": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.post_attention_layernorm.bias": "model-00019-of-00028.safetensors",
"gpt_neox.layers.27.post_attention_layernorm.weight": "model-00019-of-00028.safetensors",
"gpt_neox.layers.28.attention.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.attention.dense.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.attention.dense.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.attention.masked_bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.attention.query_key_value.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.attention.query_key_value.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.attention.rotary_emb.inv_freq": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.input_layernorm.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.input_layernorm.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.mlp.dense_4h_to_h.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.mlp.dense_4h_to_h.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.mlp.dense_h_to_4h.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.mlp.dense_h_to_4h.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.post_attention_layernorm.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.28.post_attention_layernorm.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.29.attention.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.29.attention.dense.bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.29.attention.dense.weight": "model-00021-of-00028.safetensors",
"gpt_neox.layers.29.attention.masked_bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.29.attention.query_key_value.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.29.attention.query_key_value.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.29.attention.rotary_emb.inv_freq": "model-00020-of-00028.safetensors",
"gpt_neox.layers.29.input_layernorm.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.29.input_layernorm.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.29.mlp.dense_4h_to_h.bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.29.mlp.dense_4h_to_h.weight": "model-00021-of-00028.safetensors",
"gpt_neox.layers.29.mlp.dense_h_to_4h.bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.29.mlp.dense_h_to_4h.weight": "model-00021-of-00028.safetensors",
"gpt_neox.layers.29.post_attention_layernorm.bias": "model-00020-of-00028.safetensors",
"gpt_neox.layers.29.post_attention_layernorm.weight": "model-00020-of-00028.safetensors",
"gpt_neox.layers.3.attention.bias": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.attention.dense.bias": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.attention.dense.weight": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.attention.masked_bias": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.attention.query_key_value.bias": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.attention.query_key_value.weight": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.attention.rotary_emb.inv_freq": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.input_layernorm.bias": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.input_layernorm.weight": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.mlp.dense_4h_to_h.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.3.mlp.dense_4h_to_h.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.3.mlp.dense_h_to_4h.bias": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.mlp.dense_h_to_4h.weight": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.post_attention_layernorm.bias": "model-00003-of-00028.safetensors",
"gpt_neox.layers.3.post_attention_layernorm.weight": "model-00003-of-00028.safetensors",
"gpt_neox.layers.30.attention.bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.attention.dense.bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.attention.dense.weight": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.attention.masked_bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.attention.query_key_value.bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.attention.query_key_value.weight": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.attention.rotary_emb.inv_freq": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.input_layernorm.bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.input_layernorm.weight": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.mlp.dense_4h_to_h.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.30.mlp.dense_4h_to_h.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.30.mlp.dense_h_to_4h.bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.mlp.dense_h_to_4h.weight": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.post_attention_layernorm.bias": "model-00021-of-00028.safetensors",
"gpt_neox.layers.30.post_attention_layernorm.weight": "model-00021-of-00028.safetensors",
"gpt_neox.layers.31.attention.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.attention.dense.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.attention.dense.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.attention.masked_bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.attention.query_key_value.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.attention.query_key_value.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.attention.rotary_emb.inv_freq": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.input_layernorm.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.input_layernorm.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.mlp.dense_4h_to_h.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.mlp.dense_4h_to_h.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.mlp.dense_h_to_4h.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.mlp.dense_h_to_4h.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.post_attention_layernorm.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.31.post_attention_layernorm.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.32.attention.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.32.attention.dense.bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.32.attention.dense.weight": "model-00023-of-00028.safetensors",
"gpt_neox.layers.32.attention.masked_bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.32.attention.query_key_value.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.32.attention.query_key_value.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.32.attention.rotary_emb.inv_freq": "model-00022-of-00028.safetensors",
"gpt_neox.layers.32.input_layernorm.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.32.input_layernorm.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.32.mlp.dense_4h_to_h.bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.32.mlp.dense_4h_to_h.weight": "model-00023-of-00028.safetensors",
"gpt_neox.layers.32.mlp.dense_h_to_4h.bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.32.mlp.dense_h_to_4h.weight": "model-00023-of-00028.safetensors",
"gpt_neox.layers.32.post_attention_layernorm.bias": "model-00022-of-00028.safetensors",
"gpt_neox.layers.32.post_attention_layernorm.weight": "model-00022-of-00028.safetensors",
"gpt_neox.layers.33.attention.bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.attention.dense.bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.attention.dense.weight": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.attention.masked_bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.attention.query_key_value.bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.attention.query_key_value.weight": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.attention.rotary_emb.inv_freq": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.input_layernorm.bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.input_layernorm.weight": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.mlp.dense_4h_to_h.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.33.mlp.dense_4h_to_h.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.33.mlp.dense_h_to_4h.bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.mlp.dense_h_to_4h.weight": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.post_attention_layernorm.bias": "model-00023-of-00028.safetensors",
"gpt_neox.layers.33.post_attention_layernorm.weight": "model-00023-of-00028.safetensors",
"gpt_neox.layers.34.attention.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.attention.dense.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.attention.dense.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.attention.masked_bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.attention.query_key_value.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.attention.query_key_value.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.attention.rotary_emb.inv_freq": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.input_layernorm.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.input_layernorm.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.mlp.dense_4h_to_h.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.mlp.dense_4h_to_h.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.mlp.dense_h_to_4h.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.mlp.dense_h_to_4h.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.post_attention_layernorm.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.34.post_attention_layernorm.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.35.attention.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.35.attention.dense.bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.35.attention.dense.weight": "model-00025-of-00028.safetensors",
"gpt_neox.layers.35.attention.masked_bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.35.attention.query_key_value.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.35.attention.query_key_value.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.35.attention.rotary_emb.inv_freq": "model-00024-of-00028.safetensors",
"gpt_neox.layers.35.input_layernorm.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.35.input_layernorm.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.35.mlp.dense_4h_to_h.bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.35.mlp.dense_4h_to_h.weight": "model-00025-of-00028.safetensors",
"gpt_neox.layers.35.mlp.dense_h_to_4h.bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.35.mlp.dense_h_to_4h.weight": "model-00025-of-00028.safetensors",
"gpt_neox.layers.35.post_attention_layernorm.bias": "model-00024-of-00028.safetensors",
"gpt_neox.layers.35.post_attention_layernorm.weight": "model-00024-of-00028.safetensors",
"gpt_neox.layers.36.attention.bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.attention.dense.bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.attention.dense.weight": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.attention.masked_bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.attention.query_key_value.bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.attention.query_key_value.weight": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.attention.rotary_emb.inv_freq": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.input_layernorm.bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.input_layernorm.weight": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.mlp.dense_4h_to_h.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.36.mlp.dense_4h_to_h.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.36.mlp.dense_h_to_4h.bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.mlp.dense_h_to_4h.weight": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.post_attention_layernorm.bias": "model-00025-of-00028.safetensors",
"gpt_neox.layers.36.post_attention_layernorm.weight": "model-00025-of-00028.safetensors",
"gpt_neox.layers.37.attention.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.attention.dense.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.attention.dense.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.attention.masked_bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.attention.query_key_value.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.attention.query_key_value.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.attention.rotary_emb.inv_freq": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.input_layernorm.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.input_layernorm.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.mlp.dense_4h_to_h.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.mlp.dense_4h_to_h.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.mlp.dense_h_to_4h.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.mlp.dense_h_to_4h.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.post_attention_layernorm.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.37.post_attention_layernorm.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.38.attention.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.38.attention.dense.bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.38.attention.dense.weight": "model-00027-of-00028.safetensors",
"gpt_neox.layers.38.attention.masked_bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.38.attention.query_key_value.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.38.attention.query_key_value.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.38.attention.rotary_emb.inv_freq": "model-00026-of-00028.safetensors",
"gpt_neox.layers.38.input_layernorm.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.38.input_layernorm.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.38.mlp.dense_4h_to_h.bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.38.mlp.dense_4h_to_h.weight": "model-00027-of-00028.safetensors",
"gpt_neox.layers.38.mlp.dense_h_to_4h.bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.38.mlp.dense_h_to_4h.weight": "model-00027-of-00028.safetensors",
"gpt_neox.layers.38.post_attention_layernorm.bias": "model-00026-of-00028.safetensors",
"gpt_neox.layers.38.post_attention_layernorm.weight": "model-00026-of-00028.safetensors",
"gpt_neox.layers.39.attention.bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.attention.dense.bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.attention.dense.weight": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.attention.masked_bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.attention.query_key_value.bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.attention.query_key_value.weight": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.attention.rotary_emb.inv_freq": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.input_layernorm.bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.input_layernorm.weight": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.mlp.dense_4h_to_h.bias": "model-00028-of-00028.safetensors",
"gpt_neox.layers.39.mlp.dense_4h_to_h.weight": "model-00028-of-00028.safetensors",
"gpt_neox.layers.39.mlp.dense_h_to_4h.bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.mlp.dense_h_to_4h.weight": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.post_attention_layernorm.bias": "model-00027-of-00028.safetensors",
"gpt_neox.layers.39.post_attention_layernorm.weight": "model-00027-of-00028.safetensors",
"gpt_neox.layers.4.attention.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.attention.dense.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.attention.dense.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.attention.masked_bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.attention.query_key_value.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.attention.query_key_value.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.attention.rotary_emb.inv_freq": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.input_layernorm.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.input_layernorm.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.mlp.dense_4h_to_h.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.mlp.dense_4h_to_h.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.mlp.dense_h_to_4h.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.mlp.dense_h_to_4h.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.post_attention_layernorm.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.4.post_attention_layernorm.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.5.attention.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.5.attention.dense.bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.5.attention.dense.weight": "model-00005-of-00028.safetensors",
"gpt_neox.layers.5.attention.masked_bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.5.attention.query_key_value.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.5.attention.query_key_value.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.5.attention.rotary_emb.inv_freq": "model-00004-of-00028.safetensors",
"gpt_neox.layers.5.input_layernorm.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.5.input_layernorm.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.5.mlp.dense_4h_to_h.bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.5.mlp.dense_4h_to_h.weight": "model-00005-of-00028.safetensors",
"gpt_neox.layers.5.mlp.dense_h_to_4h.bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.5.mlp.dense_h_to_4h.weight": "model-00005-of-00028.safetensors",
"gpt_neox.layers.5.post_attention_layernorm.bias": "model-00004-of-00028.safetensors",
"gpt_neox.layers.5.post_attention_layernorm.weight": "model-00004-of-00028.safetensors",
"gpt_neox.layers.6.attention.bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.attention.dense.bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.attention.dense.weight": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.attention.masked_bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.attention.query_key_value.bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.attention.query_key_value.weight": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.attention.rotary_emb.inv_freq": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.input_layernorm.bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.input_layernorm.weight": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.mlp.dense_4h_to_h.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.6.mlp.dense_4h_to_h.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.6.mlp.dense_h_to_4h.bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.mlp.dense_h_to_4h.weight": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.post_attention_layernorm.bias": "model-00005-of-00028.safetensors",
"gpt_neox.layers.6.post_attention_layernorm.weight": "model-00005-of-00028.safetensors",
"gpt_neox.layers.7.attention.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.attention.dense.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.attention.dense.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.attention.masked_bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.attention.query_key_value.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.attention.query_key_value.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.attention.rotary_emb.inv_freq": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.input_layernorm.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.input_layernorm.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.mlp.dense_4h_to_h.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.mlp.dense_4h_to_h.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.mlp.dense_h_to_4h.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.mlp.dense_h_to_4h.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.post_attention_layernorm.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.7.post_attention_layernorm.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.8.attention.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.8.attention.dense.bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.8.attention.dense.weight": "model-00007-of-00028.safetensors",
"gpt_neox.layers.8.attention.masked_bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.8.attention.query_key_value.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.8.attention.query_key_value.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.8.attention.rotary_emb.inv_freq": "model-00006-of-00028.safetensors",
"gpt_neox.layers.8.input_layernorm.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.8.input_layernorm.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.8.mlp.dense_4h_to_h.bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.8.mlp.dense_4h_to_h.weight": "model-00007-of-00028.safetensors",
"gpt_neox.layers.8.mlp.dense_h_to_4h.bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.8.mlp.dense_h_to_4h.weight": "model-00007-of-00028.safetensors",
"gpt_neox.layers.8.post_attention_layernorm.bias": "model-00006-of-00028.safetensors",
"gpt_neox.layers.8.post_attention_layernorm.weight": "model-00006-of-00028.safetensors",
"gpt_neox.layers.9.attention.bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.attention.dense.bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.attention.dense.weight": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.attention.masked_bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.attention.query_key_value.bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.attention.query_key_value.weight": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.attention.rotary_emb.inv_freq": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.input_layernorm.bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.input_layernorm.weight": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.mlp.dense_4h_to_h.bias": "model-00008-of-00028.safetensors",
"gpt_neox.layers.9.mlp.dense_4h_to_h.weight": "model-00008-of-00028.safetensors",
"gpt_neox.layers.9.mlp.dense_h_to_4h.bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.mlp.dense_h_to_4h.weight": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.post_attention_layernorm.bias": "model-00007-of-00028.safetensors",
"gpt_neox.layers.9.post_attention_layernorm.weight": "model-00007-of-00028.safetensors"
}
}
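
The `weight_map` object that closes above assigns every tensor name to the shard file that stores it. As a minimal sketch, assuming the index has already been parsed into a Python dict, the entries can be grouped by shard to see which tensors live in each file. The helper name `tensors_by_shard` and the three-entry sample are illustrative and not part of the repository.

```python
import json
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file listed in index["weight_map"]."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    # Sort names so the result is stable regardless of input order.
    return {shard: sorted(names) for shard, names in shards.items()}


# Three entries copied from the map above; a real index would be read
# with json.load() from the index file on disk.
sample_index = json.loads("""
{
  "weight_map": {
    "gpt_neox.layers.39.mlp.dense_4h_to_h.weight": "model-00028-of-00028.safetensors",
    "gpt_neox.layers.39.mlp.dense_h_to_4h.weight": "model-00027-of-00028.safetensors",
    "gpt_neox.layers.39.input_layernorm.weight": "model-00027-of-00028.safetensors"
  }
}
""")

grouped = tensors_by_shard(sample_index)
for shard, names in sorted(grouped.items()):
    print(shard, len(names))
```

A grouping like this makes it easy to spot layers whose tensors are split across two shards, as layer 39 is here.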

3
pytorch_model.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4c73c3524c705f333b73f47f4bd60eb3cde2998860d03d67c90e262ec1bb9f4c
size 25955180945
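
The three lines above are a Git LFS pointer, not the weights themselves: each line is a key, a space, and a value. A minimal sketch of reading such a pointer in Python follows, assuming exactly the three keys shown here; the function name `parse_lfs_pointer` is hypothetical and this is not the Git LFS client's own parser.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:4c73c3524c705f333b73f47f4bd60eb3cde2998860d03d67c90e262ec1bb9f4c
size 25955180945
"""

info = parse_lfs_pointer(pointer_text)
size_gib = info["size"] / 2**30  # a little over 24 GiB for this file
print(info["hash_algo"], f"{size_gib:.2f} GiB")
```

The `size` field gives the byte count of the real file, so the download size can be estimated from the pointer alone.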

11
special_tokens_map.json Normal file

@@ -0,0 +1,11 @@
{
"additional_special_tokens": [
"<|endoftext|>",
"<|sep|>",
"<|acc|>",
"<|tel|>",
"<|rrn|>"
],
"eos_token": "<|endoftext|>",
"pad_token": "<|endoftext|>"
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:349a7d3c457c9a656b655b77f2ba56c0a95ec1ddc1d7cbb978bbf063a443bad8
size 1652157

6
tokenizer_config.json Normal file

@@ -0,0 +1,6 @@
{
"name_or_path": "EleutherAI/polyglot-ko-12.8b",
"eos_token": "<|endoftext|>",
"pad_token": "<|endoftext|>",
"tokenizer_class": "PreTrainedTokenizerFast"
}