Initialize project; model provided by the ModelHub XC community

Model: Writer/InstructPalmyra-20b
Source: Original Platform
ModelHub XC
2026-06-08 12:02:26 +08:00
commit e386e5f805
17 changed files with 151171 additions and 0 deletions

34
.gitattributes vendored Normal file

@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
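To see which files in this commit the rules above route through Git LFS, the patterns can be approximated with Python's `fnmatch`. This is only a rough sketch: `fnmatch` does not implement full gitattributes matching (the `saved_model/**/*` rule is left out for that reason), the pattern list is a subset of the file above, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Subset of the patterns listed in .gitattributes above.
LFS_PATTERNS = [
    "*.bin", "*.safetensors", "*.pt", "*.pth",
    "*.h5", "*.onnx", "*.tar.*", "*tfevents*",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: match the file's base name against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


for candidate in ["pytorch_model-00001-of-00005.bin", "config.json", "merges.txt"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under these assumptions the weight shard is reported as tracked, while the small JSON and text files are stored as regular Git blobs.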

134
README.md Normal file

@@ -0,0 +1,134 @@
---
license: apache-2.0
language:
- en
tags:
- InstructGPT
- hf
- palmyra
datasets:
- Writer/palmyra-data-index
---
# InstructPalmyra-20b
- **Developed by:** [https://writer.com/](https://writer.com/);
- **Model type:** Causal decoder-only;
- **Language(s) (NLP):** English;
- **License:** Apache 2.0;
- **Finetuned from model:** [Palmyra-20B](https://huggingface.co/Writer/palmyra-large).
<style>
img {
display: inline;
}
</style>
## Model Description
InstructPalmyra-20b is a 20-billion-parameter instruction-following language model. It is fine-tuned from [Palmyra-20b](https://huggingface.co/Writer/palmyra-large) and is intended for tasks that require understanding and carrying out natural-language instructions.
The model was trained on approximately 70,000 instruction-response records generated by the Writer Linguist team, who have experience in language modeling and fine-tuning techniques.
InstructPalmyra-20b is designed to process complex instructions and generate accurate, contextually appropriate responses, which makes it suitable for applications such as virtual assistants, customer support, and content generation. Its instruction tuning is also intended to help it adapt to varying conditions and contexts.
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Writer/InstructPalmyra-20b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

instruction = "Describe a futuristic device that revolutionizes space travel."
# Optional extra context for the instruction; leave empty to use the no-input prompt.
input_text = ""

PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
}

# Pick the template that matches whether extra context was provided.
text = (
    PROMPT_DICT["prompt_no_input"].format(instruction=instruction)
    if not input_text
    else PROMPT_DICT["prompt_input"].format(instruction=instruction, input=input_text)
)

# Place the tokenized prompt on the same device as the model.
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **model_inputs,
    max_length=256,
)

# Decode the full sequence, then keep only the text after the response marker.
output_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
clean_output = output_text.split("### Response:")[1].strip()
print(clean_output)
```
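The prompt selection and response extraction in the snippet above can be factored into two small helpers that run without loading the model. This is a sketch: the names `build_prompt` and `extract_response` are hypothetical, and the templates are copied verbatim from `PROMPT_DICT` above.

```python
from typing import Optional

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Use the with-input template only when extra context is provided."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


def extract_response(generated_text: str) -> str:
    """Return the text after the first response marker.

    Falls back to the whole text if the marker is missing, instead of
    raising IndexError as a bare split(...)[1] would.
    """
    _, found, tail = generated_text.partition(RESPONSE_MARKER)
    return tail.strip() if found else generated_text.strip()


prompt = build_prompt("Summarize the text.", "Palmyra is an ancient city.")
print(prompt)
print(extract_response(prompt + " A short summary."))
```

Keeping these steps separate from generation makes it possible to unit-test the template handling on a machine without a GPU.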
It can also be used with text-generation-inference:
```sh
model=Writer/InstructPalmyra-20b
volume=$PWD/data
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference --model-id $model
```
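Once the container above is running, it can be queried over HTTP. The sketch below assumes the server is reachable at `http://localhost:8080` (from the `-p 8080:80` mapping) and uses text-generation-inference's `/generate` route, which takes a JSON body with `inputs` and `parameters`. The helper names and the `max_new_tokens` value are illustrative assumptions.

```python
import json
import urllib.request

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_request(instruction: str, max_new_tokens: int = 128) -> dict:
    """Build the JSON body for the /generate route."""
    return {
        "inputs": PROMPT_NO_INPUT.format(instruction=instruction),
        "parameters": {"max_new_tokens": max_new_tokens},
    }


def generate(instruction: str, base_url: str = "http://localhost:8080") -> str:
    """POST the prompt to a running server and return the generated text."""
    body = json.dumps(build_request(instruction)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]


# Example (requires the running container):
# print(generate("Describe a futuristic device that revolutionizes space travel."))
```

The server receives raw text, so the instruction template has to be applied on the client side, as `build_request` does here.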
### Limitations and Biases
InstructPalmyra's core functionality is to take a string of text and predict the next token. While language models are widely used for other tasks, there are many unknowns in this work. When prompting InstructPalmyra, keep in mind that the next statistically likely token is not always the token that produces the most "accurate" text. Never rely on InstructPalmyra to produce factually correct results.
InstructPalmyra was trained on Writer's custom data. As with all language models, it is difficult to predict how InstructPalmyra will respond to specific prompts, and offensive content may appear unexpectedly. We recommend that the outputs be curated or filtered by humans before they are released, both to censor undesirable content and to improve the quality of the results.
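One way to support the human review recommended above is to flag generations automatically so that reviewers see the riskiest ones first. The sketch below is an illustration under stated assumptions, not part of the model's tooling: the `BLOCKED_TERMS` list and the `needs_review` and `split_for_review` helpers are hypothetical placeholders, and a keyword match is no substitute for human judgment.

```python
import re
from typing import Iterable, List, Tuple

# Placeholder terms; a real deployment would maintain its own list.
BLOCKED_TERMS = ["password", "social security number"]


def needs_review(text: str, blocked_terms: Iterable[str] = BLOCKED_TERMS) -> bool:
    """Flag text that contains any blocked term as a whole word or phrase."""
    lowered = text.lower()
    return any(
        re.search(rf"\b{re.escape(term)}\b", lowered) for term in blocked_terms
    )


def split_for_review(outputs: List[str]) -> Tuple[List[str], List[str]]:
    """Separate generations into (flagged, passed) lists."""
    flagged = [text for text in outputs if needs_review(text)]
    passed = [text for text in outputs if not needs_review(text)]
    return flagged, passed


flagged, passed = split_for_review(["Your password is 1234", "The sky is blue"])
print("flagged:", flagged)
print("passed:", passed)
```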
## Uses
### Out-of-Scope Use
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
## Bias, Risks, and Limitations
InstructPalmyra-20b is trained mostly on English data and will not generalize appropriately to other languages. Furthermore, because it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
### Recommendations
We recommend that users of InstructPalmyra-20b develop guardrails and take appropriate precautions for any production use.
## Citation and Related Information
To cite this model:
```
@misc{InstructPalmyra,
author = {Writer Engineering team},
title = {{InstructPalmyra-20b : Instruct tuned Palmyra-Large model}},
howpublished = {\url{https://dev.writer.com}},
year = 2023,
month = August
}
```
[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-20B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)|![AUR license](https://img.shields.io/badge/license-Apache%202-blue)

3
added_tokens.json Normal file

@@ -0,0 +1,3 @@
{
"[PAD]": 50257
}

32
config.json Normal file

@@ -0,0 +1,32 @@
{
"_name_or_path": "/home/ubuntu/server/InstructPalmyra-20b",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.008165,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_embd": 6144,
"n_head": 48,
"n_inner": 24576,
"n_layer": 44,
"n_positions": 2048,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "float16",
"transformers_version": "4.32.0.dev0",
"use_cache": true,
"vocab_size": 50258
}
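The hyperparameters above determine the model size. As a rough check, the sketch below counts the parameters of a GPT-2-style decoder from `n_embd`, `n_inner`, `n_layer`, `n_positions`, and `vocab_size`, assuming the output head shares its weights with the token embedding (the GPT-2 default in transformers). The function name `count_gpt2_parameters` is hypothetical.

```python
def count_gpt2_parameters(
    n_embd: int, n_inner: int, n_layer: int, n_positions: int, vocab_size: int
) -> int:
    """Count trainable parameters of a GPT-2-style decoder with a tied output head."""
    # Token embedding (wte) plus learned position embedding (wpe).
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Fused QKV projection (c_attn) plus output projection (c_proj), with biases.
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Feed-forward up-projection (c_fc) plus down-projection (c_proj), with biases.
    mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
    # Two layer norms per block (ln_1, ln_2), each with weight and bias.
    layer_norms = 2 * (2 * n_embd)
    # Final layer norm (ln_f).
    final_norm = 2 * n_embd
    return embeddings + n_layer * (attention + mlp + layer_norms) + final_norm


total = count_gpt2_parameters(
    n_embd=6144, n_inner=24576, n_layer=44, n_positions=2048, vocab_size=50258
)
print(total)      # 20256227328
print(total * 2)  # 40512454656 bytes at 2 bytes per float16 parameter
```

At two bytes per parameter this comes to 40,512,454,656 bytes, the same value as the `total_size` recorded in the checkpoint index later in this commit.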

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 50256,
"eos_token_id": 50256,
"transformers_version": "4.32.0.dev0"
}

52
handler.py Normal file

@@ -0,0 +1,52 @@
import torch
from typing import Dict, List, Any
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# check for GPU
device = 0 if torch.cuda.is_available() else -1

# prompt template applied to every incoming instruction
format_input = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


class EndpointHandler:
    def __init__(self, path=""):
        # load the model
        tokenizer = AutoTokenizer.from_pretrained(path)
        model = AutoModelForCausalLM.from_pretrained(
            path,
            device_map="auto",
            torch_dtype=torch.float16,
        )
        # create inference pipeline
        self.pipeline = pipeline(
            "text-generation",
            model=model,
            tokenizer=tokenizer,
            device=device,
            max_length=256,
        )

    def __call__(self, data: Any) -> List[Dict[str, str]]:
        inputs = data.pop("inputs", data)
        parameters = data.pop("parameters", None)
        # wrap the raw instruction in the prompt template
        text_input = format_input.format(instruction=inputs)
        # pass inputs with all kwargs in data
        if parameters is not None:
            prediction = self.pipeline(text_input, **parameters)
        else:
            prediction = self.pipeline(text_input)
        # postprocess the prediction: keep only the text after the response marker
        output = [
            {"generated_text": pred["generated_text"].split("### Response:")[1].strip()}
            for pred in prediction
        ]
        return output
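
The handler expects a dictionary with an `inputs` string and an optional `parameters` dictionary that is forwarded to the pipeline. The sketch below mirrors the `__call__` logic with a stand-in pipeline so the request and response shapes can be seen without loading the 20B model. `fake_pipeline` and `handle` are hypothetical names, and the canned reply is made up for illustration.

```python
from typing import Any, Callable, Dict, List

FORMAT_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def fake_pipeline(prompt: str, **parameters: Any) -> List[Dict[str, str]]:
    """Stand-in for the text-generation pipeline: echoes the prompt plus a canned reply."""
    return [{"generated_text": prompt + " (canned reply)"}]


def handle(
    data: Dict[str, Any], run: Callable[..., List[Dict[str, str]]]
) -> List[Dict[str, str]]:
    """Mirror of EndpointHandler.__call__ with the pipeline passed in explicitly."""
    inputs = data.pop("inputs", data)
    parameters = data.pop("parameters", None) or {}
    prediction = run(FORMAT_INPUT.format(instruction=inputs), **parameters)
    # Keep only the text after the response marker, as the handler does.
    return [
        {"generated_text": pred["generated_text"].split("### Response:")[1].strip()}
        for pred in prediction
    ]


request = {"inputs": "Say hello.", "parameters": {"max_length": 64}}
print(handle(request, fake_pipeline))  # [{'generated_text': '(canned reply)'}]
```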

50001
merges.txt Normal file

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:016eb83728cc4f3ea3ad677700c287bf584d0011411efcb216caea4ddccfda31
size 9930627841


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9be3e150665aa6b28567d6141258fa9fc148d5a54e820330673c5b5ad51a72eb
size 9967469263


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a20b7b36a1e686f76ae5f3a98421d2ee9d6299ba8d0686181a34817fc1a60edd
size 9967469263


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:77278e4e765edd31cc085e4ef699443f610950a3668d9c51e1055bd063238f12
size 9967469263


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9f72cd3d8a5fc5b20d4b6dffb2288671a854c8dae19a5c57de227ab6871ae7c3
size 679603843
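
Each of the five hunks above is a Git LFS pointer: a small text file with `version`, `oid`, and `size` keys that stands in for the real binary. A minimal sketch of reading one, using the last pointer shown above as sample data (the `parse_lfs_pointer` helper is a hypothetical name):

```python
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9f72cd3d8a5fc5b20d4b6dffb2288671a854c8dae19a5c57de227ab6871ae7c3
size 679603843
"""
info = parse_lfs_pointer(pointer)
print(info["algorithm"], info["size"])  # sha256 679603843

# Sizes of the five pointers shown above, in bytes.
sizes = [9930627841, 9967469263, 9967469263, 9967469263, 679603843]
print(sum(sizes))  # 40512639473
```

The five sizes add up to 40,512,639,473 bytes, slightly more than the `total_size` of 40512454656 in the index that follows. The difference is plausibly per-file serialization overhead on top of the raw tensor data, though the page does not say which file each pointer belongs to.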


@@ -0,0 +1,540 @@
{
"metadata": {
"total_size": 40512454656
},
"weight_map": {
"lm_head.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.2.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.20.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.3.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.30.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.4.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.40.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.attn.c_proj.bias": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.attn.c_proj.weight": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.ln_2.bias": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.ln_2.weight": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.mlp.c_fc.bias": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.mlp.c_fc.weight": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.mlp.c_proj.bias": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.mlp.c_proj.weight": "pytorch_model-00005-of-00005.bin",
"transformer.h.5.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.ln_f.bias": "pytorch_model-00005-of-00005.bin",
"transformer.ln_f.weight": "pytorch_model-00005-of-00005.bin",
"transformer.wpe.weight": "pytorch_model-00001-of-00005.bin",
"transformer.wte.weight": "pytorch_model-00001-of-00005.bin"
}
}
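
The map above assigns each tensor name to one of five shard files, and a few layers (32 and 43 in the listing) have their tensors spread over two shards. A minimal sketch of how that mapping could be inspected is given below. It assumes the map sits under a top-level `weight_map` key, which is not visible in this part of the diff, and it works on a small hand-copied excerpt rather than the real file. The helper names `tensors_by_shard` and `layers_split_across_shards` are hypothetical.

```python
import json
from collections import defaultdict

# Hand-copied excerpt of the mapping shown above. The "weight_map" wrapper key
# is an assumption; the real index file would be read from disk instead.
INDEX_EXCERPT = """
{
  "weight_map": {
    "transformer.h.32.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
    "transformer.h.32.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
    "transformer.h.43.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
    "transformer.h.43.mlp.c_fc.weight": "pytorch_model-00005-of-00005.bin",
    "transformer.ln_f.weight": "pytorch_model-00005-of-00005.bin",
    "transformer.wte.weight": "pytorch_model-00001-of-00005.bin"
  }
}
"""


def tensors_by_shard(index):
    """Group tensor names by the shard file that stores them."""
    groups = defaultdict(list)
    for name, shard in index["weight_map"].items():
        groups[shard].append(name)
    return {shard: sorted(names) for shard, names in sorted(groups.items())}


def layers_split_across_shards(index):
    """Return the layer numbers whose tensors live in more than one shard."""
    shards_per_layer = defaultdict(set)
    for name, shard in index["weight_map"].items():
        parts = name.split(".")
        # Layer tensors are named "transformer.h.<layer>.<...>".
        if len(parts) > 2 and parts[0] == "transformer" and parts[1] == "h":
            shards_per_layer[int(parts[2])].add(shard)
    return sorted(n for n, shards in shards_per_layer.items() if len(shards) > 1)


index = json.loads(INDEX_EXCERPT)
for shard, names in tensors_by_shard(index).items():
    print(shard, len(names))
print(layers_split_across_shards(index))
```

Grouping by shard first makes it possible to open each shard file once, instead of once per tensor, when only part of the model is needed.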

6
special_tokens_map.json Normal file

@@ -0,0 +1,6 @@
{
"bos_token": "<|endoftext|>",
"eos_token": "<|endoftext|>",
"pad_token": "[PAD]",
"unk_token": "<|endoftext|>"
}

100313
tokenizer.json Normal file

File diff suppressed because it is too large

34
tokenizer_config.json Normal file

@@ -0,0 +1,34 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"bos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": true,
"eos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"errors": "replace",
"model_max_length": 2048,
"pad_token": null,
"padding_side": "right",
"tokenizer_class": "GPT2Tokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}
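
The two tokenizer files in this commit disagree on one entry: `special_tokens_map.json` sets `pad_token` to `"[PAD]"`, while `tokenizer_config.json` sets it to `null`. The sketch below shows one way such differences could be found, using excerpts copied from the two files. The helper names `token_text` and `mismatched_tokens` are hypothetical, and the sketch makes no claim about which value a tokenizer library would use when loading.

```python
import json

# Contents of special_tokens_map.json as shown in this commit.
SPECIAL_TOKENS_MAP = json.loads("""
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "[PAD]",
  "unk_token": "<|endoftext|>"
}
""")

# Token-related entries of tokenizer_config.json, reduced to the fields
# needed for the comparison (other fields omitted for brevity).
TOKENIZER_CONFIG_EXCERPT = json.loads("""
{
  "bos_token": {"__type": "AddedToken", "content": "<|endoftext|>"},
  "eos_token": {"__type": "AddedToken", "content": "<|endoftext|>"},
  "pad_token": null,
  "unk_token": {"__type": "AddedToken", "content": "<|endoftext|>"}
}
""")


def token_text(value):
    """Return the token string whether it is stored as a plain string or a dict."""
    if isinstance(value, dict):
        return value.get("content")
    return value


def mismatched_tokens(special_map, config):
    """Map each shared key to (special_map value, config value) where they differ."""
    result = {}
    for key in sorted(set(special_map) & set(config)):
        left, right = token_text(special_map[key]), token_text(config[key])
        if left != right:
            result[key] = (left, right)
    return result


print(mismatched_tokens(SPECIAL_TOKENS_MAP, TOKENIZER_CONFIG_EXCERPT))
```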

1
vocab.json Normal file

File diff suppressed because one or more lines are too long