Initialize project; model provided by the ModelHub XC community

Model: Writer/InstructPalmyra-20b Source: Original Platform
2026-06-08 12:02:26 +08:00
commit e386e5f805
17 changed files with 151171 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,34 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,134 @@
+---
+license: apache-2.0
+language:
+- en
+tags:
+- InstructGPT
+- hf
+- palmyra
+datasets:
+- Writer/palmyra-data-index
+---
+
+
+
+# InstructPalmyra-20b
+
+- **Developed by:** [https://writer.com/](https://writer.com/);
+- **Model type:** Causal decoder-only;
+- **Language(s) (NLP):** English;
+- **License:** Apache 2.0;
+- **Finetuned from model:** [Palmyra-20B](https://huggingface.co/Writer/palmyra-large).
+
+
+<style>
+img {
+ display: inline;
+}
+</style>
+
+
+## Model Description
+
+InstructPalmyra-20b is an instruction-following language model with 20 billion parameters. It is derived from the foundational architecture of [Palmyra-20b](https://huggingface.co/Writer/palmyra-large) and is tailored to natural language processing and comprehension tasks that are phrased as instructions.
+
+The InstructPalmyra-20b model is meticulously trained on an extensive dataset of approximately 70,000 instruction-response records. These records are generated by our dedicated Writer Linguist team, who possess considerable expertise in language modeling and fine-tuning techniques. By leveraging their skills and knowledge, the InstructPalmyra-20b model is primed to offer unparalleled proficiency in understanding and executing language-based instructions.
+
+One of the key differentiators of InstructPalmyra-20b lies in its ability to process complex instructions and generate accurate, contextually appropriate responses. This makes it an ideal choice for a wide range of applications, including virtual assistants, customer support, content generation, and more. Additionally, the model's comprehensive training enables it to adapt and perform well under varying conditions and contexts, further expanding its potential use cases.
+
+
+
+## Usage
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_name = "Writer/InstructPalmyra-20b"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    torch_dtype=torch.float16
+)
+
+instruction = "Describe a futuristic device that revolutionizes space travel."
+
+
+PROMPT_DICT = {
+    "prompt_input": (
+        "Below is an instruction that describes a task, paired with an input that provides further context. "
+        "Write a response that appropriately completes the request\n\n"
+        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
+    ),
+    "prompt_no_input": (
+        "Below is an instruction that describes a task. "
+        "Write a response that appropriately completes the request.\n\n"
+        "### Instruction:\n{instruction}\n\n### Response:"
+    ),
+}
+
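+# Optional extra context for the task; leave empty to use the instruction-only prompt.
+context = ""
+
+text = (
+    PROMPT_DICT["prompt_no_input"].format(instruction=instruction)
+    if not context
+    else PROMPT_DICT["prompt_input"].format(instruction=instruction, input=context)
+)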
+
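+# Move the inputs to the device that holds the model's first layer (set by device_map="auto").
+model_inputs = tokenizer(text, return_tensors="pt").to(model.device)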
+output_ids = model.generate(
+    **model_inputs,
+    max_length=256,
+)
+output_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
+clean_output = output_text.split("### Response:")[1].strip()
+
+print(clean_output)
+```
+
+The model can also be served with text-generation-inference:
+
+```sh
+model=Writer/InstructPalmyra-20b
+volume=$PWD/data
+
+docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference --model-id $model
+```
+
+
+### Limitations and Biases
+
+InstructPalmyra's core functionality is to take a string of text and predict the next token. While language models are widely used for other tasks, there are many unknowns in this work. When prompting InstructPalmyra, keep in mind that the statistically most likely next token is not always the token that produces the most "accurate" text. Never rely on InstructPalmyra to produce factually correct results.
+
+InstructPalmyra was trained on Writer’s custom data. As with all language models, it is difficult to predict how InstructPalmyra will respond to specific prompts, and offensive content may appear unexpectedly. We recommend that the outputs be curated or filtered by humans before they are released, both to censor undesirable content and to improve the quality of the results.
+
+## Uses
+
+
+### Out-of-Scope Use
+
+Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful. 
+
+## Bias, Risks, and Limitations
+
+InstructPalmyra-20b is trained mostly on English data and will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
+
+### Recommendations
+
+We recommend that users of InstructPalmyra-20b develop guardrails and take appropriate precautions for any production use.
+
+
+
+## Citation and Related Information
+
+
+To cite this model:
+```
+@misc{InstructPalmyra,
+  author = {Writer Engineering team},
+  title = {{InstructPalmyra-20b: Instruct-tuned Palmyra-Large model}},
+  howpublished = {\url{https://dev.writer.com}},
+  year = 2023,
+  month = {August}
+}
+```
+[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-20B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)|![AUR license](https://img.shields.io/badge/license-Apache%202-blue)
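
The prompt selection in the README's usage snippet can be sketched as a small helper, which makes it checkable without loading the 20B model. The function name `build_prompt` and the `context` parameter are hypothetical, introduced here for illustration only; the two template strings are copied verbatim from the README.

```python
# Templates copied verbatim from the README usage section.
PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
}


def build_prompt(instruction: str, context: str = "") -> str:
    """Hypothetical helper: pick the template based on whether context was supplied."""
    if context:
        return PROMPT_DICT["prompt_input"].format(instruction=instruction, input=context)
    return PROMPT_DICT["prompt_no_input"].format(instruction=instruction)


if __name__ == "__main__":
    print(build_prompt("Describe a futuristic device that revolutionizes space travel."))
```

Passing an explicit string for the context avoids relying on a name that shadows a Python builtin, and the resulting prompt can be handed to the tokenizer exactly as in the README.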
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,3 @@
+{
+  "[PAD]": 50257
+}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,32 @@
+{
+  "_name_or_path": "/home/ubuntu/server/InstructPalmyra-20b",
+  "activation_function": "gelu",
+  "architectures": [
+    "GPT2LMHeadModel"
+  ],
+  "attn_pdrop": 0.1,
+  "bos_token_id": 50256,
+  "embd_pdrop": 0.1,
+  "eos_token_id": 50256,
+  "initializer_range": 0.008165,
+  "layer_norm_epsilon": 1e-05,
+  "model_type": "gpt2",
+  "n_embd": 6144,
+  "n_head": 48,
+  "n_inner": 24576,
+  "n_layer": 44,
+  "n_positions": 2048,
+  "reorder_and_upcast_attn": false,
+  "resid_pdrop": 0.1,
+  "scale_attn_by_inverse_layer_idx": false,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "torch_dtype": "float16",
+  "transformers_version": "4.32.0.dev0",
+  "use_cache": true,
+  "vocab_size": 50258
+}
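
As a rough sanity check on the "20b" label, the parameter count can be estimated from the values in `config.json`. The sketch below assumes the standard GPT-2 layout (fused QKV projection, two-layer MLP, two LayerNorms per block, learned position embeddings) and assumes the output head shares storage with the token embedding, so it is counted once. The function name `gpt2_param_count` is hypothetical.

```python
# Values copied from config.json in this commit.
N_EMBD, N_LAYER, N_INNER = 6144, 44, 24576
N_POSITIONS, VOCAB_SIZE = 2048, 50258


def gpt2_param_count(n_embd, n_layer, n_inner, n_positions, vocab_size):
    """Estimate parameters for a GPT-2 style decoder (assumes a tied output head)."""
    # Attention: fused QKV projection (c_attn) plus output projection (c_proj).
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: up projection (c_fc) plus down projection (c_proj).
    mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
    # Two LayerNorms per block, each with a weight and a bias vector.
    norms = 2 * 2 * n_embd
    per_layer = attn + mlp + norms
    # Token and position embeddings, plus the final LayerNorm.
    embeddings = vocab_size * n_embd + n_positions * n_embd
    final_norm = 2 * n_embd
    return n_layer * per_layer + embeddings + final_norm


params = gpt2_param_count(N_EMBD, N_LAYER, N_INNER, N_POSITIONS, VOCAB_SIZE)
fp16_bytes = params * 2  # float16 stores each parameter in two bytes
print(params)      # 20256227328
print(fp16_bytes)  # 40512454656
```

Under these assumptions the float16 byte count equals the `total_size` recorded in `pytorch_model.bin.index.json` (40512454656), which is consistent with roughly 20.3 billion parameters.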
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 50256,
+  "eos_token_id": 50256,
+  "transformers_version": "4.32.0.dev0"
+}
--- a/handler.py
+++ b/handler.py
@@ -0,0 +1,52 @@
+import torch
+from typing import Dict, List, Any
+from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+
+# check for GPU
+device = 0 if torch.cuda.is_available() else -1
+
+
+format_input = (
+    "Below is an instruction that describes a task. "
+    "Write a response that appropriately completes the request.\n\n"
+    "### Instruction:\n{instruction}\n\n### Response:"
+)
+
+
+class EndpointHandler:
+    def __init__(self, path=""):
+        # load the model
+        tokenizer = AutoTokenizer.from_pretrained(path)
+        model = AutoModelForCausalLM.from_pretrained(
+            path,
+            device_map="auto",
+            torch_dtype=torch.float16,
+        )
+        # create inference pipeline
+        self.pipeline = pipeline(
+            "text-generation",
+            model=model,
+            tokenizer=tokenizer,
+            device=device,
+            max_length=256,
+        )
+
+    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
+        inputs = data.pop("inputs", data)
+        parameters = data.pop("parameters", None)
+
+        text_input = format_input.format(instruction=inputs)
+
+        # forward any caller-supplied generation parameters to the pipeline
+        if parameters is not None:
+            prediction = self.pipeline(text_input, **parameters)
+        else:
+            prediction = self.pipeline(text_input)
+
+        # postprocess the prediction
+        output = [
+            {"generated_text": pred["generated_text"].split("### Response:")[1].strip()}
+            for pred in prediction
+        ]
+
+        return output
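
The post-processing step in `handler.py` indexes `split("### Response:")[1]`, which depends on the marker being present in the generated text. If a caller passes the text-generation pipeline's `return_full_text=False` parameter, the prompt (and with it the marker) is likely to be absent and the index would fail. A minimal, more defensive sketch is shown below; the names `RESPONSE_MARKER` and `extract_response` are hypothetical and not part of the committed handler.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated_text: str) -> str:
    """Return the text after the first response marker, or the whole text if it is absent."""
    head, marker, tail = generated_text.partition(RESPONSE_MARKER)
    # partition() returns an empty marker when the separator is not found;
    # in that case the full text is in `head`.
    return (tail if marker else head).strip()


if __name__ == "__main__":
    full = "### Instruction:\nSay hi\n\n### Response: Hello there "
    print(extract_response(full))
    print(extract_response("Hello there"))
```

Unlike `split(...)[1]`, this keeps everything after the first marker, including any later occurrence of the marker that the model might generate.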
--- a/merges.txt
+++ b/merges.txt
--- a/pytorch_model-00001-of-00005.bin
+++ b/pytorch_model-00001-of-00005.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:016eb83728cc4f3ea3ad677700c287bf584d0011411efcb216caea4ddccfda31
+size 9930627841
--- a/pytorch_model-00002-of-00005.bin
+++ b/pytorch_model-00002-of-00005.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9be3e150665aa6b28567d6141258fa9fc148d5a54e820330673c5b5ad51a72eb
+size 9967469263
--- a/pytorch_model-00003-of-00005.bin
+++ b/pytorch_model-00003-of-00005.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a20b7b36a1e686f76ae5f3a98421d2ee9d6299ba8d0686181a34817fc1a60edd
+size 9967469263
--- a/pytorch_model-00004-of-00005.bin
+++ b/pytorch_model-00004-of-00005.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:77278e4e765edd31cc085e4ef699443f610950a3668d9c51e1055bd063238f12
+size 9967469263
--- a/pytorch_model-00005-of-00005.bin
+++ b/pytorch_model-00005-of-00005.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9f72cd3d8a5fc5b20d4b6dffb2288671a854c8dae19a5c57de227ab6871ae7c3
+size 679603843
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@@ -0,0 +1,540 @@
+{
+  "metadata": {
+    "total_size": 40512454656
+  },
+  "weight_map": {
+    "lm_head.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.0.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.1.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.10.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.10.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.11.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.12.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.13.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.14.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.15.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.16.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.17.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.18.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.19.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.2.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.2.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.20.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.ln_2.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.ln_2.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.20.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.21.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.21.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.21.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.ln_1.bias": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.21.ln_1.weight": "pytorch_model-00002-of-00005.bin",
+    "transformer.h.21.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.21.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.22.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.23.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.24.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.25.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.26.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.27.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.28.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.29.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.3.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.3.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.30.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.30.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.ln_2.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.ln_2.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.31.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.32.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.32.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.32.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.ln_1.bias": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.32.ln_1.weight": "pytorch_model-00003-of-00005.bin",
+    "transformer.h.32.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.32.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.33.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.34.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.35.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.36.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.37.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.38.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.39.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.4.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.4.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.40.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.40.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.41.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.ln_2.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.ln_2.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.42.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.43.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.43.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.43.attn.c_proj.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.attn.c_proj.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.ln_1.bias": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.43.ln_1.weight": "pytorch_model-00004-of-00005.bin",
+    "transformer.h.43.ln_2.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.ln_2.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.mlp.c_fc.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.mlp.c_fc.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.mlp.c_proj.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.43.mlp.c_proj.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.h.5.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.5.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.6.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.7.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.8.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.ln_1.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.ln_1.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.ln_2.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.ln_2.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
+    "transformer.h.9.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.ln_f.bias": "pytorch_model-00005-of-00005.bin",
+    "transformer.ln_f.weight": "pytorch_model-00005-of-00005.bin",
+    "transformer.wpe.weight": "pytorch_model-00001-of-00005.bin",
+    "transformer.wte.weight": "pytorch_model-00001-of-00005.bin"
+  }
+}
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,6 @@
+{
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "pad_token": "[PAD]",
+  "unk_token": "<|endoftext|>"
+}
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,34 @@
+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "clean_up_tokenization_spaces": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 2048,
+  "pad_token": null,
+  "padding_side": "right",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/vocab.json
+++ b/vocab.json