Initialize project; model provided by the ModelHub XC community

Model: Writer-Org/palmyra-base Source: Original Platform
2026-06-05 13:04:13 +08:00
commit 61baa594b4
16 changed files with 50952 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,53 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+ 
+ 
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+ 
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zstandard filter=lfs diff=lfs merge=lfs -text
+*.tfevents* filter=lfs diff=lfs merge=lfs -text
+*.db* filter=lfs diff=lfs merge=lfs -text
+*.ark* filter=lfs diff=lfs merge=lfs -text
+**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
+**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
+**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
+ 
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.gguf* filter=lfs diff=lfs merge=lfs -text
+*.ggml filter=lfs diff=lfs merge=lfs -text
+*.llamafile* filter=lfs diff=lfs merge=lfs -text
+*.pt2 filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+
+model-00002-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
+pytorch_model-00001-of-00002.bin filter=lfs diff=lfs merge=lfs -text
+pytorch_model-00002-of-00002.bin filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+model-00001-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,112 @@
+---
+language:
+- en
+datasets:
+- English
+tags:
+- text generation
+- pytorch
+- causal-lm
+- Writer-data
+- gpt
+- NeMo
+- palmyra
+pipeline_tag: text-generation
+library_name: transformers
+license: apache-2.0
+---
+**DEPRECATED MODEL NOTICE**
+==========================
+
+Please note that this model is no longer maintained or supported by our team. We strongly advise against using it in production or for any critical applications.
+
+Instead, we recommend using our latest and greatest models, which can be found at:
+
+https://huggingface.co/collections/Writer/palmyra-writer-license-66476fa8156169f8720a2c89
+
+==========================
+
+
+# Palmyra Base 5B
+
+<style>
+img {
+ display: inline;
+}
+</style>
+
+|[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-5B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
+
+
+## Model Description
+
+Palmyra Base was pre-trained primarily on English text, although a trace amount of non-English data from CommonCrawl remains in the training corpus. Like GPT-3, Palmyra Base is a decoder-only model, and it was pre-trained with a self-supervised causal language modeling (CLM) objective. Its evaluation follows GPT-3, reusing that model's prompts and general experimental setup.
+
+### Use case
+Palmyra Base is designed to be both capable and fast. It is well suited to nuanced tasks such as sentiment classification and summarization.
+
+
+## Training data
+
+Palmyra Base (5B) was trained on Writer’s custom dataset.
+
+
+## Intended Use and Limitations
+
+Palmyra Base learns an inner representation of the English language that can be used to extract features useful for downstream tasks. However, the model is best at what it was pre-trained for, which is generating text from a prompt.
+
+### How to use
+
+The model can be loaded with the `AutoModelForCausalLM` class:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+# load the weights in half precision and move them to the GPU (requires CUDA)
+model = AutoModelForCausalLM.from_pretrained("Writer/palmyra-base", torch_dtype=torch.float16).cuda()
+
+# the fast tokenizer currently does not work correctly, so request the slow one
+tokenizer = AutoTokenizer.from_pretrained("Writer/palmyra-base", use_fast=False)
+
+
+```
+
+### Limitations and Biases
+
+Palmyra Base’s core functionality is to take a string of text and predict the next token. While language models are widely used for other tasks, there are many unknowns in this work. When prompting Palmyra Base, keep in mind that the next statistically likely token is not always the token that produces the most "accurate" text. Never rely on Palmyra Base to produce factually correct results.
+
+Palmyra Base was trained on Writer’s custom data. As with all language models, it is difficult to predict how Palmyra Base will respond to specific prompts, and offensive content may appear unexpectedly. We recommend that the outputs be curated or filtered by humans before they are released, both to censor undesirable content and to improve the quality of the results.
+
+
+## Evaluation results
+
+Evaluation of the Palmyra Base model on the SuperGLUE benchmark:
+
+
+| Task    | Metric | Value (%) |
+|---------|--------|-----------|
+| boolq   | acc    | 64.43     |
+| cb      | acc    | 10.71     |
+|         | f1     | 8.32      |
+| copa    | acc    | 76.00     |
+| multirc | acc    | 1.26      |
+| record  | f1     | 84.02     |
+|         | em     | 83.29     |
+| wic     | acc    | 50.00     |
+| wsc     | acc    | 36.54     |
+
+
+## Citation and Related Information
+
+
+To cite this model:
+```
+@misc{Palmyra,
+  author = {Writer Engineering team},
+  title = {{Palmyra-base Parameter Autoregressive Language Model}},
+  howpublished = {\url{https://dev.writer.com}},
+  year = 2023,
+  month = {January}
+}
+```
--- a/config.json
+++ b/config.json
@@ -0,0 +1,31 @@
+{
+  "activation_function": "gelu",
+  "architectures": [
+    "GPT2LMHeadModel"
+  ],
+  "attn_pdrop": 0.1,
+  "bos_token_id": 50256,
+  "embd_pdrop": 0.1,
+  "eos_token_id": 50256,
+  "initializer_range": 0.01,
+  "layer_norm_epsilon": 1e-05,
+  "model_type": "gpt2",
+  "n_embd": 4096,
+  "n_head": 32,
+  "n_inner": 16384,
+  "n_layer": 24,
+  "n_positions": 2048,
+  "reorder_and_upcast_attn": false,
+  "resid_pdrop": 0.1,
+  "scale_attn_by_inverse_layer_idx": false,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.24.0",
+  "use_cache": true,
+  "vocab_size": 50257
+}
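
As a rough cross-check of the "5B" figure in the README, the parameter count can be derived from the values in `config.json` above. This is a minimal sketch that assumes the standard GPT-2 layout in `transformers`: a fused Q/K/V projection (`c_attn`), a bias on every linear layer, two LayerNorms per block plus a final one, and input embeddings tied to `lm_head`. The helper name `gpt2_param_count` is hypothetical and not part of this repository.

```python
def gpt2_param_count(n_embd, n_layer, n_inner, n_positions, vocab_size):
    """Count trainable parameters of a GPT-2 style decoder (tied lm_head assumed)."""
    layer_norm = 2 * n_embd                      # weight + bias
    c_attn = n_embd * 3 * n_embd + 3 * n_embd    # fused Q, K, V projection
    attn_proj = n_embd * n_embd + n_embd         # attention output projection
    c_fc = n_embd * n_inner + n_inner            # MLP up-projection
    mlp_proj = n_inner * n_embd + n_embd         # MLP down-projection
    per_layer = 2 * layer_norm + c_attn + attn_proj + c_fc + mlp_proj

    embeddings = vocab_size * n_embd + n_positions * n_embd  # wte + wpe
    final_norm = layer_norm                                   # ln_f
    return n_layer * per_layer + embeddings + final_norm


# Values copied from config.json
total = gpt2_param_count(
    n_embd=4096, n_layer=24, n_inner=16384, n_positions=2048, vocab_size=50257
)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the count comes to 5,047,365,632, or roughly 5.05 billion, which is consistent with the 5B badge in the README.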
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/handler.py
+++ b/handler.py
@@ -0,0 +1,27 @@
+import torch
+from typing import Dict, List, Any
+from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+
+# check for GPU
+device = 0 if torch.cuda.is_available() else -1
+
+
+class EndpointHandler:
+    def __init__(self, path=""):
+        # load the model
+        tokenizer = AutoTokenizer.from_pretrained(path)
+        model = AutoModelForCausalLM.from_pretrained(path, low_cpu_mem_usage=True)
+        # create inference pipeline
+        self.pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, device=device)
+
+    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
+        inputs = data.pop("inputs", data)
+        parameters = data.pop("parameters", None)
+
+        # forward any generation parameters to the pipeline as keyword arguments
+        if parameters is not None:
+            prediction = self.pipeline(inputs, **parameters)
+        else:
+            prediction = self.pipeline(inputs)
+        # return the pipeline output unchanged (no postprocessing is applied)
+        return prediction
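
The request shape that `EndpointHandler.__call__` expects can be illustrated without loading the model. The sketch below repeats the same `pop` logic against a stand-in pipeline; `fake_pipeline` and `handle` are hypothetical names used only for illustration, and `max_new_tokens` is just one example of a generation parameter a caller might pass.

```python
from typing import Any, Dict, List


def fake_pipeline(inputs: str, **kwargs: Any) -> List[Dict[str, Any]]:
    # Hypothetical stand-in for the transformers text-generation pipeline;
    # it only echoes its arguments so the payload handling can be observed.
    return [{"generated_text": inputs, "kwargs": kwargs}]


def handle(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Same steps as EndpointHandler.__call__ in handler.py.
    inputs = data.pop("inputs", data)
    parameters = data.pop("parameters", None)
    if parameters is not None:
        return fake_pipeline(inputs, **parameters)
    return fake_pipeline(inputs)


payload = {"inputs": "Once upon a time", "parameters": {"max_new_tokens": 20}}
result = handle(payload)
print(result)
print(payload)  # both keys were popped, so the caller's dict is now empty
```

Because the handler uses `dict.pop`, it mutates the payload it receives; a caller that needs to reuse the request should pass a copy.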
--- a/merges.txt
+++ b/merges.txt
--- a/model-00001-of-00002.safetensors
+++ b/model-00001-of-00002.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8f30273adde3abebc9bd1d73af24616c608176957434e5934ccab92438e72937
+size 9994021176
--- a/model-00002-of-00002.safetensors
+++ b/model-00002-of-00002.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8249857a5bdd57cc96fe2fd1f6e9603b8a4c81abc5a434d7c163747da7150977
+size 713778312
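
The two `.safetensors` entries above are Git LFS pointer files rather than the weights themselves: each records the spec version, a SHA-256 object ID, and the size in bytes of the real file. The following is a minimal sketch of reading those fields and totalling the shard sizes. The helper `parse_lfs_pointer` is a hypothetical name, and the pointer text is copied from the hunks in this commit.

```python
def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


# Pointer contents copied from the two safetensors hunks in this commit
shard_1 = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:8f30273adde3abebc9bd1d73af24616c608176957434e5934ccab92438e72937
size 9994021176
""")
shard_2 = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:8249857a5bdd57cc96fe2fd1f6e9603b8a4c81abc5a434d7c163747da7150977
size 713778312
""")

total_bytes = shard_1["size"] + shard_2["size"]
print(f"{total_bytes:,} bytes (~{total_bytes / 1e9:.2f} GB) across both shards")
```

The sum, 10,707,799,488 bytes, is 36,240 bytes larger than the `total_size` of 10707763248 recorded in `model.safetensors.index.json`. The difference is presumably the per-file safetensors headers, which `total_size` would not include.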
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,348 @@
+{
+    "metadata": {
+        "total_size": 10707763248
+    },
+    "weight_map": {
+        "lm_head.weight": "model-00002-of-00002.safetensors",
+        "transformer.h.0.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.0.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.0.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.0.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.0.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.0.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.0.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.0.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.0.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.0.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.0.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.0.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.0.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.0.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.1.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.1.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.1.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.1.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.1.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.1.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.1.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.1.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.1.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.1.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.1.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.1.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.1.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.1.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.10.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.10.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.10.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.10.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.10.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.10.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.10.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.10.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.10.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.10.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.10.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.10.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.10.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.10.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.11.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.11.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.11.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.11.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.11.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.11.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.11.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.11.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.11.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.11.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.11.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.11.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.11.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.11.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.12.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.12.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.12.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.12.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.12.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.12.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.12.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.12.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.12.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.12.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.12.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.12.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.12.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.12.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.13.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.13.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.13.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.13.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.13.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.13.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.13.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.13.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.13.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.13.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.13.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.13.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.13.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.13.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.14.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.14.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.14.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.14.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.14.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.14.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.14.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.14.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.14.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.14.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.14.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.14.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.14.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.14.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.15.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.15.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.15.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.15.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.15.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.15.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.15.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.15.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.15.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.15.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.15.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.15.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.15.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.15.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.16.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.16.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.16.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.16.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.16.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.16.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.16.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.16.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.16.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.16.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.16.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.16.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.16.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.16.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.17.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.17.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.17.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.17.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.17.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.17.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.17.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.17.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.17.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.17.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.17.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.17.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.17.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.17.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.18.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.18.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.18.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.18.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.18.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.18.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.18.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.18.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.18.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.18.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.18.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.18.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.18.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.18.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.19.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.19.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.19.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.19.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.19.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.19.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.19.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.19.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.19.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.19.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.19.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.19.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.19.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.19.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.2.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.2.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.2.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.2.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.2.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.2.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.2.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.2.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.2.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.2.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.2.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.2.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.2.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.2.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.20.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.20.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.20.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.20.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.20.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.20.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.20.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.20.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.20.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.20.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.20.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.20.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.20.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.20.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.21.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.21.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.21.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.21.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.21.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.21.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.21.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.21.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.21.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.21.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.21.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.21.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.21.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.21.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.22.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.22.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.22.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.22.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.22.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.22.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.22.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.22.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.22.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.22.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.22.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.22.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.22.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.22.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.23.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.23.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.23.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.23.attn.c_proj.bias": "model-00002-of-00002.safetensors",
+        "transformer.h.23.attn.c_proj.weight": "model-00002-of-00002.safetensors",
+        "transformer.h.23.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.23.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.23.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.23.ln_2.bias": "model-00002-of-00002.safetensors",
+        "transformer.h.23.ln_2.weight": "model-00002-of-00002.safetensors",
+        "transformer.h.23.mlp.c_fc.bias": "model-00002-of-00002.safetensors",
+        "transformer.h.23.mlp.c_fc.weight": "model-00002-of-00002.safetensors",
+        "transformer.h.23.mlp.c_proj.bias": "model-00002-of-00002.safetensors",
+        "transformer.h.23.mlp.c_proj.weight": "model-00002-of-00002.safetensors",
+        "transformer.h.3.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.3.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.3.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.3.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.3.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.3.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.3.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.3.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.3.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.3.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.3.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.3.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.3.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.3.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.4.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.4.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.4.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.4.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.4.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.4.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.4.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.4.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.4.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.4.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.4.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.4.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.4.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.4.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.5.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.5.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.5.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.5.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.5.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.5.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.5.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.5.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.5.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.5.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.5.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.5.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.5.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.5.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.6.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.6.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.6.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.6.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.6.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.6.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.6.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.6.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.6.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.6.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.6.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.6.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.6.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.6.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.7.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.7.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.7.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.7.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.7.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.7.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.7.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.7.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.7.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.7.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.7.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.7.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.7.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.7.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.8.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.8.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.8.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.8.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.8.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.8.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.8.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.8.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.8.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.8.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.8.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.8.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.8.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.8.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.9.attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.9.attn.c_attn.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.9.attn.c_attn.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.9.attn.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.9.attn.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.9.attn.masked_bias": "model-00001-of-00002.safetensors",
+        "transformer.h.9.ln_1.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.9.ln_1.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.9.ln_2.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.9.ln_2.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.9.mlp.c_fc.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.9.mlp.c_fc.weight": "model-00001-of-00002.safetensors",
+        "transformer.h.9.mlp.c_proj.bias": "model-00001-of-00002.safetensors",
+        "transformer.h.9.mlp.c_proj.weight": "model-00001-of-00002.safetensors",
+        "transformer.ln_f.bias": "model-00002-of-00002.safetensors",
+        "transformer.ln_f.weight": "model-00002-of-00002.safetensors",
+        "transformer.wpe.weight": "model-00001-of-00002.safetensors",
+        "transformer.wte.weight": "model-00001-of-00002.safetensors"
+    }
+}
--- a/pytorch_model-00001-of-00002.bin
+++ b/pytorch_model-00001-of-00002.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:460fb533573809670588cad8539eb50863af11063823696f63a182d4cb58cd67
+size 9994100913
--- a/pytorch_model-00002-of-00002.bin
+++ b/pytorch_model-00002-of-00002.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:30e07d2adb75b5d7e592c435850ed219130dfb95408a78364a5b9a341976caaa
+size 713781205
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@@ -0,0 +1,348 @@
+{
+  "metadata": {
+    "total_size": 10707763248
+  },
+  "weight_map": {
+    "lm_head.weight": "pytorch_model-00002-of-00002.bin",
+    "transformer.h.0.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.0.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.1.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.10.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.11.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.12.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.13.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.14.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.15.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.16.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.17.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.18.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.19.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.2.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.20.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.21.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.22.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.23.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.23.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.23.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.23.attn.c_proj.bias": "pytorch_model-00002-of-00002.bin",
+    "transformer.h.23.attn.c_proj.weight": "pytorch_model-00002-of-00002.bin",
+    "transformer.h.23.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.23.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.23.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.23.ln_2.bias": "pytorch_model-00002-of-00002.bin",
+    "transformer.h.23.ln_2.weight": "pytorch_model-00002-of-00002.bin",
+    "transformer.h.23.mlp.c_fc.bias": "pytorch_model-00002-of-00002.bin",
+    "transformer.h.23.mlp.c_fc.weight": "pytorch_model-00002-of-00002.bin",
+    "transformer.h.23.mlp.c_proj.bias": "pytorch_model-00002-of-00002.bin",
+    "transformer.h.23.mlp.c_proj.weight": "pytorch_model-00002-of-00002.bin",
+    "transformer.h.3.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.3.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.4.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.5.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.6.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.7.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.8.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.attn.c_attn.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.attn.c_attn.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.attn.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.attn.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.attn.masked_bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.ln_1.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.ln_1.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.ln_2.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.ln_2.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.mlp.c_fc.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.mlp.c_fc.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.mlp.c_proj.bias": "pytorch_model-00001-of-00002.bin",
+    "transformer.h.9.mlp.c_proj.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.ln_f.bias": "pytorch_model-00002-of-00002.bin",
+    "transformer.ln_f.weight": "pytorch_model-00002-of-00002.bin",
+    "transformer.wpe.weight": "pytorch_model-00001-of-00002.bin",
+    "transformer.wte.weight": "pytorch_model-00001-of-00002.bin"
+  }
+}
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,5 @@
+{
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:86f12f9648802a0c45c4b87ef2ab235e9bfdf1a43cc40571291507c81e35c4c3
+size 2107625
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,10 @@
+{
+  "add_prefix_space": false,
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 1024,
+  "name_or_path": "gpt2",
+  "special_tokens_map_file": null,
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
--- a/vocab.json
+++ b/vocab.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}