Initialize project; model provided by the ModelHub XC community

Model: NousResearch/Redmond-Hermes-Coder Source: Original Platform
2026-05-23 19:19:14 +08:00
commit db2d11fcf0
16 changed files with 147919 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,39 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+pytorch_model-00001-of-00004.bin filter=lfs diff=lfs merge=lfs -text
+pytorch_model-00002-of-00004.bin filter=lfs diff=lfs merge=lfs -text
+pytorch_model-00003-of-00004.bin filter=lfs diff=lfs merge=lfs -text
+pytorch_model-00004-of-00004.bin filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,109 @@
+---
+license: gpl
+language:
+- en
+tags:
+- starcoder
+- wizardcoder
+- code
+- self-instruct
+- distillation
+---
+
+# Model Card: Redmond-Hermes-Coder 15B
+
+## Model Description
+
+Redmond-Hermes-Coder 15B is a state-of-the-art language model fine-tuned on over 300,000 instructions. This model was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and contributions from several others.
+
+This model was trained with a WizardCoder base, which itself uses a StarCoder base model. 
+
+The model performs well at code, but this comes with a tradeoff: while far better at code than the original Nous-Hermes built on Llama, it is worse than WizardCoder on pure code benchmarks such as HumanEval.
+
+It scores 39% on HumanEval, compared with 57% for WizardCoder. This is a preliminary experiment, and we are exploring improvements now.
+
+However, it does seem better than WizardCoder at a variety of non-code tasks, including writing.
+
+## Model Training
+
+The model was trained almost entirely on synthetic GPT-4 outputs. This includes data from diverse sources such as GPTeacher (the general, roleplay v1 & v2, and code instruct datasets), Nous Instruct & PDACTL (unpublished), CodeAlpaca, Evol_Instruct Uncensored, GPT4-LLM, and Unnatural Instructions. 
+
+Additional data inputs came from Camel-AI's Biology/Physics/Chemistry and Math Datasets, Airoboros' (v1) GPT-4 Dataset, and more from CodeAlpaca. The total volume of data encompassed over 300,000 instructions.
+
+## Collaborators
+The model fine-tuning and the datasets were a collaborative effort by members of Nous Research, including Teknium, Karan4D, and Huemin Art, supported by Redmond AI's generous compute grants. 
+  
+A huge shoutout and acknowledgement goes to all the dataset creators who generously share their datasets openly. 
+
+Among the contributors of datasets, GPTeacher was made available by Teknium, Wizard LM by nlpxucan, and the Nous Research Instruct Dataset was provided by Karan4D and HueminArt.  
+The GPT4-LLM and Unnatural Instructions datasets were provided by Microsoft, the Airoboros dataset by jondurbin, the Camel-AI datasets by Camel-AI, and the CodeAlpaca dataset by Sahil 2801.
+If anyone was left out, please open a thread in the community tab.
+
+## Prompt Format
+
+The model follows the Alpaca prompt format:
+```
+### Instruction:
+
+### Response:
+```
+
+or 
+
+```
+### Instruction:
+
+### Input:
+
+### Response:
+```  
+
+## Resources for Applied Use Cases:
+For an example of a back-and-forth chatbot using Hugging Face Transformers and Discord, check out: https://github.com/teknium1/alpaca-discord  
+For an example of a role-playing Discord bot, check out: https://github.com/teknium1/alpaca-roleplay-discordbot  
+
+## Future Plans
+The model is currently being uploaded in FP16 format, and there are plans to convert the model to GGML and GPTQ 4-bit quantizations. The team is also working on a full benchmark, similar to what was done for GPT4-x-Vicuna. We will try to open discussions to get the model included in GPT4All.
+
+## Benchmark Results
+```
+HumanEval: 39%
+|                      Task                      |Version|       Metric        |Value |   |Stderr|
+|------------------------------------------------|------:|---------------------|-----:|---|-----:|
+|arc_challenge                                   |      0|acc                  |0.2858|±  |0.0132|
+|                                                |       |acc_norm             |0.3148|±  |0.0136|
+|arc_easy                                        |      0|acc                  |0.5349|±  |0.0102|
+|                                                |       |acc_norm             |0.5097|±  |0.0103|
+|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5158|±  |0.0364|
+|bigbench_date_understanding                     |      0|multiple_choice_grade|0.5230|±  |0.0260|
+|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3295|±  |0.0293|
+|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.1003|±  |0.0159|
+|                                                |       |exact_str_match      |0.0000|±  |0.0000|
+|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2260|±  |0.0187|
+|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.1957|±  |0.0150|
+|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.3733|±  |0.0280|
+|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.3200|±  |0.0209|
+|bigbench_navigate                               |      0|multiple_choice_grade|0.4830|±  |0.0158|
+|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.4150|±  |0.0110|
+|bigbench_ruin_names                             |      0|multiple_choice_grade|0.2143|±  |0.0194|
+|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2926|±  |0.0144|
+|bigbench_snarks                                 |      0|multiple_choice_grade|0.5249|±  |0.0372|
+|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.4817|±  |0.0159|
+|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.2700|±  |0.0140|
+|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.1864|±  |0.0110|
+|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1349|±  |0.0082|
+|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.3733|±  |0.0280|
+|boolq                                           |      1|acc                  |0.5498|±  |0.0087|
+|hellaswag                                       |      0|acc                  |0.3814|±  |0.0048|
+|                                                |       |acc_norm             |0.4677|±  |0.0050|
+|openbookqa                                      |      0|acc                  |0.1960|±  |0.0178|
+|                                                |       |acc_norm             |0.3100|±  |0.0207|
+|piqa                                            |      0|acc                  |0.6600|±  |0.0111|
+|                                                |       |acc_norm             |0.6610|±  |0.0110|
+|winogrande                                      |      0|acc                  |0.5343|±  |0.0140|
+```
+
+## Model Usage
+The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.
+  
+Compute provided by our project sponsor Redmond AI, thank you!!
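The Alpaca prompt format shown in the README's Prompt Format section can be assembled programmatically. The following is a minimal sketch: `build_prompt` is a hypothetical helper, not part of this repository, and the exact whitespace between sections is an assumption, since the README shows only the section headers.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca-style prompt (hypothetical helper; spacing is assumed)."""
    if input_text:
        # Three-section variant: instruction, optional input, then the response slot.
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Two-section variant: instruction followed directly by the response slot.
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


print(build_prompt("Write a function that reverses a string."))
```

The model's completion would then be read from whatever text is generated after the final `### Response:` header.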
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,3 @@
+{
+  "[PAD]": 49152
+}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,39 @@
+{
+  "_name_or_path": "./hermeswizardcoder-step3800/",
+  "activation_function": "gelu",
+  "architectures": [
+    "GPTBigCodeForCausalLM"
+  ],
+  "attention_softmax_in_fp32": true,
+  "attn_pdrop": 0.1,
+  "bos_token_id": 0,
+  "embd_pdrop": 0.1,
+  "eos_token_id": 0,
+  "inference_runner": 0,
+  "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "max_batch_size": null,
+  "max_sequence_length": null,
+  "model_type": "gpt_bigcode",
+  "multi_query": true,
+  "n_embd": 6144,
+  "n_head": 48,
+  "n_inner": 24576,
+  "n_layer": 40,
+  "n_positions": 8192,
+  "pad_key_length": true,
+  "pre_allocate_kv_cache": false,
+  "resid_pdrop": 0.1,
+  "scale_attention_softmax_in_fp32": true,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "torch_dtype": "float16",
+  "transformers_version": "4.29.2",
+  "use_cache": false,
+  "validate_runner_input": true,
+  "vocab_size": 49153
+}
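As a sanity check on the hyperparameters in `config.json`, the parameter count can be estimated by hand. This sketch assumes the usual GPTBigCode multi-query layout (full-width queries with a single shared key and value head, learned position embeddings, biases on every linear layer) and assumes `lm_head.weight` is stored as a separate tensor, as its entry in the weight map suggests. It is an estimate under those assumptions, not an authoritative accounting.

```python
# Values copied from config.json above.
N_EMBD, N_HEAD, N_INNER, N_LAYER = 6144, 48, 24576, 40
N_POSITIONS, VOCAB_SIZE = 8192, 49153


def count_parameters() -> int:
    """Estimate the parameter count (assumed GPTBigCode multi-query layout)."""
    head_dim = N_EMBD // N_HEAD
    # Multi-query attention: full-width queries plus one shared key and value head.
    qkv_width = N_EMBD + 2 * head_dim
    c_attn = N_EMBD * qkv_width + qkv_width
    attn_proj = N_EMBD * N_EMBD + N_EMBD
    layer_norms = 2 * (2 * N_EMBD)  # ln_1 and ln_2, weight and bias each
    mlp = (N_EMBD * N_INNER + N_INNER) + (N_INNER * N_EMBD + N_EMBD)
    per_layer = c_attn + attn_proj + layer_norms + mlp
    embeddings = VOCAB_SIZE * N_EMBD + N_POSITIONS * N_EMBD  # wte + wpe
    final_norm = 2 * N_EMBD  # ln_f
    lm_head = VOCAB_SIZE * N_EMBD  # assumed to be stored separately
    return N_LAYER * per_layer + embeddings + final_norm + lm_head


params = count_parameters()
fp16_bytes = params * 2  # torch_dtype is float16: 2 bytes per parameter
print(f"{params:,} parameters, {fp16_bytes:,} bytes in FP16")
# prints "15,819,458,560 parameters, 31,638,917,120 bytes in FP16"
```

Under these assumptions the FP16 byte count equals the `total_size` of 31638917120 recorded in `pytorch_model.bin.index.json`.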
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 0,
+  "transformers_version": "4.29.2"
+}
--- a/merges.txt
+++ b/merges.txt
--- a/pytorch_model-00001-of-00004.bin
+++ b/pytorch_model-00001-of-00004.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4af05737ba28721d6999be329b3cf0aee859983975c4911db3235b87653afa13
+size 9957995189
--- a/pytorch_model-00002-of-00004.bin
+++ b/pytorch_model-00002-of-00004.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:926f98a133e87848b4ff8b5f9e5d6733fd144776df6e008b21276e5f24455994
+size 9857381671
--- a/pytorch_model-00003-of-00004.bin
+++ b/pytorch_model-00003-of-00004.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:92e986d197fc82fb5e834d9f02abaa9c65cc45a6a7681a12505108664ff3ebc3
+size 9857381671
--- a/pytorch_model-00004-of-00004.bin
+++ b/pytorch_model-00004-of-00004.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d54966bd5baa095ac39ea10d70f01b6ae37448f94f8850f977be8d6c30243a96
+size 1966320293
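Each `.bin` entry above is a Git LFS pointer file: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. The sketch below parses one pointer and sums the four shard sizes from this commit; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is a byte count
    return fields


# Sizes copied from the four pointer files in this commit.
shard_sizes = [9957995189, 9857381671, 9857381671, 1966320293]
total_on_disk = sum(shard_sizes)

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:d54966bd5baa095ac39ea10d70f01b6ae37448f94f8850f977be8d6c30243a96\n"
    "size 1966320293\n"
)
print(pointer["size"], total_on_disk)
# prints "1966320293 31639078824"
```

The on-disk total is 161,704 bytes larger than the `total_size` of 31638917120 in `pytorch_model.bin.index.json`. That gap is likely serialization overhead in the `.bin` files, since `total_size` appears to count tensor bytes only.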
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@@ -0,0 +1,492 @@
+{
+  "metadata": {
+    "total_size": 31638917120
+  },
+  "weight_map": {
+    "lm_head.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.0.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.0.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.1.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.10.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.11.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.12.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.12.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.12.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.12.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.12.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.12.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.12.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.12.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.12.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.12.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.12.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.12.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.13.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.14.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.15.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.16.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.17.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.18.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.19.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.2.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.2.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.20.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.20.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.21.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.22.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.23.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.24.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.25.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.25.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.25.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.25.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.25.ln_1.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.25.ln_1.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.25.ln_2.bias": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.25.ln_2.weight": "pytorch_model-00002-of-00004.bin",
+    "transformer.h.25.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.25.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.25.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.25.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.26.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.27.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.28.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.29.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.3.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.3.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.30.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.30.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.31.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.32.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.33.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.34.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.35.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.36.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.37.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.38.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.38.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.38.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.38.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.38.ln_1.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.38.ln_1.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.38.ln_2.bias": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.38.ln_2.weight": "pytorch_model-00003-of-00004.bin",
+    "transformer.h.38.mlp.c_fc.bias": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.38.mlp.c_fc.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.38.mlp.c_proj.bias": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.38.mlp.c_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.attn.c_attn.bias": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.attn.c_attn.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.attn.c_proj.bias": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.attn.c_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.ln_1.bias": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.ln_1.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.ln_2.bias": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.ln_2.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.mlp.c_fc.bias": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.mlp.c_fc.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.mlp.c_proj.bias": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.39.mlp.c_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.h.4.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.4.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.5.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.6.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.7.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.8.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.ln_1.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.ln_1.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.ln_2.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.ln_2.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
+    "transformer.h.9.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.ln_f.bias": "pytorch_model-00004-of-00004.bin",
+    "transformer.ln_f.weight": "pytorch_model-00004-of-00004.bin",
+    "transformer.wpe.weight": "pytorch_model-00001-of-00004.bin",
+    "transformer.wte.weight": "pytorch_model-00001-of-00004.bin"
+  }
+}
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,27 @@
+{
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<fim_prefix>",
+    "<fim_middle>",
+    "<fim_suffix>",
+    "<fim_pad>",
+    "<filename>",
+    "<gh_stars>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<empty_output>",
+    "<commit_before>",
+    "<commit_msg>",
+    "<commit_after>",
+    "<reponame>"
+  ],
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "pad_token": "[PAD]",
+  "unk_token": "<|endoftext|>"
+}
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,32 @@
+{
+  "add_prefix_space": false,
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<fim_prefix>",
+    "<fim_middle>",
+    "<fim_suffix>",
+    "<fim_pad>",
+    "<filename>",
+    "<gh_stars>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<empty_output>",
+    "<commit_before>",
+    "<commit_msg>",
+    "<commit_after>",
+    "<reponame>"
+  ],
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 2048,
+  "padding_side": "right",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>",
+  "vocab_size": 49152
+}
--- a/vocab.json
+++ b/vocab.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}