Initialize project; model provided by the ModelHub XC community

Model: NousResearch/Redmond-Hermes-Coder Source: Original Platform
2026-05-23 19:19:14 +08:00
commit db2d11fcf0
16 changed files with 147919 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,39 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 pytorch_model-00001-of-00004.bin filter=lfs diff=lfs merge=lfs -text
 pytorch_model-00002-of-00004.bin filter=lfs diff=lfs merge=lfs -text
 pytorch_model-00003-of-00004.bin filter=lfs diff=lfs merge=lfs -text
 pytorch_model-00004-of-00004.bin filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,109 @@
 ---
 license: gpl
 language:
 - en
 tags:
 - starcoder
 - wizardcoder
 - code
 - self-instruct
 - distillation
 ---
 # Model Card: Redmond-Hermes-Coder 15B
 ## Model Description
 Redmond-Hermes-Coder 15B is an instruction-tuned language model fine-tuned on over 300,000 instructions. It was fine-tuned by Nous Research, with Teknium and Karan4D leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors assisting.
 This model was trained from a WizardCoder base, which is itself built on the StarCoder base model.
 The model is strong at code, but it comes with a tradeoff: while far better at code than the original Nous-Hermes built on Llama, it is worse than WizardCoder on pure code benchmarks such as HumanEval.
 It scores 39% on HumanEval, compared with 57% for WizardCoder. This is a preliminary experiment, and we are exploring improvements now.
 However, it does seem better than WizardCoder at a variety of non-code tasks, including writing.
 ## Model Training
 The model was trained almost entirely on synthetic GPT-4 outputs. This includes data from diverse sources such as GPTeacher (the general, roleplay v1 & v2, and code instruct datasets), Nous Instruct & PDACTL (unpublished), CodeAlpaca, Evol_Instruct Uncensored, GPT4-LLM, and Unnatural Instructions.
 Additional data came from Camel-AI's Biology/Physics/Chemistry and Math Datasets, Airoboros' (v1) GPT-4 Dataset, and more from CodeAlpaca. In total, the training data comprised over 300,000 instructions.
 ## Collaborators
 The model fine-tuning and the datasets were a collaboration of efforts and resources from members of Nous Research, including Teknium, Karan4D, and Huemin Art, together with Redmond AI's generous compute grants.
 A huge shoutout and acknowledgement go to all the dataset creators who generously share their datasets openly.
 Among the dataset contributors, GPTeacher was made available by Teknium, Wizard LM by nlpxucan, and the Nous Research Instruct Dataset by Karan4D and HueminArt.  
 The GPT4-LLM and Unnatural Instructions datasets were provided by Microsoft, the Airoboros dataset by jondurbin, the Camel-AI datasets by Camel-AI, and the CodeAlpaca dataset by Sahil 2801.
 If anyone was left out, please open a thread in the community tab.
 ## Prompt Format
 The model follows the Alpaca prompt format:
 ```
 ### Instruction:
 ### Response:
 ```
 or 
 ```
 ### Instruction:
 ### Input:
 ### Response:
 ```  
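As a minimal sketch of how these templates could be filled in, the hypothetical helper below builds either variant. The name `build_prompt` and the exact newline placement between sections are assumptions; the card itself only specifies the section headers.

```python
from typing import Optional


def build_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Build an Alpaca-style prompt.

    Hypothetical helper: the header names come from the model card,
    but the newline layout between sections is an assumption.
    """
    parts = ["### Instruction:", instruction.strip()]
    if input_text:
        # The "### Input:" section is only emitted when extra context is supplied.
        parts += ["### Input:", input_text.strip()]
    parts.append("### Response:")
    # End with a newline so the model's completion starts on a fresh line.
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(build_prompt("Write a function that reverses a string."))
    print(build_prompt("Summarize the text.", "StarCoder is a code model."))
```

If the model was trained with a different whitespace layout (for example, blank lines between sections), the joining logic would need to be adjusted to match.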
 ## Resources for Applied Use Cases:
 For an example of a back-and-forth chatbot using Hugging Face Transformers and Discord, check out: https://github.com/teknium1/alpaca-discord  
 For an example of a role-playing Discord bot, check out: https://github.com/teknium1/alpaca-roleplay-discordbot  
 ## Future Plans
 The model is currently being uploaded in FP16 format, and there are plans to convert the model to GGML and GPTQ 4-bit quantizations. The team is also working on a full benchmark, similar to what was done for GPT4-x-Vicuna. We will also try to open discussions about getting the model included in GPT4All.
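To give a rough sense of what FP16 versus 4-bit storage means for this model, the sketch below derives the parameter count from the values in `config.json` and cross-checks it against `total_size` in `pytorch_model.bin.index.json`. It assumes the usual GPTBigCode layout with multi-query attention, an untied `lm_head`, and every tensor stored as 2-byte float16. The 4-bit figure is only a lower bound, since it ignores quantization metadata (scales, zero points) and file-format overhead.

```python
# Values copied from config.json and pytorch_model.bin.index.json in this repository.
N_LAYER, N_EMBD, N_HEAD, N_INNER = 40, 6144, 48, 24576
N_POSITIONS, VOCAB_SIZE = 8192, 49153
TOTAL_SIZE_BYTES = 31638917120  # "total_size" from the shard index


def count_parameters() -> int:
    """Estimate the parameter count (assumes the standard GPTBigCode layout)."""
    head_dim = N_EMBD // N_HEAD
    # Multi-query attention: full-width queries plus one shared key head and one value head.
    qkv_width = N_EMBD + 2 * head_dim
    c_attn = N_EMBD * qkv_width + qkv_width
    attn_proj = N_EMBD * N_EMBD + N_EMBD
    mlp = (N_EMBD * N_INNER + N_INNER) + (N_INNER * N_EMBD + N_EMBD)
    layer_norms = 2 * 2 * N_EMBD  # ln_1 and ln_2, each with weight and bias
    per_layer = c_attn + attn_proj + mlp + layer_norms

    embeddings = VOCAB_SIZE * N_EMBD + N_POSITIONS * N_EMBD  # token + position
    final_norm = 2 * N_EMBD
    lm_head = VOCAB_SIZE * N_EMBD  # assumed to be stored separately (untied)
    return N_LAYER * per_layer + embeddings + final_norm + lm_head


params = count_parameters()
fp16_bytes = params * 2       # 2 bytes per parameter
four_bit_bytes = params // 2  # 0.5 bytes per parameter, metadata ignored

if __name__ == "__main__":
    print(f"parameters:      {params / 1e9:.2f} B")
    print(f"FP16 weights:    {fp16_bytes / 1e9:.2f} GB")
    print(f"4-bit (approx.): {four_bit_bytes / 1e9:.2f} GB")
    print("matches shard index:", fp16_bytes == TOTAL_SIZE_BYTES)
```

Under these assumptions the count works out to about 15.8 billion parameters, consistent with the "15B" in the model name.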
 ## Benchmark Results
 ```
 HumanEval: 39%
 |                      Task                      |Version|       Metric        |Value |   |Stderr|
 |------------------------------------------------|------:|---------------------|-----:|---|-----:|
 |arc_challenge                                   |      0|acc                  |0.2858|±  |0.0132|
 |                                                |       |acc_norm             |0.3148|±  |0.0136|
 |arc_easy                                        |      0|acc                  |0.5349|±  |0.0102|
 |                                                |       |acc_norm             |0.5097|±  |0.0103|
 |bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5158|±  |0.0364|
 |bigbench_date_understanding                     |      0|multiple_choice_grade|0.5230|±  |0.0260|
 |bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3295|±  |0.0293|
 |bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.1003|±  |0.0159|
 |                                                |       |exact_str_match      |0.0000|±  |0.0000|
 |bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2260|±  |0.0187|
 |bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.1957|±  |0.0150|
 |bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.3733|±  |0.0280|
 |bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.3200|±  |0.0209|
 |bigbench_navigate                               |      0|multiple_choice_grade|0.4830|±  |0.0158|
 |bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.4150|±  |0.0110|
 |bigbench_ruin_names                             |      0|multiple_choice_grade|0.2143|±  |0.0194|
 |bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2926|±  |0.0144|
 |bigbench_snarks                                 |      0|multiple_choice_grade|0.5249|±  |0.0372|
 |bigbench_sports_understanding                   |      0|multiple_choice_grade|0.4817|±  |0.0159|
 |bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.2700|±  |0.0140|
 |bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.1864|±  |0.0110|
 |bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1349|±  |0.0082|
 |bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.3733|±  |0.0280|
 |boolq                                           |      1|acc                  |0.5498|±  |0.0087|
 |hellaswag                                       |      0|acc                  |0.3814|±  |0.0048|
 |                                                |       |acc_norm             |0.4677|±  |0.0050|
 |openbookqa                                      |      0|acc                  |0.1960|±  |0.0178|
 |                                                |       |acc_norm             |0.3100|±  |0.0207|
 |piqa                                            |      0|acc                  |0.6600|±  |0.0111|
 |                                                |       |acc_norm             |0.6610|±  |0.0110|
 |winogrande                                      |      0|acc                  |0.5343|±  |0.0140|
 ```
 ## Model Usage
 The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.
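A minimal loading sketch with Hugging Face Transformers follows. The repository id is taken from this commit's header; the helper names `load_model` and `generate`, the default device, and the generation settings are assumptions rather than recommendations from the authors. Loading the FP16 weights needs roughly 32 GB of memory.

```python
DEFAULT_MODEL_ID = "NousResearch/Redmond-Hermes-Coder"


def load_model(model_id: str = DEFAULT_MODEL_ID, device: str = "cuda"):
    """Load the tokenizer and FP16 weights (hypothetical helper)."""
    # Imported lazily so this module can be inspected without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    return tokenizer, model.to(device)


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion and return only the newly produced text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    # Drop the prompt tokens so only the response is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    tok, mdl = load_model()
    prompt = "### Instruction:\nWrite a Python function that reverses a string.\n### Response:\n"
    print(generate(tok, mdl, prompt))
```

The prompt string in the usage example uses the Alpaca headers from the Prompt Format section; the newline placement is a guess.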
 Compute was provided by our project sponsor, Redmond AI. Thank you!
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,3 @@
 {
  "[PAD]": 49152
 }
--- a/config.json
+++ b/config.json
@@ -0,0 +1,39 @@
 {
  "_name_or_path": "./hermeswizardcoder-step3800/",
  "activation_function": "gelu",
  "architectures": [
    "GPTBigCodeForCausalLM"
  ],
  "attention_softmax_in_fp32": true,
  "attn_pdrop": 0.1,
  "bos_token_id": 0,
  "embd_pdrop": 0.1,
  "eos_token_id": 0,
  "inference_runner": 0,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "max_batch_size": null,
  "max_sequence_length": null,
  "model_type": "gpt_bigcode",
  "multi_query": true,
  "n_embd": 6144,
  "n_head": 48,
  "n_inner": 24576,
  "n_layer": 40,
  "n_positions": 8192,
  "pad_key_length": true,
  "pre_allocate_kv_cache": false,
  "resid_pdrop": 0.1,
  "scale_attention_softmax_in_fp32": true,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "torch_dtype": "float16",
  "transformers_version": "4.29.2",
  "use_cache": false,
  "validate_runner_input": true,
  "vocab_size": 49153
 }
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
 {"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
 {
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.29.2"
 }
--- a/merges.txt
+++ b/merges.txt
--- a/pytorch_model-00001-of-00004.bin
+++ b/pytorch_model-00001-of-00004.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4af05737ba28721d6999be329b3cf0aee859983975c4911db3235b87653afa13
 size 9957995189
--- a/pytorch_model-00002-of-00004.bin
+++ b/pytorch_model-00002-of-00004.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:926f98a133e87848b4ff8b5f9e5d6733fd144776df6e008b21276e5f24455994
 size 9857381671
--- a/pytorch_model-00003-of-00004.bin
+++ b/pytorch_model-00003-of-00004.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:92e986d197fc82fb5e834d9f02abaa9c65cc45a6a7681a12505108664ff3ebc3
 size 9857381671
--- a/pytorch_model-00004-of-00004.bin
+++ b/pytorch_model-00004-of-00004.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:d54966bd5baa095ac39ea10d70f01b6ae37448f94f8850f977be8d6c30243a96
 size 1966320293
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@@ -0,0 +1,492 @@
 {
  "metadata": {
    "total_size": 31638917120
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.h.0.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.0.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.1.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.10.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.11.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.12.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.12.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.12.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.12.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.12.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.12.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.12.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.12.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.12.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.12.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.12.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.12.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.13.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.14.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.15.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.16.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.17.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.18.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.19.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.2.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.2.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.20.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.20.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.21.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.22.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.23.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.24.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.25.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.25.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.25.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.25.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.25.ln_1.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.25.ln_1.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.25.ln_2.bias": "pytorch_model-00002-of-00004.bin",
    "transformer.h.25.ln_2.weight": "pytorch_model-00002-of-00004.bin",
    "transformer.h.25.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.25.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.25.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.25.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.26.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.27.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.28.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.29.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.3.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.3.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.30.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.30.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.31.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.32.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.33.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.34.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.35.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.36.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.37.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.38.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.38.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.38.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.38.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.38.ln_1.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.38.ln_1.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.38.ln_2.bias": "pytorch_model-00003-of-00004.bin",
    "transformer.h.38.ln_2.weight": "pytorch_model-00003-of-00004.bin",
    "transformer.h.38.mlp.c_fc.bias": "pytorch_model-00004-of-00004.bin",
    "transformer.h.38.mlp.c_fc.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.h.38.mlp.c_proj.bias": "pytorch_model-00004-of-00004.bin",
    "transformer.h.38.mlp.c_proj.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.attn.c_attn.bias": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.attn.c_attn.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.attn.c_proj.bias": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.attn.c_proj.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.ln_1.bias": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.ln_1.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.ln_2.bias": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.ln_2.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.mlp.c_fc.bias": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.mlp.c_fc.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.mlp.c_proj.bias": "pytorch_model-00004-of-00004.bin",
    "transformer.h.39.mlp.c_proj.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.h.4.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.4.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.5.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.6.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.7.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.8.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.ln_1.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.ln_1.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.ln_2.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.ln_2.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
    "transformer.h.9.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.ln_f.bias": "pytorch_model-00004-of-00004.bin",
    "transformer.ln_f.weight": "pytorch_model-00004-of-00004.bin",
    "transformer.wpe.weight": "pytorch_model-00001-of-00004.bin",
    "transformer.wte.weight": "pytorch_model-00001-of-00004.bin"
  }
 }
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,27 @@
 {
  "additional_special_tokens": [
    "<|endoftext|>",
    "<fim_prefix>",
    "<fim_middle>",
    "<fim_suffix>",
    "<fim_pad>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<empty_output>",
    "<commit_before>",
    "<commit_msg>",
    "<commit_after>",
    "<reponame>"
  ],
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "[PAD]",
  "unk_token": "<|endoftext|>"
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,32 @@
 {
  "add_prefix_space": false,
  "additional_special_tokens": [
    "<|endoftext|>",
    "<fim_prefix>",
    "<fim_middle>",
    "<fim_suffix>",
    "<fim_pad>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<empty_output>",
    "<commit_before>",
    "<commit_msg>",
    "<commit_after>",
    "<reponame>"
  ],
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 2048,
  "padding_side": "right",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
 }
--- a/vocab.json
+++ b/vocab.json
@@ -0,0 +1 @@
 {"framework": "pytorch", "task": "text-generation", "allow_remote": true}