Initialize project; model provided by the ModelHub XC community

Model: SUSTech/SUS-Chat-34B Source: Original Platform
2026-05-05 01:59:18 +08:00
commit 8fd6cc1221
15 changed files with 175229 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,415 @@
 ---
 widget:
 - example_title: SUS-Chat
  text: hi
  output:
    text: ' Hello! How can I assist you today?'
 pipeline_tag: text-generation
 license: apache-2.0
 ---
 # 🐷SUS-Chat: Instruction tuning done right
 <p align="left">
 <a href="README_CN.md">中文</a>&nbsp ｜ &nbspEnglish&nbsp
 </p>
 <br><br>
 <div align="center">
 <p align="center">
 <img src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/sustech.svg?sanitize=true" width="200px">
 <img src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/ccnl.png?sanitize=true" width="200px">
 </p>
 <div style="display: inline-block;">
 <a rel="noopener nofollow" href="https://github.com/SUSTech-IDEA/SUS-Chat/issues">
 <img src="https://img.shields.io/github/issues/SUSTech-IDEA/SUS-Chat?logo=github" style="margin: 0 0;">
 </a>
 </div>
 <div style="display: inline-block;">
 <a href="https://huggingface.co/SUSTech">
 <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SUSTech-blue" style="margin: 0 0;">
 </a>
 </div>
 <div style="display: inline-block;">
 <a rel="noopener nofollow" href="https://www.modelscope.cn/organization/sustc/">
 <img src="https://img.shields.io/badge/🤖ModelScope-sustc-blue" style="margin: 0 0;">
 </a>
 </div>
 <a href="https://wisemodel.cn/organization/SUSTech">
 <img src="https://img.shields.io/badge/WiseModel-SUSTech-blue"> </a>
 <div style="display: inline-block;">
 <a rel="noopener nofollow" href="https://github.com/SUSTech-IDEA/SUS-Chat/blob/main/LICENSE">
 <img src="https://img.shields.io/badge/Code_License-Apache_2.0-lightblue" style="margin: 0 0;">
 </a>
 </div>
 <div style="display: inline-block;">
 <a rel="noopener nofollow" href="https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt">
 <img src="https://img.shields.io/badge/Model_License-Model_Agreement-lightblue" style="margin: 0 0;">
 </a>
 </div>
 <div style="display: inline-block;">
 <a rel="noopener nofollow" href="mailto:oss@data.sustech.edu.cn">
 <img src="https://img.shields.io/badge/✉️-data@sustech.edu.cn-FFE01B" style="margin: 0 0;">
 </a>
 </div>
 </div>
 # News
 - 2024-01-04: 🔥 `cloudyu` created a series of top-ranked
  [MoE](https://huggingface.co/cloudyu/Yi-34Bx2-MoE-60B) models based on
  our model.
 - 2023-12-09: 🔥 `Tigerbot` variant has been
  [deleted](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/438),
  `SUS-Chat-34B` is now the top-ranked LLaMA model and the
  top-ranked chat model.
 - 2023-12-07: SUS-Chat-34B is now available on
  [WiseModel🧠](https://wisemodel.cn/model/SUSTech/SUS-Chat-34B).
 - 2023-12-06: Try [SUS-Chat-34B
  chat-ui](https://huggingface.co/spaces/SUSTech/SUS-Chat-34B).
 - 2023-12-05: SUS-Chat-34B is now available on
  [ModelScope🤖](https://www.modelscope.cn/models/SUSTC/SUS-Chat-34B/summary)
 - 2023-12-05: SUS-Chat-34B is ranked 2nd on the [Open LLM
  Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard),
  ahead of all models under 70B.
 - 2023-12-01: SUS-Chat-34B is now available on
  [HuggingFace🤗](https://huggingface.co/SUSTech/SUS-Chat-34B).
 # Introduction
 <img src="https://hackmd.io/_uploads/HJlDtzhBa.png" id="fig-sus"
 alt="Figure 1: DALL·E 2023-12-01 11.03.28 - An imposing, majestic wild boar combined with elements of a futuristic transformer robot. The boar itself should be intricately blended with these tra" />
 **SUS-Chat-34B** is a 34B-parameter bilingual Chinese-English dialogue model,
 jointly released by the **[Southern University of Science and
 Technology](https://huggingface.co/SUSTech)** and
 **[IDEA-CCNL](https://huggingface.co/IDEA-CCNL)**. The model is based
 on [`01-ai/Yi-34B`](https://huggingface.co/01-ai/Yi-34B) and has been
 fine-tuned on millions of high-quality, multilingual instruction samples.
 SUS-Chat-34B retains the strong language capabilities of the base model,
 while high-quality instruction fine-tuning improves its responses to human
 instructions. It also excels at imitating human thought processes
 through chain-of-thought reasoning. It
 introduces inter-instruction attention sharing in long texts, expanding
 the context window from 4K to 8K tokens and significantly improving the
 usability of multi-turn dialogue.
 It outperforms all models of the same size on almost all benchmarks
 and is well suited to the practical demands of complex
 multilingual tasks. SUS-Chat-34B also remains highly competitive with
 larger models and achieved state-of-the-art performance in our
 comprehensive evaluations.
 The SUS-Chat-34B model has the following highlights:
 1.  Large-scale, complex instruction-following data: trained on 1.4
    billion tokens of high-quality, complex instruction data covering
    Chinese and English, multi-turn dialogue, mathematics, reasoning,
    and various other instruction types;
 2.  Strong performance on general tasks: SUS-Chat-34B excels
    on numerous mainstream Chinese and English tasks, surpassing other
    open-source instruction-tuned models of the same parameter
    scale. It also competes well against models with more
    parameters;
 3.  Longer context window and excellent multi-turn dialogue
    capabilities: SUS-Chat-34B currently supports an 8K context window
    and is trained on a large amount of multi-turn instruction data and
    mixed single-/multi-turn data. This gives it a remarkable ability to
    stay focused on information across long dialogues and to follow
    instructions over multiple turns.
 SUS-Chat demonstrates that, with the right instruction fine-tuning on
 open-source datasets and models, academic institutions can achieve
 better performance without increasing model size. This bridges the
 gap between academia and industry in large language models and opens
 new possibilities for collaboration between the two sectors.
 # Performance
 To assess SUS-Chat-34B thoroughly, we evaluated it on multiple
 benchmarks and have open-sourced the evaluation framework
 [TLEM](https://huggingface.co/spaces/SUSTech/tlem) so that other
 researchers can replicate and compare our results.
 In TLEM, we used benchmarks including MMLU, CMMLU,
 C-Eval, BBH, GSM8K, and MATH to measure the model’s knowledge and
 reasoning capabilities. On these metrics, SUS-Chat-34B achieved
 state-of-the-art performance. Additionally, we used
 [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) to test
 SUS-Chat and similar models on WinoGrande, HellaSwag, ARC, and
 TruthfulQA, assessing the models’ common-sense reasoning ability and
 susceptibility to hallucination.
 Overall, SUS-Chat-34B significantly outperformed models of similar
 scale and achieved the best comprehensive performance among the models compared.
 <img
 src="https://github.com/SUSTech-IDEA/SUS-Chat/raw/main/assets/radar.png"
 id="fig-bench" alt="Figure 2: Benchmark" />
 <div>
 <table>
 <colgroup>
 <col style="width: 50%" />
 <col style="width: 50%" />
 </colgroup>
 <tbody>
 <tr class="odd">
 <td style="text-align: center;"><div width="50.0%"
 data-layout-align="center">
 <h2 id="english-understanding">English Understanding</h2>
 <table>
 <thead>
 <tr class="header">
 <th style="text-align: right;">Model</th>
 <th style="text-align: center;">mmlu (0-shot)</th>
 </tr>
 </thead>
 <tbody>
 <tr class="odd">
 <td style="text-align: right;">GPT-4</td>
 <td style="text-align: center;">83</td>
 </tr>
 <tr class="even">
 <td style="text-align: right;">SUS-Chat-34B</td>
 <td style="text-align: center;"><u>74.35</u></td>
 </tr>
 <tr class="odd">
 <td style="text-align: right;">Qwen-72b-Chat</td>
 <td style="text-align: center;"><strong>74.52</strong></td>
 </tr>
 <tr class="even">
 <td style="text-align: right;">Deepseek-68b-Chat</td>
 <td style="text-align: center;">69.43</td>
 </tr>
 <tr class="odd">
 <td style="text-align: right;">OrionStar-Yi-34B-Chat</td>
 <td style="text-align: center;">68.51</td>
 </tr>
 <tr class="even">
 <td style="text-align: right;">Yi-34B-Chat</td>
 <td style="text-align: center;">66.96</td>
 </tr>
 </tbody>
 </table>
 </div></td>
 <td style="text-align: center;"><div width="50.0%"
 data-layout-align="center">
 <h2 id="chinese-capabilities">Chinese Capabilities</h2>
 <table>
 <colgroup>
 <col style="width: 34%" />
 <col style="width: 32%" />
 <col style="width: 32%" />
 </colgroup>
 <thead>
 <tr class="header">
 <th style="text-align: right;">Model</th>
 <th style="text-align: center;">cmmlu (0-shot)</th>
 <th style="text-align: center;">C-Eval (0-shot)<a href="#fn1"
 class="footnote-ref" id="fnref1"
 role="doc-noteref"><sup>1</sup></a></th>
 </tr>
 </thead>
 <tbody>
 <tr class="odd">
 <td style="text-align: right;">GPT-4</td>
 <td style="text-align: center;">71</td>
 <td style="text-align: center;">69.9</td>
 </tr>
 <tr class="even">
 <td style="text-align: right;">SUS-Chat-34B</td>
 <td style="text-align: center;"><strong>78.68</strong></td>
 <td style="text-align: center;"><strong>82.42</strong></td>
 </tr>
 <tr class="odd">
 <td style="text-align: right;">Qwen-72b-Chat</td>
 <td style="text-align: center;"><u>77.02</u></td>
 <td style="text-align: center;"><u>77.22</u></td>
 </tr>
 <tr class="even">
 <td style="text-align: right;">Deepseek-68b-Chat</td>
 <td style="text-align: center;">48.51</td>
 <td style="text-align: center;">59.7</td>
 </tr>
 <tr class="odd">
 <td style="text-align: right;">OrionStar-Yi-34B-Chat</td>
 <td style="text-align: center;">66.88</td>
 <td style="text-align: center;">65.13</td>
 </tr>
 <tr class="even">
 <td style="text-align: right;">Yi-34B-Chat</td>
 <td style="text-align: center;">55.16</td>
 <td style="text-align: center;">77.16</td>
 </tr>
 </tbody>
 </table>
 </div></td>
 </tr>
 </tbody>
 </table>
 <section id="footnotes" class="footnotes footnotes-end-of-document"
 role="doc-endnotes">
 <hr />
 <ol>
 <li id="fn1"><p>C-Eval results are evaluated on the validation
 datasets<a href="#fnref1" class="footnote-back"
 role="doc-backlink">↩︎</a></p></li>
 </ol>
 </section>
 </div>
 ## Math & Reasoning
 |                 Model | gsm8k (0-shot) | MATH (0-shot) | BBH (0-shot) |
 |----------------------:|:--------------:|:-------------:|:------------:|
 |                 GPT-4 |      91.4      |     45.8      |     86.7     |
 |          SUS-Chat-34B |   **80.06**    |     28.7      |    67.62     |
 |         Qwen-72b-Chat |  <u>76.57</u>  |   **35.9**    |  **72.63**   |
 |     Deepseek-68b-Chat |     74.45      | <u>29.56</u>  | <u>69.73</u> |
 | OrionStar-Yi-34B-Chat |     54.36      |     12.8      |    62.88     |
 |           Yi-34B-Chat |     63.76      |     10.02     |    61.54     |
 ## More Tasks
 |                 Model | winogrande (5-shot) | arc (25-shot) | hellaswag (10-shot) | TruthfulQA mc1 (0-shot) | TruthfulQA mc2 (0-shot) |
 |----------------------:|:-------------------:|:-------------:|:-------------------:|:-----------------------:|:-----------------------:|
 |                 GPT-4 |          —          |     94.5      |        91.4         |          59.00          |            —            |
 |          SUS-Chat-34B |      **81.22**      | <u>81.54</u>  |        83.79        |        **40.64**        |        **57.47**        |
 |         Qwen-72b-Chat |        76.09        |   **82.10**   |    <u>86.06</u>     |          39.17          |      <u>56.37</u>       |
 |     Deepseek-68b-Chat |    <u>80.58</u>     |     81.29     |      **87.02**      |      <u>40.02</u>       |          50.64          |
 | OrionStar-Yi-34B-Chat |        77.27        |     80.19     |        84.54        |          36.47          |          53.24          |
 |           Yi-34B-Chat |        76.64        |     70.66     |        82.29        |          38.19          |          54.57          |
 ## Overall
 |                 Model |  Average  |
 |----------------------:|:---------:|
 |          SUS-Chat-34B | **69.05** |
 |         Qwen-72b-Chat |   68.41   |
 |     Deepseek-68b-Chat |   62.91   |
 | OrionStar-Yi-34B-Chat |   60.21   |
 |           Yi-34B-Chat |   59.72   |
 To reproduce these results, start a vLLM server for the model and
 follow the instructions
 [here](https://sustech-tlem.static.hf.space/index.html#start-evaluating-your-model-in-3-line).
 # Usage
 SUS-Chat-34B is a standard LLaMA model and should be seamlessly
 compatible with the LLaMA ecosystem. We provide the following example to
 demonstrate how it can be used for multi-turn dialogues.
 Feel free to [open an
 issue](https://github.com/SUSTech-IDEA/SUS-Chat/issues) if you have any
 questions.
 ``` python
 # Requires Python 3.10+ (chat_template uses structural pattern matching).
 from transformers import AutoModelForCausalLM, AutoTokenizer  # 🤗 Transformers, or
 # from modelscope import AutoModelForCausalLM, AutoTokenizer  # 🤖 ModelScope
 def chat_template(messages):
     """Render a list of {"role": ..., "content": ...} dicts as a SUS-Chat prompt."""
     history = ""
     for message in messages:
         match message:
             case {"role": "user", "content": content}:
                 history += f"### Human: {content}\n\n### Assistant: "
             case {"role": "assistant", "content": content}:
                 history += content
     return history
 model_path = "SUSTech/SUS-Chat-34B"
 # model_path = "SUSTC/SUS-Chat-34B"  # ModelScope
 tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
 model = AutoModelForCausalLM.from_pretrained(
     model_path, device_map="auto", torch_dtype="auto"
 ).eval()
 def generate_reply(messages, max_new_tokens=256):
     """Generate the assistant's next turn for the given dialogue history."""
     input_ids = tokenizer.encode(
         chat_template(messages), return_tensors="pt", add_special_tokens=False
     ).to(model.device)
     # max_new_tokens caps the reply length; unlike max_length it does not
     # count the prompt, so replies are not cut short as the history grows.
     output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
     # Decode only the newly generated tokens (everything after the prompt).
     return tokenizer.decode(
         output_ids[0][input_ids.shape[1] :], skip_special_tokens=False
     )
 # First round
 messages = [{"role": "user", "content": "hi"}]
 messages.append({"role": "assistant", "content": generate_reply(messages)})
 # Second round: the whole history is re-rendered into the prompt
 messages.append({"role": "user", "content": "What is the capital of China?"})
 messages.append({"role": "assistant", "content": generate_reply(messages)})
 ```
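To make the prompt format concrete, the sketch below prints the string that `chat_template` produces for a short two-turn history. It uses an if/elif equivalent of the template, so it also runs on Python versions before 3.10. The assistant reply is placeholder text, not real model output. The template inserts no separator between an assistant reply and the next `### Human:` turn. Because the usage example decodes replies with `skip_special_tokens=False`, any special token the model emits at the end of a reply stays in the history.

```python
def chat_template(messages):
    """if/elif equivalent of the README's chat_template (runs on any Python 3)."""
    history = ""
    for message in messages:
        if message["role"] == "user":
            # Each user turn opens an "### Assistant: " slot for the model to fill.
            history += f"### Human: {message['content']}\n\n### Assistant: "
        elif message["role"] == "assistant":
            # Assistant replies are appended verbatim, with no separator.
            history += message["content"]
    return history


messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},  # placeholder reply
    {"role": "user", "content": "What is the capital of China?"},
]
prompt = chat_template(messages)
print(repr(prompt))
# '### Human: hi\n\n### Assistant: Hello! How can I assist you today?### Human: What is the capital of China?\n\n### Assistant: '
```

The prompt always ends with `### Assistant: `, which is where generation begins.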
 # Limitations
 SUS-Chat has only undergone supervised fine-tuning and has not yet been
 trained with human preference learning. As a result, it may produce
 unreasonable responses in some situations and may exacerbate issues
 common to language models, including hallucination, non-determinism,
 and cumulative errors. For better performance on downstream tasks,
 we recommend tuning the generation configuration parameters to suit
 the task.
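Two commonly tuned generation parameters are temperature and top-p. The sketch below shows what they do by applying temperature scaling and nucleus (top-p) filtering to a toy four-token distribution. In 🤗 Transformers these correspond to the `temperature` and `top_p` arguments of `model.generate`, used together with `do_sample=True`. This code is a simplified stand-alone illustration, not the library's implementation. The logits and thresholds are arbitrary.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.1, -1.0]  # toy next-token scores for a 4-token vocabulary
for t in (1.0, 0.5):
    probs = softmax_with_temperature(logits, t)
    survivors = top_p_filter(probs, 0.9)
    # At t=1.0 three tokens survive the 0.9 cutoff; at t=0.5 only two do.
    print(t, [round(p, 3) for p in probs], len(survivors))
```

Lowering the temperature concentrates probability on the top tokens, so fewer candidates survive the same top-p cutoff and output becomes more deterministic.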
 # Disclaimer
 During training, we used data compliance checking algorithms to
 ensure, as far as possible, that the trained model is compliant. Due to
 the complexity of the data and the diverse use cases of language models,
 we cannot guarantee that the model will produce correct and reasonable
 outputs in all scenarios. Please be aware that there is still a risk of
 the model generating problematic outputs. We will not be responsible for
 any risks or issues arising from misuse, misleading use, illegal use, or
 related misinformation, nor for data security issues related to the
 model.
 # License
 This model is intended for academic research and free commercial use,
 but its use must comply with the
 [license](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt)
 from [01-ai](https://huggingface.co/01-ai).
--- a/config.json
+++ b/config.json
@@ -0,0 +1,28 @@
 {
  "_name_or_path": "01-ai/Yi-34B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 7168,
  "initializer_range": 0.02,
  "intermediate_size": 20480,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 56,
  "num_hidden_layers": 60,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 5000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.35.0",
  "use_cache": true,
  "vocab_size": 64000
 }
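As a sanity check on the "34B" in the name, the parameter count can be recomputed from the `config.json` above. This back-of-the-envelope sketch assumes the standard `LlamaForCausalLM` layout implied by the tensor names in `pytorch_model.bin.index.json`:

- no attention biases (`attention_bias: false`);
- grouped-query attention with 8 key/value heads;
- a separate `lm_head` (`tie_word_embeddings: false`).

```python
# Values copied from config.json above.
config = {
    "hidden_size": 7168,
    "intermediate_size": 20480,
    "num_attention_heads": 56,
    "num_hidden_layers": 60,
    "num_key_value_heads": 8,
    "vocab_size": 64000,
    "tie_word_embeddings": False,
}


def llama_param_count(cfg):
    """Count weights in a LLaMA-style decoder, assuming no biases anywhere."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]       # 7168 / 56 = 128
    kv_dim = cfg["num_key_value_heads"] * head_dim   # 8 KV heads * 128 = 1024
    attn = 2 * h * h + 2 * h * kv_dim                # q_proj + o_proj, k_proj + v_proj
    mlp = 3 * h * cfg["intermediate_size"]           # gate_proj, up_proj, down_proj
    norms = 2 * h                                    # input_ and post_attention_layernorm
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h                    # model.embed_tokens
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h                                   # model.norm
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


params = llama_param_count(config)
print(params)      # 34388917248
print(params * 2)  # bytes at 2 bytes per bfloat16 value: 68777834496
```

At 2 bytes per bfloat16 value, 34,388,917,248 parameters occupy 68,777,834,496 bytes. This matches `total_size` in `pytorch_model.bin.index.json`.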
--- a/pytorch_model-00001-of-00007.bin
+++ b/pytorch_model-00001-of-00007.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:0acac7bd0ea4633a7526bb54d61bce0925b2bf44d47c6a547fd879e9ecb67084
 size 9975374044
--- a/pytorch_model-00002-of-00007.bin
+++ b/pytorch_model-00002-of-00007.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a86877bf8b41673167312a8497e7d1b877a52800c5bb5d2a6e1c245d24a0f652
 size 9909328458
--- a/pytorch_model-00003-of-00007.bin
+++ b/pytorch_model-00003-of-00007.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:1706550408523bbcff79bc510df61302f2fdab40f26ec67c1c5c40e7945b3481
 size 9747848406
--- a/pytorch_model-00004-of-00007.bin
+++ b/pytorch_model-00004-of-00007.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a183bdc84d2b59a60783451ada665a28914173ad21284c96991da6eb161cb9a9
 size 9747848406
--- a/pytorch_model-00005-of-00007.bin
+++ b/pytorch_model-00005-of-00007.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7d3af3b25ed249c687c9925a0591156c6d0a9f7b589c099a1c312f2d3900a1b9
 size 9747848490
--- a/pytorch_model-00006-of-00007.bin
+++ b/pytorch_model-00006-of-00007.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:f210d371b840e393fc4b9c77bc67a46dac91fcfe2b6a085ec830bb97c60cfe13
 size 9938659962
--- a/pytorch_model-00007-of-00007.bin
+++ b/pytorch_model-00007-of-00007.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:301589b2ce96530a0c887ad2c6b5f7512261fc0541ac3785d6c461e85bdb3baa
 size 9711116122
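Each `.bin` entry above is a Git LFS pointer rather than the weights themselves. A pointer is a few `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. The sketch below parses one pointer and sums the seven shard sizes. It assumes the simple one-pair-per-line layout shown here; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    fields["size"] = int(fields["size"])     # size is a byte count
    return fields


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:301589b2ce96530a0c887ad2c6b5f7512261fc0541ac3785d6c461e85bdb3baa
size 9711116122
"""
info = parse_lfs_pointer(pointer)
algorithm, _, digest = info["oid"].partition(":")
print(algorithm, len(digest), info["size"])  # sha256 64 9711116122

# Sizes copied from the seven pointer files above.
shard_sizes = [
    9975374044, 9909328458, 9747848406, 9747848406,
    9747848490, 9938659962, 9711116122,
]
total_size = 68777834496  # "total_size" from pytorch_model.bin.index.json
print(sum(shard_sizes), sum(shard_sizes) - total_size)  # 68778023888 189392
```

The shards total 68,778,023,888 bytes, about 189 KB more than the 68,777,834,496-byte `total_size` in the index. That index value equals the raw bfloat16 tensor bytes, so the surplus is presumably serialization overhead in the `.bin` containers.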
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@@ -0,0 +1,550 @@
 {
  "metadata": {
    "total_size": 68777834496
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00007-of-00007.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.17.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.18.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.19.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.25.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00007.bin",
    "model.layers.26.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.26.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.26.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.27.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.27.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.27.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.28.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.28.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.28.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.29.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.29.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.29.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.30.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.30.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.30.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.31.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.31.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.31.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.32.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.32.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.32.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.33.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.33.mlp.down_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.33.mlp.up_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.34.input_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.34.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.34.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00004-of-00007.bin",
    "model.layers.35.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.35.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.35.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.36.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.36.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.36.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.37.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.37.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.37.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.38.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.38.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.38.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.39.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.39.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.39.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.40.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.40.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.40.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.40.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.40.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.40.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.40.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.40.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.40.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.41.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.41.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.41.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.41.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.41.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.41.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.41.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.41.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.41.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.42.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.42.mlp.down_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.42.mlp.gate_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.42.mlp.up_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.42.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.42.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.42.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.42.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.42.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.43.input_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.43.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.43.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.43.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.43.post_attention_layernorm.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.43.self_attn.k_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.43.self_attn.o_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.43.self_attn.q_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.43.self_attn.v_proj.weight": "pytorch_model-00005-of-00007.bin",
    "model.layers.44.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.44.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.44.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.44.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.44.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.44.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.44.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.44.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.44.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.45.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.45.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.45.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.45.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.45.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.45.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.45.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.45.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.45.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.46.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.46.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.46.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.46.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.46.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.46.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.46.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.46.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.46.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.47.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.47.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.47.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.47.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.47.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.47.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.47.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.47.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.47.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.48.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.48.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.48.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.48.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.48.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.48.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.48.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.48.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.48.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.49.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.49.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.49.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.49.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.49.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.49.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.49.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.49.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.49.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.50.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.50.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.50.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.50.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.50.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.50.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.50.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.50.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.50.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.51.input_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.51.mlp.down_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.51.mlp.gate_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.51.mlp.up_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.51.post_attention_layernorm.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.51.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.51.self_attn.o_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.51.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.51.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.52.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.52.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.52.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.52.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.52.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.52.self_attn.k_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.52.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.52.self_attn.q_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.52.self_attn.v_proj.weight": "pytorch_model-00006-of-00007.bin",
    "model.layers.53.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.53.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.53.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.53.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.53.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.53.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.53.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.53.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.53.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.54.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.54.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.54.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.54.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.54.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.54.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.54.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.54.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.54.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.55.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.55.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.55.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.55.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.55.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.55.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.55.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.55.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.55.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.56.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.56.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.56.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.56.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.56.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.56.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.56.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.56.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.56.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.57.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.57.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.57.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.57.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.57.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.57.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.57.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.57.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.57.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.58.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.58.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.58.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.58.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.58.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.58.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.58.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.58.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.58.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.59.input_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.59.mlp.down_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.59.mlp.gate_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.59.mlp.up_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.59.post_attention_layernorm.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.59.self_attn.k_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.59.self_attn.o_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.59.self_attn.q_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.59.self_attn.v_proj.weight": "pytorch_model-00007-of-00007.bin",
    "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00007.bin",
    "model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00007.bin",
    "model.norm.weight": "pytorch_model-00001-of-00007.bin"
  }
 }
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,6 @@
 {
  "bos_token": "<|startoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "<unk>",
  "unk_token": "<unk>"
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer.model
+++ b/tokenizer.model
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:386c49cf943d71aa110361135338c50e38beeff0a66593480421f37b319e1a39
 size 1033105
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,38 @@
 {
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|startoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|startoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "legacy": true,
  "model_max_length": 4096,
  "pad_token": "<unk>",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
 }