Initialize project; model provided by the ModelHub XC community

Model: nvidia/AceReason-Nemotron-14B Source: Original Platform
2026-06-06 08:18:43 +08:00
commit 21c9ab1f0d
17 changed files with 1216 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,37 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 *.png filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,146 @@
 ---
 library_name: transformers
 license: other
 license_name: nvidia-open-model-license
 license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
 pipeline_tag: text-generation
 language:
  - en
 tags:
  - nvidia
  - reasoning
  - math
  - code
  - reinforcement learning
  - pytorch
 ---
 # AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
 <p align="center">
 [![Technical Report](https://img.shields.io/badge/2505.16400-Technical_Report-blue)](https://arxiv.org/abs/2505.16400)
 [![Dataset](https://img.shields.io/badge/🤗-Math_RL_Datset-blue)](https://huggingface.co/datasets/nvidia/AceReason-Math)
 [![Models](https://img.shields.io/badge/🤗-Models-blue)](https://huggingface.co/collections/nvidia/acereason-682f4e1261dc22f697fd1485)
 [![Eval Toolkit](https://img.shields.io/badge/🤗-Eval_Code-blue)](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md)
 </p>
 <img src="fig/main_fig.png" alt="main_fig" style="width: 600px; max-width: 100%;" />
 ## 🔥News
 - **6/16/2025**: We are excited to share our new release combining SFT with RL: **AceReason-Nemotron-1.1-7B**
  - Paper: https://arxiv.org/pdf/2506.13284
  - Model: https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
  - 4M SFT Data: https://huggingface.co/datasets/nvidia/AceReason-1.1-SFT
 - **6/11/2025**: We released our evaluation toolkit, [AceReason Evaluation](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md), which includes:
  - scripts to run inference and scoring
  - LiveCodeBench (avg@8): model prediction files and scores for each month (2023/5-2025/5)
  - AIME24/25 (avg@64): model prediction files and scores
 - **6/2/2025**: We are excited to share our Math RL training dataset at [AceReason-Math](https://huggingface.co/datasets/nvidia/AceReason-Math)
 We're thrilled to introduce AceReason-Nemotron-14B, a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-14B. It achieves 78.6% on AIME 2024 (+8.9), 67.4% on AIME 2025 (+17.4), 61.1% on LiveCodeBench v5 (+8.0), 54.9% on LiveCodeBench v6 (+7.0), and a Codeforces rating of 2024 (+543), where the numbers in parentheses are absolute gains over the starting model. We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first run RL on math-only prompts, then run RL on code-only prompts. Notably, we find that math-only RL significantly improves strong distilled models not only on math benchmarks but also on code reasoning tasks. Extended code-only RL further improves code benchmark performance while causing minimal degradation in math results. Finally, we find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (e.g., distillation) but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.
 We share our training recipe and training logs in our [technical report](https://arxiv.org/abs/2505.16400).
 ## Results
 We evaluate our model against competitive reasoning models, including models of comparable size from the Qwen2.5 and Llama3.1 model families, on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01 - 2025/05/01). More evaluation results can be found in our [technical report](https://arxiv.org/abs/2505.16400).
 | **Model** | **AIME 2024<br>(avg@64)** | **AIME 2025<br>(avg@64)** | **LCB v5<br>(avg@8)** | **LCB v6<br>(avg@8)** |
 | :---: | :---: | :---: | :---: | :---: |
 | <small>QwQ-32B</small> | 79.5 | 65.8 | 63.4 | - |
 | <small>DeepSeek-R1-671B</small> | 79.8 | 70.0 | 65.9 | - |
 | <small>Llama-Nemotron-Ultra-253B</small> | 80.8 | 72.5 | 66.3 | - |
 | <small>o3-mini (medium)</small> | 79.6 | 76.7 | 67.4 | - |
 | <small>Light-R1-14B</small> | 74 | 60.2 | 57.9 | 51.5 |
 | <small>DeepCoder-14B (32K Inference)</small> | 71 | 56.1 | 57.9 | 50.4 |
 | <small>OpenMath-Nemotron-14B</small> | 76.3 | 63.0 | - | - |
 | <small>OpenCodeReasoning-Nemotron-14B</small> | - | - | 59.4 | 54.1 |
 | <small>Llama-Nemotron-Super-49B-v1</small> | 67.5 | 60.0 | 45.5 | - |
 | <small>DeepSeek-R1-Distilled-Qwen-14B</small> | 69.7 | 50.2 | 53.1 | 47.9 |
 | <small>DeepSeek-R1-Distilled-Qwen-32B</small> | 72.6 | 54.9 | 57.2 | - |
 | [AceReason-Nemotron-7B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-7B)| 69.0 | 53.6 | 51.8 | 44.1 |
 | [AceReason-Nemotron-14B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-14B)| 78.6 | 67.4 | 61.1 | 54.9 |
 ## How to use
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = 'nvidia/AceReason-Nemotron-14B'
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
 prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
 messages = [{"role": "user", "content": prompt}]
 text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
 )
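 # Move inputs to the device that holds the model's first layers (works with device_map="auto")
 model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 generated_ids = model.generate(
     **model_inputs,
     max_new_tokens=32768,
     do_sample=True,  # required for temperature/top_p to take effect; otherwise decoding is greedy
     temperature=0.6,
     top_p=0.95
 )
 # Drop the prompt tokens so only the newly generated text is decoded
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
 ]
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)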
 ```
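As a rough guide to hardware requirements, the sketch below estimates the parameter count and bfloat16 weight size from the values in this repository's `config.json`. It assumes the standard Qwen2 layer layout (biases on the q/k/v projections only, a gate/up/down MLP, untied input and output embeddings); under those assumptions the result matches the `total_size` of 29,540,067,328 bytes in `model.safetensors.index.json`. Weights alone need about 29.5 GB, so on smaller GPUs `device_map="auto"` will split the layers across devices, and the KV cache for long generations adds to this.

```python
# Values copied from config.json in this repository
config = {
    "hidden_size": 5120,
    "intermediate_size": 13824,
    "num_hidden_layers": 48,
    "num_attention_heads": 40,
    "num_key_value_heads": 8,
    "vocab_size": 152064,
    "tie_word_embeddings": False,
}

def count_parameters(cfg: dict) -> int:
    """Estimate the parameter count of a Qwen2-style decoder from its config (assumed layout)."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]            # 5120 / 40 = 128
    kv_dim = cfg["num_key_value_heads"] * head_dim         # grouped-query attention: 8 * 128 = 1024
    # Attention: q/k/v projections have weights and biases; o_proj has weights only
    attn = (h * h + h) + 2 * (h * kv_dim + kv_dim) + h * h
    # MLP: gate_proj and up_proj (h -> intermediate) plus down_proj (intermediate -> h), no biases
    mlp = 3 * h * cfg["intermediate_size"]
    norms = 2 * h                                          # input_layernorm + post_attention_layernorm
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm

n_params = count_parameters(config)
n_bytes = 2 * n_params  # bfloat16 stores 2 bytes per parameter
print(f"{n_params:,} parameters, {n_bytes / 1e9:.1f} GB in bfloat16")
```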
 ## Usage Recommendations
 1. Don't include a system prompt; instead, place all instructions directly in the user prompt.
 2. We recommend using the following instruction for math questions: `Please reason step by step, and put your final answer within \boxed{}.`
 3. We recommend using the following instruction for code questions:
 ```python
 question = "" # code question
 starter_code = "" # starter code function header
 code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
 code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
 if starter_code != "":
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
 else:
    question += "\n\n" + code_instruction_nostartercode
 final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
 ```
 4. For evaluation, we use **vLLM==0.7.3** with top_p=0.95, temperature=0.6, and max_tokens=32768.
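The math recommendations above can be wrapped in a small helper. This is a hypothetical sketch: `build_math_messages` and `extract_boxed` are illustrative names, appending the instruction after the question with a blank line is an assumption, and the official scoring in `evaluate_aime.py` may parse answers differently. The extractor returns the contents of the last `\boxed{...}` and tracks brace depth so nested LaTeX such as `\frac{1}{2}` is kept intact.

```python
from typing import Dict, List, Optional

# Recommended instruction for math questions (recommendation 2)
MATH_INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."

def build_math_messages(question: str) -> List[Dict[str, str]]:
    """Hypothetical helper: no system message; the instruction goes into the user turn."""
    # Placing the instruction after the question, separated by a blank line, is an assumption
    return [{"role": "user", "content": question + "\n\n" + MATH_INSTRUCTION}]

def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent or unbalanced."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # the closing brace was never found
```

Pass the result of `build_math_messages` to `tokenizer.apply_chat_template` the same way `messages` is used in the How to use example, then call `extract_boxed(response)` on the decoded output.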
 ## Evaluation Toolkit
 Evaluation code, scripts, and cached prediction files are available at https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md
 ## Correspondence to
 Yang Chen (yachen@nvidia.com), Zhuolin Yang (zhuoliny@nvidia.com), Zihan Liu (zihanl@nvidia.com), Chankyu Lee (chankyul@nvidia.com), Wei Ping (wping@nvidia.com)
 ## License
 Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 ## Citation
 ```bibtex
@article{chen2025acereason,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Xu, Peng and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2505.16400},
  year={2025}
 }
 ```
--- a/README_EVALUATION.md
+++ b/README_EVALUATION.md
@@ -0,0 +1,166 @@
 # AceReason Evaluation Toolkit
 Our evaluation scripts and code are available at https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/evaluation.tar.gz
 ## Environment
 - vllm==0.7.3
 - torch==2.5.1
 - transformers==4.48.2
 - 8x NVIDIA H100 80GB HBM3 (CUDA Version: 12.8)
 ### Dataset Download
 LiveCodeBench:
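 ```python
 import os
 from datasets import load_dataset
 ds = load_dataset(
     "livecodebench/code_generation_lite",
     version_tag="release_v6",
 )["test"]
 # Save every problem as a single JSON array (lines=False) under data/
 os.makedirs("data", exist_ok=True)
 ds.to_json("data/livecodebench_problems.json", orient="records", lines=False)
 ```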
 Math: the AIME test files are included under `data/` (e.g., `data/aime24.jsonl` and `data/aime25.jsonl`).
 ## Evaluation Script
 To generate outputs with a single seed, run:
 ```bash
 bash generate_livecodebench.sh ${model_path} ${seed} ${output_path} ${model_type}
 bash generate_aime.sh ${model_path} ${seed} aime24 ${output_path} ${model_type}
 bash generate_aime.sh ${model_path} ${seed} aime25 ${output_path} ${model_type}
 ```
 Set `model_type` to `r1` for AceReason-Nemotron-1.0 models and to `qwen` for AceReason-Nemotron-1.1 models.
 Alternatively, use our configured seeds to reproduce our results on AIME 2024/2025 (avg@64) and LiveCodeBench v5/v6 (avg@8):
 ```bash
 bash run_livecodebench.sh ${model_path} ${output_path}
 bash run_aime.sh ${model_path} ${output_path}
 ```
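If you want seeds other than the preconfigured ones, the single-seed commands can be looped. The sketch below is a hypothetical helper, not part of the toolkit: it builds one command per seed and benchmark using only the script arguments documented above, and by default it only prints the commands so they can be checked before launching long GPU jobs. The paths and seed values in the usage line are placeholders.

```python
import subprocess
from typing import List, Sequence

# model_type values documented above: r1 for AceReason-Nemotron-1.0, qwen for AceReason-Nemotron-1.1
VALID_MODEL_TYPES = ("r1", "qwen")

def generation_commands(model_path: str, output_path: str, model_type: str,
                        seeds: Sequence[int]) -> List[List[str]]:
    """Build one generation command per (seed, benchmark), mirroring the single-seed commands."""
    if model_type not in VALID_MODEL_TYPES:
        raise ValueError(f"model_type must be one of {VALID_MODEL_TYPES}, got {model_type!r}")
    commands = []
    for seed in seeds:
        commands.append(["bash", "generate_livecodebench.sh", model_path, str(seed),
                         output_path, model_type])
        for benchmark in ("aime24", "aime25"):
            commands.append(["bash", "generate_aime.sh", model_path, str(seed), benchmark,
                             output_path, model_type])
    return commands

def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Usage: `run_commands(generation_commands("/path/to/model", "outputs", "r1", seeds=[0, 1, 2]))` prints nine commands; pass `dry_run=False` to execute them from the extracted `evaluation` directory.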
 To score the generations and reproduce our benchmark results, run:
 ```bash
 python evaluate_livecodebench.py -g ${output_path}
 python evaluate_aime.py --modelfolder ${output_path} --test_data data/aime24.jsonl
 python evaluate_aime.py --modelfolder ${output_path} --test_data data/aime25.jsonl
 ```
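The avg@k metric averages accuracy over k sampled generations per problem (k = 64 for AIME, k = 8 for LiveCodeBench). Below is a minimal sketch of that computation, assuming per-sample correctness has already been judged; the dictionary input format is hypothetical and need not match what the evaluation scripts store. When every problem has the same k, the result equals total correct samples divided by total samples, which appears to be how the Corrects and Total columns below are counted (e.g., Total = 272 for 2023-05 is 34 problems × 8 samples).

```python
from typing import Dict, List

def avg_at_k(results: Dict[str, List[bool]]) -> float:
    """avg@k in percent: average each problem's accuracy over its k samples, then over problems."""
    if not results:
        raise ValueError("no problems to score")
    per_problem = [sum(samples) / len(samples) for samples in results.values()]
    return 100 * sum(per_problem) / len(per_problem)

# Hypothetical judged samples: two problems, k = 4 generations each
example = {
    "problem_1": [True, False, True, False],  # 2 of 4 correct
    "problem_2": [True, True, True, True],    # 4 of 4 correct
}
print(avg_at_k(example))  # 75.0
```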
 ## Reference Results
 We also include our generations in cache.tar.gz for reference.
 ```text
 LiveCodeBench AceReason-Nemotron-1.0-7B (Avg@8)
 =================================================================
 Months          Corrects        Total           Accuracy
 2023-05         180             272             66.17647058823529
 2023-06         238             312             76.28205128205128
 2023-07         337             432             78.00925925925925
 2023-08         185             288             64.23611111111111
 2023-09         275             352             78.125
 2023-10         257             352             73.01136363636364
 2023-11         217             280             77.5
 2023-12         228             320             71.25
 2024-01         193             288             67.01388888888889
 2024-02         169             256             66.015625
 2024-03         234             360             65.0
 2024-04         226             296             76.35135135135135
 2024-05         211             288             73.26388888888889
 05/23-05/24     2950            4096            72.021484375
 2024-06         277             368             75.27173913043478
 2024-07         223             344             64.82558139534883
 2024-08         275             528             52.083333333333336
 2024-09         204             376             54.255319148936174
 2024-10         209             424             49.29245283018868
 2024-11         216             456             47.36842105263158
 2024-12         223             392             56.88775510204081
 2025-01         161             408             39.46078431372549
 06/24-01/25     1788            3296            54.24757281553398
 2025-02         179             408             43.872549019607845
 2025-03         258             544             47.4264705882353
 2025-04         38              96              39.583333333333336
 v5              1142            2232            51.16487455197132
 v6              621             1400            44.357142857142854
 LiveCodeBench AceReason-Nemotron-1.0-14B (Avg@8)
 =================================================================
 Months          Corrects        Total           Accuracy
 2023-05         211             272             77.57352941176471
 2023-06         282             312             90.38461538461539
 2023-07         393             432             90.97222222222223
 2023-08         219             288             76.04166666666667
 2023-09         315             352             89.48863636363636
 2023-10         294             352             83.52272727272727
 2023-11         229             280             81.78571428571429
 2023-12         263             320             82.1875
 2024-01         219             288             76.04166666666667
 2024-02         201             256             78.515625
 2024-03         296             360             82.22222222222223
 2024-04         252             296             85.13513513513513
 2024-05         233             288             80.90277777777777
 05/23-05/24     3407            4096            83.1787109375
 2024-06         311             368             84.51086956521739
 2024-07         248             344             72.09302325581395
 2024-08         299             528             56.628787878787875
 2024-09         232             376             61.702127659574465
 2024-10         266             424             62.735849056603776
 2024-11         282             456             61.8421052631579
 2024-12         253             392             64.54081632653062
 2025-01         217             408             53.18627450980392
 06/24-01/25     2108            3296            63.95631067961165
 2025-02         211             408             51.71568627450981
 2025-03         324             544             59.55882352941177
 2025-04         41              96              42.708333333333336
 v5              1350            2232            60.483870967741936
 v6              775             1400            55.357142857142854
 LiveCodeBench AceReason-Nemotron-1.1-7B (Avg@8)
 =================================================================
 Months          Corrects        Total           Accuracy
 2023-05         205             272             75.36764705882354
 2023-06         255             312             81.73076923076923
 2023-07         356             432             82.4074074074074
 2023-08         208             288             72.22222222222223
 2023-09         287             352             81.5340909090909
 2023-10         278             352             78.97727272727273
 2023-11         234             280             83.57142857142857
 2023-12         263             320             82.1875
 2024-01         215             288             74.65277777777777
 2024-02         182             256             71.09375
 2024-03         270             360             75.0
 2024-04         254             296             85.8108108108108
 2024-05         221             288             76.73611111111111
 05/23-05/24     3228            4096            78.80859375
 2024-06         309             368             83.96739130434783
 2024-07         235             344             68.31395348837209
 2024-08         292             528             55.303030303030305
 2024-09         211             376             56.11702127659574
 2024-10         254             424             59.905660377358494
 2024-11         269             456             58.99122807017544
 2024-12         239             392             60.96938775510204
 2025-01         194             408             47.549019607843135
 06/24-01/25     2003            3296            60.77063106796116
 2025-02         203             408             49.754901960784316
 2025-03         306             544             56.25
 2025-04         41              96              42.708333333333336
 v5              1283            2232            57.482078853046595
 v6              726             1400            51.857142857142854
 AceReason-Nemotron-7B
 ====================================
 AIME2024 (Avg@64) 68.64583333333334
 AIME2025 (Avg@64) 53.59375000000002
 AceReason-Nemotron-14B
 ====================================
 AIME2024 (Avg@64) 78.43749999999997
 AIME2025 (Avg@64) 67.65625
 AceReason-Nemotron-1.1-7B
 ====================================
 AIME2024 (Avg@64) 72.60416666666667
 AIME2025 (Avg@64) 64.84375
 ```
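In the reference results, Accuracy is 100 × Corrects / Total, and the window rows pool the monthly counts rather than averaging monthly accuracies: for AceReason-Nemotron-1.0-7B, the 13 months from 2023-05 to 2024-05 sum to 2950 / 4096, or 72.021484375%, matching the 05/23-05/24 row. The sketch below reproduces that row from the monthly counts copied from the table; the helper names are illustrative. The v5 and v6 rows are not sums of the months listed (the v5 Total of 2232 is smaller than the 2584 summed over 2024-08 to 2025-01), so they are left out.

```python
from typing import List, Tuple

# Monthly (month, corrects, total) rows for AceReason-Nemotron-1.0-7B, copied from the table above
MONTHLY_7B = """\
2023-05 180 272
2023-06 238 312
2023-07 337 432
2023-08 185 288
2023-09 275 352
2023-10 257 352
2023-11 217 280
2023-12 228 320
2024-01 193 288
2024-02 169 256
2024-03 234 360
2024-04 226 296
2024-05 211 288
"""

def parse_rows(text: str) -> List[Tuple[str, int, int]]:
    """Parse whitespace-separated 'month corrects total' lines, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        month, corrects, total = line.split()
        rows.append((month, int(corrects), int(total)))
    return rows

def pooled_accuracy(rows: List[Tuple[str, int, int]]) -> Tuple[int, int, float]:
    """Sum corrects and totals across rows, then compute accuracy in percent."""
    corrects = sum(c for _, c, _ in rows)
    total = sum(t for _, _, t in rows)
    return corrects, total, 100 * corrects / total

print(pooled_accuracy(parse_rows(MONTHLY_7B)))  # (2950, 4096, 72.021484375)
```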
--- a/config.json
+++ b/config.json
@@ -0,0 +1,29 @@
 {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151646,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 131072,
  "max_window_layers": 48,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 48,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
 }
--- a/evaluation.tar.gz
+++ b/evaluation.tar.gz
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:1c1daadb3bfead2369b1ad8937369b85a4e550b4a2fadd6c9fac4b429c24e818
 size 374916092
--- a/fig/main_fig.png
+++ b/fig/main_fig.png
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:40ed09ffba7835a9a3f4c1d39c809c8ca5fe7d947e91199b4e9b266fa85178d0
 size 105946
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,7 @@
 {
  "_from_model_config": true,
  "bos_token_id": 151646,
  "eos_token_id": 151643,
  "pad_token_id": 151643,
  "transformers_version": "4.49.0"
 }
--- a/model-00001-of-00006.safetensors
+++ b/model-00001-of-00006.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e322e1e9498cee8d18b0c04f592ee221a54344e6137cb3c948a603b807e37217
 size 4986211280
--- a/model-00002-of-00006.safetensors
+++ b/model-00002-of-00006.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:fa4520787b14831eab69fcb517f2f24893ff79ece41972de1cfcabf786f89ad4
 size 4954847344
--- a/model-00003-of-00006.safetensors
+++ b/model-00003-of-00006.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:c3716bd5b77888fbe5f838b742b0146005bc19ae82adfb6db4a21e0ca7ea1b72
 size 4954847392
--- a/model-00004-of-00006.safetensors
+++ b/model-00004-of-00006.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:6211c78217e6d8fdfcc81db8edea3737519a1a8984d15d0c3639ce312447f731
 size 4954847392
--- a/model-00005-of-00006.safetensors
+++ b/model-00005-of-00006.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7c260cbd5fd753270f75a0cea6e2f5f52b2d7bbf28bd019755ca2639e129166f
 size 4954847392
--- a/model-00006-of-00006.safetensors
+++ b/model-00006-of-00006.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:c175ad433d94c2a121424e9fb4ceb35a273cdbd7510716268d5790cd39a4c301
 size 4734533160
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,586 @@
 {
  "metadata": {
    "total_size": 29540067328
  },
  "weight_map": {
    "lm_head.weight": "model-00006-of-00006.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00006.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.10.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.10.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.11.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.11.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.12.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.12.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.12.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.13.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.13.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.14.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.15.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.15.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.15.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.16.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.16.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.16.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.17.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.17.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.17.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.18.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.18.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.18.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.19.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.19.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.19.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.20.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.20.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.20.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.21.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.21.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.21.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.22.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.22.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.22.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
    "model.layers.23.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.23.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.23.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.24.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.24.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.24.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.25.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.25.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.25.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.26.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.26.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.26.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.27.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.27.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.27.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.28.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.28.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.28.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.29.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.29.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.29.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.30.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.30.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.30.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.31.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.31.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.31.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.32.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.32.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.32.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.32.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.32.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.32.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.32.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.32.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.32.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.32.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.32.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.32.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.33.input_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.33.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.33.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.33.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.33.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.33.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.33.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.33.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.33.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.33.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
    "model.layers.33.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
    "model.layers.34.input_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.34.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.34.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.34.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.34.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.34.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.34.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.34.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.34.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.34.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.34.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.34.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.35.input_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.35.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.35.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.35.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.35.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.35.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.35.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.36.input_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.36.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.36.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.36.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.36.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.36.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.36.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.36.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.36.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.36.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.36.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.36.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.37.input_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.37.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.37.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.37.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.37.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.37.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.37.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.37.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.37.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.37.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.37.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.37.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.38.input_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.38.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.38.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.38.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.38.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.38.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.38.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.38.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.38.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.38.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.38.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.38.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.39.input_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.39.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.39.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.39.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.39.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.39.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.39.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.39.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.39.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.39.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.39.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.39.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.4.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.4.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.4.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.40.input_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.40.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.40.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.40.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.40.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.40.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.40.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.40.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.40.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.40.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.40.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.40.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.41.input_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.41.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.41.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.41.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.41.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
    "model.layers.41.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.41.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.41.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.41.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.41.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.41.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.41.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.42.input_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.42.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.42.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.42.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.42.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.42.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.42.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.42.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.42.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.42.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.42.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
    "model.layers.42.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
    "model.layers.43.input_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.43.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.43.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.43.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.43.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.43.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.43.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.43.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.43.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.43.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.43.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.43.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.44.input_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.44.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.44.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.44.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.44.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.44.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.44.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.44.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.44.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.44.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.44.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.44.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.45.input_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.45.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.45.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.45.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.45.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.45.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.45.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.45.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.45.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.45.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.45.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.45.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.46.input_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.46.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.46.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.46.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.46.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.46.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.46.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.46.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.46.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.46.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.46.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.46.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.47.input_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.47.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.47.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.47.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.47.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
    "model.layers.47.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.47.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.47.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.47.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.47.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.47.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
    "model.layers.47.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
    "model.layers.5.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.5.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.5.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.6.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.6.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.6.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "model.layers.9.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.9.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
    "model.layers.9.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
    "model.norm.weight": "model-00006-of-00006.safetensors"
  }
 }
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,23 @@
 {
  "bos_token": {
    "content": "<｜begin▁of▁sentence｜>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<｜end▁of▁sentence｜>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<｜end▁of▁sentence｜>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e20ddafc659ba90242154b55275402edeca0715e5dbb30f56815a4ce081f4893
 size 11422778
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,195 @@
 {
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "151643": {
      "content": "<｜end▁of▁sentence｜>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<｜User｜>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151645": {
      "content": "<｜Assistant｜>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151646": {
      "content": "<｜begin▁of▁sentence｜>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|EOT|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151648": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151649": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "bos_token": "<｜begin▁of▁sentence｜>",
  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<｜User｜>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<｜Assistant｜><｜tool▁calls▁begin｜><｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<｜tool▁call▁end｜>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<｜tool▁call▁end｜>'}}{{'<｜tool▁calls▁end｜><｜end▁of▁sentence｜>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<｜tool▁outputs▁end｜>' + message['content'] + '<｜end▁of▁sentence｜>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<｜Assistant｜>' + content + '<｜end▁of▁sentence｜>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<｜tool▁outputs▁begin｜><｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<｜tool▁outputs▁end｜>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<｜Assistant｜><think>\\n'}}{% endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<｜end▁of▁sentence｜>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 16384,
  "pad_token": "<｜end▁of▁sentence｜>",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizerFast",
  "unk_token": null,
  "use_default_system_prompt": false
 }