Initialize project; model provided by the ModelHub XC community

Model: TIGER-Lab/MAmmoTH-13B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-02 12:25:41 +08:00
commit e87d804a7f
16 changed files with 666 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

README.md Normal file

@@ -0,0 +1,101 @@
---
license: mit
datasets:
- TIGER-Lab/MathInstruct
language:
- en
---
# 🦣 MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Project Page: [https://tiger-ai-lab.github.io/MAmmoTH/](https://tiger-ai-lab.github.io/MAmmoTH/)
Paper: [https://arxiv.org/pdf/2309.05653.pdf](https://arxiv.org/pdf/2309.05653.pdf)
Code: [https://github.com/TIGER-AI-Lab/MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH)
## Introduction
We introduce 🦣 MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on 🤗 [MathInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), a meticulously curated instruction tuning dataset that is lightweight yet generalizable. MathInstruct is compiled from 13 math rationale datasets, six of which are newly curated by this work. It uniquely focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and ensures extensive coverage of diverse mathematical fields.
| | **Base Model: Llama-2** | **Base Model: Code Llama** |
|-----|---------------------------------------------------------------|--------------------------------------------------------------------------|
| 7B | 🦣 [MAmmoTH-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-7B) | 🦣 [MAmmoTH-Coder-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-7B) |
| 13B | 🦣 [MAmmoTH-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-13B) | 🦣 [MAmmoTH-Coder-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-13B)|
| 34B | - | 🦣 [MAmmoTH-Coder-34B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-34B)|
| 70B | 🦣 [MAmmoTH-70B](https://huggingface.co/TIGER-Lab/MAmmoTH-70B) | - |
## Training Data
The models are trained on the 🤗 [MathInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), which is compiled from 13 different math rationale datasets. Check out the dataset card for more details.
## Training Procedure
The models are fine-tuned with the MathInstruct dataset using the original Llama-2 and Code Llama models as base models. The training procedure varies for different models based on their sizes. Check out our paper for more details.
## Evaluation
The models are evaluated on open-ended and multiple-choice math problems from several datasets (GSM = GSM8K, NumG = NumGLUE, SVA = SVAMP, Mat = Mathematics, Sim = SimulEq; SAT and MMLU refer to their math subsets). Here are the results:
| **Model** | **Decoding** | **GSM** | **MATH** | **AQuA** | **NumG** | **SVA** | **Mat** | **Sim** | **SAT** | **MMLU** | **AVG** |
|-----------------------|--------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| **MAmmoTH-7B** | CoT | 50.5 | 10.4 | 43.7 | 44.0 | 47.3 | 9.2 | 18.9 | 32.7 | 39.9 | 33.0 |
| | PoT | 51.6 | 28.7 | 43.3 | 52.3 | 65.1 | 41.9 | 48.2 | 39.1 | 44.6 | 46.1 |
| | **Hybrid** | **53.6** | **31.5** | **44.5** | **61.2** | **67.7** | **46.3** | **41.2** | **42.7** | **42.6** | **47.9** |
| **MAmmoTH-Coder-7B** | CoT | 22.4 | 7.9 | 36.2 | 36.0 | 37.0 | 8.2 | 7.2 | 32.7 | 34.6 | 24.7 |
| | PoT | 58.8 | 32.1 | 47.2 | 57.1 | 71.1 | 53.9 | 44.6 | 40.0 | 47.8 | 50.3 |
| | **Hybrid** | **59.4** | **33.4** | **47.2** | **66.4** | **71.4** | **55.4** | **45.9** | **40.5** | **48.3** | **52.0** |
| **MAmmoTH-13B** | CoT | 56.3 | 12.9 | 45.3 | 45.6 | 53.8 | 11.7 | 22.4 | 43.6 | 42.3 | 37.1 |
| | PoT | 61.3 | 32.6 | 48.8 | 59.6 | 72.2 | 48.5 | 40.3 | 46.8 | 45.4 | 50.6 |
| | **Hybrid** | **62.0** | **34.2** | **51.6** | **68.7** | **72.4** | **49.2** | **43.2** | **46.8** | **47.6** | **52.9** |
| **MAmmoTH-Coder-13B** | CoT | 32.1 | 10.2 | 40.6 | 36.2 | 43.0 | 9.6 | 10.1 | 40.9 | 36.6 | 28.8 |
| | PoT | 64.3 | 35.2 | 46.8 | 54.2 | 73.2 | 60.0 | 44.2 | 48.2 | 48.2 | 52.7 |
| | **Hybrid** | **64.7** | **36.3** | **46.9** | **66.8** | **73.7** | **61.5** | **47.1** | **48.6** | **48.3** | **54.9** |
| **MAmmoTH-Coder-34B** | CoT | 34.3 | 11.6 | 39.0 | 36.2 | 44.6 | 10.8 | 10.9 | 46.4 | 42.9 | 30.7 |
| | PoT | 72.3 | 42.8 | 53.8 | 59.6 | 84.0 | 64.7 | 50.6 | 58.6 | 52.7 | 59.9 |
| | **Hybrid** | **72.7** | **43.6** | **54.7** | **71.6** | **84.3** | **65.4** | **51.8** | **60.9** | **53.8** | **62.1** |
| **MAmmoTH-70B** | CoT | 72.4 | 21.1 | 57.9 | 58.9 | 71.6 | 20.0 | 31.9 | 57.3 | 52.1 | 49.2 |
| | PoT | 76.7 | 40.1 | 60.2 | 64.3 | 81.7 | 55.3 | 45.3 | 64.1 | 53.5 | 60.1 |
| | **Hybrid** | **76.9** | **41.8** | **65.0** | **74.4** | **82.4** | **55.6** | **51.4** | **66.4** | **56.7** | **63.4** |
## Usage
You can use the models through Hugging Face's Transformers library: create a text-generation pipeline with the `pipeline` function and the model of your choice, then feed in a math problem to get the solution.
Check our Github repo for more advanced use: [https://github.com/TIGER-AI-Lab/MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH)
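As a minimal sketch of the pipeline usage described above (assuming the `transformers` package, enough GPU memory for a 13B model, and `accelerate` for `device_map="auto"`; the exact whitespace of the official template may differ slightly):

```python
# Prompt template as shown in the "Prompt Format" section of this card.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n### Response:"
)

def solve(question: str, model_id: str = "TIGER-Lab/MAmmoTH-13B") -> str:
    """Build the MAmmoTH prompt and generate a solution with Transformers."""
    # Imported lazily: loading the model is the heavy step.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype="auto",   # use the checkpoint's dtype
        device_map="auto",    # shard across available devices
    )
    prompt = PROMPT_TEMPLATE.format(instruction=question)
    out = pipe(prompt, max_new_tokens=512, return_full_text=False)
    return out[0]["generated_text"]
```

Calling `solve("What is 15% of 240?")` downloads the checkpoint on first use, so run it on a machine with sufficient disk and GPU memory.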
## Prompt Format
If you want to do CoT:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
```
If you want to do PoT:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction} Let's write a program.
### Response:
```
## Intended Uses
These models are trained for research purposes. They are designed to solve general math problems and can be used in educational software, tutoring systems, or any application that needs solutions to math problems. The models can generate both chain-of-thought (CoT) and program-of-thought (PoT) rationales, providing a comprehensive solution to a given math problem.
## Limitations
We have done our best to build math generalist models. However, the models' performance may vary with the complexity and specifics of a given problem, and not every mathematical field can be covered comprehensively.
## Citation
If you use the models, data, or code from this project, please cite the original paper:
```
@article{yue2023mammoth,
title={MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning},
  author={Xiang Yue and Xingwei Qu and Ge Zhang and Yao Fu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},
journal={arXiv preprint arXiv:2309.05653},
year={2023}
}
```

added_tokens.json Normal file

@@ -0,0 +1,3 @@
{
"[PAD]": 32000
}

config.json Normal file

@@ -0,0 +1,26 @@
{
"_name_or_path": "/ML-A100/home/xiangyue/models/Llama-2-13b-hf",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 40,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.29.1",
"use_cache": true,
"vocab_size": 32001
}

configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

generation_config.json Normal file

@@ -0,0 +1,10 @@
{
"bos_token_id": 1,
"do_sample": true,
"eos_token_id": 2,
"max_length": 4096,
"pad_token_id": 0,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.29.1"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:74b5319f0991b558e8316e7b891ea634548f5b53bdf655c05a4dde47de855de8
size 9956566923


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:da20934d63d91d2f83294f716d875d7468ca6524a49ff74bbb86235abde6072b
size 9940859009


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:480130a352b28dc63cfd739eb31eedace9ad932ef043d2608f967d205320151b
size 9940859567


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:715eebf63cc7e267fc12ef4f3e0fc6f34ea331aa28e5a0e569065055c581a796
size 9867417913


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5287dfd8be851682fc2055a3494111d199c7b161e7ca022c77d9db50962af0a9
size 9867459649


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:40191a3ad0b48a0eb66cf5aee5f45d0f9ae56feb43c102af71b50aa46ebefb3e
size 2490497199


@@ -0,0 +1,410 @@
{
"metadata": {
"total_size": 52063508480
},
"weight_map": {
"lm_head.weight": "pytorch_model-00006-of-00006.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.input_layernorm.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.39.input_layernorm.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.norm.weight": "pytorch_model-00006-of-00006.bin"
}
}
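The index file above follows the standard Hugging Face sharded-checkpoint layout: a `weight_map` object mapping each parameter name to the `.bin` shard that stores it (a single layer's tensors may straddle a shard boundary, as with layer 38 here). A minimal sketch of querying such an index, using a three-entry excerpt of the map above:

```python
from collections import defaultdict

# Excerpt of the "weight_map" above: parameter name -> shard file that holds it.
weight_map = {
    "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.norm.weight": "pytorch_model-00006-of-00006.bin",
}

def shard_for(param: str, weight_map: dict) -> str:
    """Return the shard file that stores a given parameter."""
    return weight_map[param]

def params_by_shard(weight_map: dict) -> dict:
    """Invert the map: shard file -> list of parameter names it holds."""
    by_shard = defaultdict(list)
    for name, shard in weight_map.items():
        by_shard[shard].append(name)
    return dict(by_shard)

print(shard_for("model.norm.weight", weight_map))  # pytorch_model-00006-of-00006.bin
```

In practice `transformers.AutoModelForCausalLM.from_pretrained` reads this index itself and loads only the shards it needs; the sketch just shows what the mapping encodes.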

24
special_tokens_map.json Normal file

@@ -0,0 +1,24 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": "[PAD]",
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723

35
tokenizer_config.json Normal file

@@ -0,0 +1,35 @@
{
"add_bos_token": true,
"add_eos_token": false,
"bos_token": {
"__type": "AddedToken",
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": false,
"eos_token": {
"__type": "AddedToken",
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"legacy": false,
"model_max_length": 512,
"pad_token": null,
"padding_side": "right",
"sp_model_kwargs": {},
"tokenizer_class": "LlamaTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
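Note how the two tokenizer files divide the work: `tokenizer_config.json` sets `pad_token` to `null`, while `special_tokens_map.json` supplies `"[PAD]"`, which the tokenizer picks up at load time. A minimal consistency sketch over the two files as shown above (the dicts below are excerpts copied from them, not the full contents):

```python
# Excerpts of the two tokenizer files in this commit.
special_tokens_map = {
    "bos_token": {"content": "<s>"},
    "eos_token": {"content": "</s>"},
    "pad_token": "[PAD]",
    "unk_token": {"content": "<unk>"},
}

tokenizer_config = {
    "add_bos_token": True,
    "add_eos_token": False,
    "model_max_length": 512,
    "pad_token": None,
    "padding_side": "right",
    "tokenizer_class": "LlamaTokenizer",
}

def effective_pad_token(cfg: dict, stm: dict) -> str:
    """tokenizer_config.json leaves pad_token null; the value in
    special_tokens_map.json fills the gap when the tokenizer is loaded."""
    return cfg["pad_token"] or stm["pad_token"]

print(effective_pad_token(tokenizer_config, special_tokens_map))  # [PAD]
```

With `add_bos_token=true` and `add_eos_token=false`, encoded sequences start with `<s>` but do not end with `</s>`, and right-padding with `[PAD]` is applied up to `model_max_length=512`; this matches the usual Llama-family fine-tuning setup.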