Initialize project; model provided by the ModelHub XC community

Model: Open-Orca/OpenOrca Source: Original Platform
2026-06-01 13:55:12 +08:00
commit 8fdd35be35
16 changed files with 752 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,52 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zstandard filter=lfs diff=lfs merge=lfs -text
 *.tfevents* filter=lfs diff=lfs merge=lfs -text
 *.db* filter=lfs diff=lfs merge=lfs -text
 *.ark* filter=lfs diff=lfs merge=lfs -text
 **/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
 **/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
 **/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.gguf* filter=lfs diff=lfs merge=lfs -text
 *.ggml filter=lfs diff=lfs merge=lfs -text
 *.llamafile* filter=lfs diff=lfs merge=lfs -text
 *.pt2 filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.model filter=lfs diff=lfs merge=lfs -text
 pytorch_model-00003-of-00003.bin filter=lfs diff=lfs merge=lfs -text
 pytorch_model-00002-of-00003.bin filter=lfs diff=lfs merge=lfs -text
 pytorch_model-00001-of-00003.bin filter=lfs diff=lfs merge=lfs -text
--- a/OO_Preview1_AGIEval.png
+++ b/OO_Preview1_AGIEval.png
--- a/OO_Preview1_BigBenchHard.png
+++ b/OO_Preview1_BigBenchHard.png
--- a/Atlas.png
+++ b/Atlas.png
--- a/README.md
+++ b/README.md
@@ -0,0 +1,123 @@
 ---
 license: mit
 language:
 - en
 library_name: transformers
 pipeline_tag: text-generation
 datasets:
 - Open-Orca/OpenOrca
 ---
 <p><h1>🐋 The First OpenOrca Model Preview! 🐋</h1></p>
 ![OpenOrca Logo](https://huggingface.co/datasets/Open-Orca/OpenOrca/resolve/main/OpenOrcaLogo.png "OpenOrca Logo")
 # OpenOrca-Preview1-13B
 We have used our own [OpenOrca dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca) to fine-tune LLaMA-13B.
 This dataset is our attempt to reproduce the dataset generated for Microsoft Research's [Orca Paper](https://arxiv.org/abs/2306.02707).
 We have trained on less than 6% of our data, just to give a preview of what is possible while we further refine our dataset!
 We trained a refined selection of 200k GPT-4 entries from OpenOrca.
 We have filtered our GPT-4 augmentations to remove statements like "As an AI language model..." and other responses that have been shown to harm model reasoning capabilities. Further details on our dataset curation practices will be forthcoming with our full model releases.
 This release highlights that even a small portion of our training data can produce state-of-the-art results in this model class, with total training costs under $200.
 Want to visualize our full (pre-filtering) dataset? Check out our [Nomic Atlas Map](https://atlas.nomic.ai/map/c1b88b47-2d9b-47e0-9002-b80766792582/2560fd25-52fe-42f1-a58f-ff5eccc890d2).
  [<img src="https://huggingface.co/Open-Orca/OpenOrca-Preview1-13B/resolve/main/OpenOrca%20Nomic%20Atlas.png" alt="Atlas Nomic Dataset Map" width="400" height="400" />](https://atlas.nomic.ai/map/c1b88b47-2d9b-47e0-9002-b80766792582/2560fd25-52fe-42f1-a58f-ff5eccc890d2)
 We are in the process of training more models, so keep an eye on our org for upcoming releases with exciting partners.
 We will also post sneak-peek announcements on our Discord, which you can find here:
 https://AlignmentLab.ai
 # Evaluation
 We have evaluated OpenOrca-Preview1-13B on hard reasoning tasks from BigBench-Hard and AGIEval as outlined in the Orca paper.
 Our average performance for BigBench-Hard: 0.3753
 Average for AGIEval: 0.3638
 In the Orca paper, the authors reported their scores relative to Vicuna on these evals.
 We did the same and found that our average scores reach ~60% of the total improvement shown in the Orca paper.
 So we got 60% of the improvement with 6% of the data!
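One plausible reading of the ~60% figure, assuming it is the share of Orca's gain over Vicuna that this model recovers on a benchmark average (the exact formula is not stated in this card), with s denoting the average score on BigBench-Hard or AGIEval:

$$
\text{fraction of improvement} = \frac{s_{\text{OpenOrca}} - s_{\text{Vicuna}}}{s_{\text{Orca}} - s_{\text{Vicuna}}} \approx 0.6
$$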
 ## BigBench-Hard Performance
 ![OpenOrca Preview1 BigBench-Hard Performance](https://huggingface.co/Open-Orca/OpenOrca-Preview1-13B/resolve/main/OO_Preview1_BigBenchHard.png "BigBench-Hard Performance")
 ## AGIEval Performance
 ![OpenOrca Preview1 AGIEval Performance](https://huggingface.co/Open-Orca/OpenOrca-Preview1-13B/resolve/main/OO_Preview1_AGIEval.png "AGIEval Performance")
 We will report our results on the [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) evals once we receive them.
 # Dataset
 We used a small subset (about 6%, or 200k entries) of our OpenOrca data, which aims to reproduce the dataset from the Orca research paper.
 As this release is intended as a preview, please await our full releases for further details on the training data.
 # Training
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 We trained with 8x A100-80G GPUs for 15 hours. Commodity cost was < $200.
 We trained for 4 epochs and selected a snapshot at 3 epochs for peak performance.
 Please await our full releases for further training details.
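The cost figure follows from simple arithmetic on the 8x A100-80G, 15-hour run; the per-hour rate below is derived here, not quoted from the authors:

$$
8 \times 15\ \text{h} = 120\ \text{GPU-hours}, \qquad \frac{200\ \text{USD}}{120\ \text{GPU-hours}} \approx 1.67\ \text{USD per GPU-hour}
$$

So a total under 200 USD implies an average rate below roughly 1.67 USD per A100-80G hour.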
 # Prompting
 The model uses the Alpaca prompt format (see the [FastChat implementation example](https://github.com/lm-sys/FastChat/blob/daa2b9abe20597ebf34dc5df164d450456610c74/fastchat/conversation.py#L198-L229)):
 ```
 ### Instruction:
 ### Response:
 ```
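A minimal sketch of assembling a prompt in this format. The exact spacing (a blank line between sections, a newline after `### Response:`) and the optional leading system message are assumptions based on the common Alpaca convention; training used Axolotl's `alpaca_w_system.load_open_orca` loader, whose exact template is not reproduced here. `build_prompt` and `generate` are hypothetical helper names, and `generate` assumes `torch`, `transformers`, and `accelerate` (for `device_map="auto"`) are installed and that the weights are reachable as `Open-Orca/OpenOrca-Preview1-13B` or a local path to this repository.

```python
from typing import Optional


def build_prompt(instruction: str, system: Optional[str] = None) -> str:
    """Assemble an Alpaca-style prompt (assumed layout, see lead-in)."""
    parts = []
    if system:
        # Optional system message placed before the instruction (assumption).
        parts.append(system.strip())
    parts.append("### Instruction:\n" + instruction.strip())
    # The model is expected to continue the text after this header.
    parts.append("### Response:\n")
    return "\n\n".join(parts)


def generate(instruction: str, max_new_tokens: int = 256) -> str:
    """Run the model with Hugging Face transformers (imported lazily)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "Open-Orca/OpenOrca-Preview1-13B"  # or a local path to this repo
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the generated continuation.
    new_tokens = output[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


print(build_prompt("List three prime numbers."))
```

Keeping prompt construction separate from generation makes the template easy to inspect or adjust without loading the 13B model.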
 # Citation
 ```bibtex
@software{OpenOrca_Preview1,
  title = {OpenOrca_Preview1: A LLaMA-13B Model Fine-tuned on Small Portion of OpenOrcaV1 Dataset},
  author = {Wing Lian and Bleys Goodson and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/Open-Orca/OpenOrca-Preview1-13B}},
 }
 ```
 ```bibtex
@misc{mukherjee2023orca,
      title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4}, 
      author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
      year={2023},
      eprint={2306.02707},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
 }
 ```
 ```bibtex
@misc{longpre2023flan,
      title={The Flan Collection: Designing Data and Methods for Effective Instruction Tuning}, 
      author={Shayne Longpre and Le Hou and Tu Vu and Albert Webson and Hyung Won Chung and Yi Tay and Denny Zhou and Quoc V. Le and Barret Zoph and Jason Wei and Adam Roberts},
      year={2023},
      eprint={2301.13688},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
 }
 ```
 ```bibtex
@software{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
 }
 ```
--- a/config.json
+++ b/config.json
@@ -0,0 +1,24 @@
 {
  "_name_or_path": "/home/sgugger/tmp/llama/llama-13b/",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 2048,
  "max_sequence_length": 2048,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0.dev0",
  "use_cache": true,
  "vocab_size": 32000
 }
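As a sanity check on `config.json`, the parameter count of a LLaMA decoder can be computed from `hidden_size`, `intermediate_size`, `num_hidden_layers`, and `vocab_size`. This sketch assumes the standard LLaMA layout in transformers: bias-free q/k/v/o and gate/up/down projections, two RMSNorm weight vectors per layer, a final norm, and an untied `lm_head` (`tie_word_embeddings: false`). With 2 bytes per float16 weight, plus 40 `rotary_emb.inv_freq` buffers of 64 values each stored as float32 (an assumption that makes the numbers line up), it reproduces the `total_size` of 26031738880 bytes recorded in `pytorch_model.bin.index.json`. The helper names are hypothetical.

```python
# Values copied from config.json above.
CONFIG = {
    "hidden_size": 5120,
    "intermediate_size": 13824,
    "num_attention_heads": 40,
    "num_hidden_layers": 40,
    "vocab_size": 32000,
    "tie_word_embeddings": False,
}


def llama_param_count(cfg: dict) -> int:
    """Count trainable parameters of a standard LLaMA decoder."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    v = cfg["vocab_size"]
    attn = 4 * h * h    # q_proj, k_proj, v_proj, o_proj (no biases)
    mlp = 3 * h * i     # gate_proj, up_proj, down_proj
    norms = 2 * h       # input_layernorm + post_attention_layernorm
    per_layer = attn + mlp + norms
    embed = v * h
    lm_head = 0 if cfg["tie_word_embeddings"] else v * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


def expected_checkpoint_bytes(cfg: dict, weight_bytes: int = 2, inv_freq_bytes: int = 4) -> int:
    """Weights in float16 plus per-layer rotary inv_freq buffers (assumed float32)."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]   # 128
    inv_freq_elems = cfg["num_hidden_layers"] * (head_dim // 2)   # 40 * 64
    return llama_param_count(cfg) * weight_bytes + inv_freq_elems * inv_freq_bytes


print(llama_param_count(CONFIG))          # 13015864320
print(expected_checkpoint_bytes(CONFIG))  # 26031738880
```

The roughly 13.0B count is what the "13B" in the model name refers to.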
--- a/configs/openorca.yml
+++ b/configs/openorca.yml
@@ -0,0 +1,67 @@
 base_model: huggyllama/llama-13b
 base_model_config: huggyllama/llama-13b
 model_type: LlamaForCausalLM
 tokenizer_type: LlamaTokenizer
 load_in_8bit: false
 load_in_4bit: false
 strict: false
 push_dataset_to_hub: openaccess-ai-collective
 hf_use_auth_token: true
 datasets:
  - path: Open-Orca/oo-gpt4-200k
    type: alpaca_w_system.load_open_orca
 dataset_prepared_path: last_run_prepared
 val_set_size: 0.05
 adapter:
 lora_model_dir:
 sequence_len: 2048
 max_packed_sequence_len:
 lora_r:
 lora_alpha:
 lora_dropout:
 lora_target_modules:
 lora_target_linear:
 lora_fan_in_fan_out:
 wandb_project: open-orca-13b
 wandb_watch:
 wandb_run_id:
 wandb_log_model:
 output_dir: ./open-orca-13b
 gradient_accumulation_steps: 1
 micro_batch_size: 8
 num_epochs: 5
 optimizer: adamw_bnb_8bit
 torchdistx_path:
 lr_scheduler: cosine
 learning_rate: 0.00005
 train_on_inputs: false
 group_by_length: true
 bf16: true
 fp16: false
 tf32: true
 gradient_checkpointing: true
 early_stopping_patience:
 resume_from_checkpoint:
 local_rank:
 logging_steps: 1
 xformers_attention: true
 flash_attention:
 gptq_groupsize:
 gptq_model_v1:
 warmup_steps: 150
 eval_steps: 495 
 save_steps: 2970
 debug:
 deepspeed:
 weight_decay: 0.03
 fsdp:
  - full_shard
  - auto_wrap
 fsdp_config:
  fsdp_offload_params: true
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
 special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
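Some schedule values in this config can be related with a little arithmetic. Assume the dataset has exactly 200,000 rows with none filtered or packed, and that training ran data-parallel on the 8 GPUs mentioned in the README. Then the global batch is `micro_batch_size × gradient_accumulation_steps × 8 = 64`, and the 5% validation split leaves 190,000 training rows. That gives about 2,969 optimizer steps per epoch, so `save_steps: 2970` would correspond to roughly one checkpoint per epoch and `eval_steps: 495` to six evaluations per checkpoint interval. `steps_per_epoch` is a hypothetical helper; the real step count may differ.

```python
import math


def steps_per_epoch(num_rows: int, val_set_size: float, micro_batch_size: int,
                    grad_accum_steps: int, num_gpus: int) -> int:
    """Optimizer steps per epoch, ignoring sample packing and filtering."""
    val_rows = math.ceil(num_rows * val_set_size)  # held-out split, rounded up
    train_rows = num_rows - val_rows
    global_batch = micro_batch_size * grad_accum_steps * num_gpus
    # The last partial batch still counts as a step (no drop_last).
    return math.ceil(train_rows / global_batch)


steps = steps_per_epoch(200_000, 0.05, micro_batch_size=8, grad_accum_steps=1, num_gpus=8)
print(steps)       # 2969
print(2970 / 495)  # 6.0 evaluations per checkpoint interval
```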
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
 {"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,7 @@
 {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.28.0.dev0"
 }
--- a/pytorch_model-00001-of-00003.bin
+++ b/pytorch_model-00001-of-00003.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:bbf75ebe44c0486660355f41c86049ed77b37444f73642b491a728ed7861471b
 size 9948733550
--- a/pytorch_model-00002-of-00003.bin
+++ b/pytorch_model-00002-of-00003.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:69e0df73c5382f56d202f715b2f5f76d99709bf106c78e987fa01d03799a175c
 size 9904170208
--- a/pytorch_model-00003-of-00003.bin
+++ b/pytorch_model-00003-of-00003.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a3137d84aa8f5599c9c12b12515355ec0b090a7b6d93469070fcb5dbdc70af54
 size 6178986889
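The three `.bin` entries above are Git LFS pointer files, not the weights themselves. Each records the spec version, the SHA-256 `oid`, and the `size` in bytes of the real file. Below is a minimal standard-library sketch for checking a downloaded shard against its pointer. `parse_lfs_pointer` and `verify_file` are hypothetical helper names, and only the `sha256` oid type is handled.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid type: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_file(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check the size first (cheap), then stream the file through SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a3137d84aa8f5599c9c12b12515355ec0b090a7b6d93469070fcb5dbdc70af54
size 6178986889"""

meta = parse_lfs_pointer(POINTER)
print(meta["size"])  # 6178986889
# verify_file("pytorch_model-00003-of-00003.bin", meta) returns True for an intact download.
```

Streaming in 1 MiB chunks keeps memory flat even for the roughly 10 GB shards.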
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@@ -0,0 +1,410 @@
 {
  "metadata": {
    "total_size": 26031738880
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00003-of-00003.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.30.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
    "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
    "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
    "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
    "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
    "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
    "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
    "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
    "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
    "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.norm.weight": "pytorch_model-00003-of-00003.bin"
  }
 }
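The `weight_map` above maps every tensor name to the shard file that stores it. Most layers sit entirely in one shard, but layer 30 is split: its attention projections and `down_proj`/`gate_proj` are in shard 2, while `input_layernorm`, `up_proj` and `post_attention_layernorm` are in shard 3. The following is a minimal sketch of querying such a map. It uses a hand-copied excerpt of the entries above rather than reading the full index file, whose filename is not shown in this excerpt. The helper names are hypothetical.

```python
from collections import defaultdict

# A few entries copied verbatim from the "weight_map" above.
WEIGHT_MAP = {
    "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.30.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.norm.weight": "pytorch_model-00003-of-00003.bin",
}


def shard_for(tensor_name, weight_map):
    """Return the shard file that stores `tensor_name` (KeyError if absent)."""
    return weight_map[tensor_name]


def tensors_by_shard(weight_map):
    """Invert the map: shard file -> sorted list of tensor names it contains."""
    groups = defaultdict(list)
    for name, shard in weight_map.items():
        groups[shard].append(name)
    return {shard: sorted(names) for shard, names in groups.items()}


def shards_for_layer(layer, weight_map):
    """Return the set of shard files holding any tensor of decoder layer `layer`."""
    # The trailing dot keeps layer 3 from matching "model.layers.30.*".
    prefix = f"model.layers.{layer}."
    return {shard for name, shard in weight_map.items() if name.startswith(prefix)}


if __name__ == "__main__":
    print(shard_for("model.norm.weight", WEIGHT_MAP))
    print(sorted(shards_for_layer(30, WEIGHT_MAP)))
```

Keys sort lexicographically in the index (`layers.3` comes before `layers.30`), so matching on the full `model.layers.N.` prefix is safer than assuming numeric order.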
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,23 @@
 {
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
 }
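`special_tokens_map.json` repeats the BOS, EOS and UNK definitions that also appear in `tokenizer_config.json` further down. The following is a minimal sketch of a consistency check between the two files. The JSON literals are trimmed copies of the token `content` fields from this diff. The `mismatches` helper is hypothetical and is not part of any library.

```python
import json

# Trimmed copies of the two files in this commit (only "content" is kept).
SPECIAL_TOKENS_MAP = json.loads(
    '{"bos_token": {"content": "<s>"}, "eos_token": {"content": "</s>"},'
    ' "unk_token": {"content": "<unk>"}}'
)
TOKENIZER_CONFIG = json.loads(
    '{"bos_token": {"content": "<s>"}, "eos_token": {"content": "</s>"},'
    ' "unk_token": {"content": "<unk>"}, "pad_token": null}'
)


def content(entry):
    """Token entries are dicts with a "content" key, plain strings, or null."""
    return entry["content"] if isinstance(entry, dict) else entry


def mismatches(special_map, tok_config):
    """Return {key: (map_value, config_value)} for tokens that disagree."""
    diffs = {}
    for key, entry in special_map.items():
        other = tok_config.get(key)
        if content(entry) != content(other):
            diffs[key] = (content(entry), content(other))
    return diffs


if __name__ == "__main__":
    print(mismatches(SPECIAL_TOKENS_MAP, TOKENIZER_CONFIG))
```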
--- a/tokenizer.model
+++ b/tokenizer.model
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
 size 499723
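`tokenizer.model` is committed as a Git LFS pointer rather than as the SentencePiece file itself. The pointer records the spec version, a SHA-256 object id, and the byte size of the real blob. The following is a minimal sketch that parses this pointer and checks a downloaded file against it. The helper names are illustrative, and the commented-out file path is a placeholder.

```python
import hashlib

# Pointer text copied from the tokenizer.model entry above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, value = line.split(" ", 1)
        fields[key] = value
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def verify_blob(data, pointer):
    """Return True if `data` matches the pointer's byte size and SHA-256 digest."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    if len(data) != pointer["size"]:
        return False  # cheap size check before hashing
    return hashlib.sha256(data).hexdigest() == pointer["oid"]


if __name__ == "__main__":
    ptr = parse_lfs_pointer(POINTER_TEXT)
    print(ptr["size"], ptr["oid"][:12])
    # To check a real download (path is illustrative):
    # with open("tokenizer.model", "rb") as f:
    #     print(verify_blob(f.read(), ptr))
```

If a clone was made without LFS, the local `tokenizer.model` is only these three lines. It will fail the size check immediately instead of failing later inside the tokenizer.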
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,33 @@
 {
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "model_max_length": 2048,
  "pad_token": null,
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
 }
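With `"add_bos_token": true` and `"add_eos_token": false`, encoded sequences start with `<s>` and do not end with `</s>`. Because `"pad_token"` is `null`, batched padding needs a pad id chosen by the caller. The following is a minimal, dependency-free sketch of those two rules. It is not how `transformers` implements them. The ids in `SPECIAL_IDS` are an assumption based on the usual LLaMA SentencePiece layout (`<unk>`=0, `<s>`=1, `</s>`=2). The vocabulary itself lives in `tokenizer.model` and is not shown here.

```python
import json

# Subset of tokenizer_config.json above (AddedToken fields trimmed to "content").
CONFIG = json.loads("""
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {"content": "<s>"},
  "eos_token": {"content": "</s>"},
  "pad_token": null
}
""")

# ASSUMPTION: usual LLaMA SentencePiece ids; not stated in this diff.
SPECIAL_IDS = {"<unk>": 0, "<s>": 1, "</s>": 2}


def token_content(entry):
    """AddedToken entries are dicts with a "content" key; plain strings also work."""
    return entry["content"] if isinstance(entry, dict) else entry


def add_special_tokens(ids, config, special_ids):
    """Prepend BOS and/or append EOS according to add_bos_token / add_eos_token."""
    out = list(ids)
    if config.get("add_bos_token"):
        out.insert(0, special_ids[token_content(config["bos_token"])])
    if config.get("add_eos_token"):
        out.append(special_ids[token_content(config["eos_token"])])
    return out


def pad_batch(batch, config, special_ids, pad_id=None):
    """Right-pad sequences to equal length; requires pad_id when pad_token is null."""
    if pad_id is None:
        if config.get("pad_token") is None:
            raise ValueError("pad_token is null; pass pad_id explicitly")
        pad_id = special_ids[token_content(config["pad_token"])]
    width = max(len(seq) for seq in batch)
    return [list(seq) + [pad_id] * (width - len(seq)) for seq in batch]


if __name__ == "__main__":
    # 100 and 200 stand in for hypothetical SentencePiece piece ids.
    print(add_special_tokens([100, 200], CONFIG, SPECIAL_IDS))
```

A common workaround for the missing pad token is to reuse the EOS id as `pad_id` and rely on an attention mask. This is a caller-side choice, not something this config specifies.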
@@ -0,0 +1 @@
 {"framework": "pytorch", "task": "text-generation", "allow_remote": true}