Initialize project; model provided by the ModelHub XC community

Model: kevinpro/Vicuna-13B-CoT Source: Original Platform
2026-05-12 21:17:24 +08:00
commit 185a6c73ef
51 changed files with 6762 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,34 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
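As a rough illustration of what these attributes mean for this commit, the sketch below checks which file names would be routed through Git LFS. It is an approximation, not Git's own matcher: it uses Python's `fnmatchcase` on the basename only, covers just a handful of the slash-free patterns listed above, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase

# A few of the slash-free patterns from the .gitattributes hunk above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.model", "*.pt", "*tfevents*"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: match the basename against the patterns above."""
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in LFS_PATTERNS)


for name in [
    "pytorch_model-00001-of-00006.bin",
    "pytorch_model.bin.index.json",
    "config.json",
]:
    print(f"{name}: {'LFS' if is_lfs_tracked(name) else 'regular blob'}")
```

Under this approximation the weight shards match `*.bin`, while `pytorch_model.bin.index.json` and `config.json` do not, which is consistent with the shards appearing as three-line pointer files and the JSON files appearing in full in this commit.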
--- a/README.md
+++ b/README.md
@@ -0,0 +1,208 @@
 ---
 datasets:
 - QingyiSi/Alpaca-CoT
 language:
 - en
 library_name: transformers
 pipeline_tag: text-generation
 tags:
 - code
 ---
 # Model Card for Vicuna-13B-CoT
 This model applies supervised fine-tuning (SFT) to Vicuna to enhance its chain-of-thought (CoT) capability.
 If you find the model helpful, please click "like" to support us.
 We also welcome feedback on your experience with the model; please report any problems you encounter in the issues section.
 A 7B version is also available: https://huggingface.co/kevinpro/Vicuna-7B-CoT
 ## Model Details
 ### Model Description
 <!-- Provide a longer summary of what this model is. -->
 - **Developed by:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed]
 - **Language(s) (NLP):** [More Information Needed]
 - **License:** [More Information Needed]
 - **Finetuned from model [optional]:** [More Information Needed]
 ### Model Sources [optional]
 <!-- Provide the basic links for the model. -->
 - **Repository:** [More Information Needed]
 - **Paper [optional]:** [More Information Needed]
 - **Demo [optional]:** [More Information Needed]
 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 [More Information Needed]
 ### Downstream Use [optional]
 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 [More Information Needed]
 ### Out-of-Scope Use
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 [More Information Needed]
 ## Bias, Risks, and Limitations
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 [More Information Needed]
 ### Recommendations
 <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
 Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
 [More Information Needed]
 ## Training Details
 ### Training Data
 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 [More Information Needed]
 ### Training Procedure 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 #### Preprocessing [optional]
 [More Information Needed]
 #### Training Hyperparameters
 - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 #### Speeds, Sizes, Times [optional]
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 [More Information Needed]
 ## Evaluation
 <!-- This section describes the evaluation protocols and provides the results. -->
 ### Testing Data, Factors & Metrics
 #### Testing Data
 <!-- This should link to a Data Card if possible. -->
 [More Information Needed]
 #### Factors
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
 [More Information Needed]
 #### Metrics
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 [More Information Needed]
 ### Results
 [More Information Needed]
 #### Summary
 ## Model Examination [optional]
 <!-- Relevant interpretability work for the model goes here -->
 [More Information Needed]
 ## Environmental Impact
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 - **Hardware Type:** [More Information Needed]
 - **Hours used:** [More Information Needed]
 - **Cloud Provider:** [More Information Needed]
 - **Compute Region:** [More Information Needed]
 - **Carbon Emitted:** [More Information Needed]
 ## Technical Specifications [optional]
 ### Model Architecture and Objective
 [More Information Needed]
 ### Compute Infrastructure
 [More Information Needed]
 #### Hardware
 [More Information Needed]
 #### Software
 [More Information Needed]
 ## Citation [optional]
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 **BibTeX:**
 [More Information Needed]
 **APA:**
 [More Information Needed]
 ## Glossary [optional]
 <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
 [More Information Needed]
 ## More Information [optional]
 [More Information Needed]
 ## Model Card Authors [optional]
 [More Information Needed]
 ## Model Card Contact
 [More Information Needed]
--- a/config.json
+++ b/config.json
@@ -0,0 +1,24 @@
 {
  "_name_or_path": "/mnt/data1/sheshuaijie/Code/Alpaca-CoT/mnt/data1/sheshuaijie/Output/CoT/Trained/vicuna-13b_english-cot+auto-cot_0.0002/merged",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 0,
  "eos_token_id": 1,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 2048,
  "max_sequence_length": 2048,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "pad_token_id": -1,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.29.2",
  "use_cache": true,
  "vocab_size": 32000
 }
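The dimensions in this config are enough to estimate the checkpoint size. The sketch below is a back-of-the-envelope calculation, not part of the repository. It assumes the standard LLaMA layout: untied `embed_tokens` and `lm_head`, four attention projections and three MLP projections per layer, two RMSNorm weights per layer, a final `model.norm.weight`, and one `rotary_emb.inv_freq` buffer of `head_dim / 2` values per layer. Every value is assumed to be stored as float32, per `torch_dtype`.

```python
# Values copied from the config.json hunk above.
HIDDEN_SIZE = 5120
INTERMEDIATE_SIZE = 13824
NUM_LAYERS = 40
NUM_HEADS = 40
VOCAB_SIZE = 32000
BYTES_PER_VALUE = 4  # torch_dtype is float32


def expected_checkpoint_bytes() -> int:
    """Estimate the tensor bytes of the checkpoint from the config alone."""
    head_dim = HIDDEN_SIZE // NUM_HEADS
    attention = 4 * HIDDEN_SIZE * HIDDEN_SIZE   # q_proj, k_proj, v_proj, o_proj
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE   # gate_proj, up_proj, down_proj
    norms = 2 * HIDDEN_SIZE                     # input + post-attention layernorm
    rotary = head_dim // 2                      # rotary_emb.inv_freq buffer
    per_layer = attention + mlp + norms + rotary
    embeddings = 2 * VOCAB_SIZE * HIDDEN_SIZE   # embed_tokens + lm_head (untied)
    final_norm = HIDDEN_SIZE                    # assumed final model.norm.weight
    total_values = NUM_LAYERS * per_layer + embeddings + final_norm
    return total_values * BYTES_PER_VALUE


print(expected_checkpoint_bytes())  # 52063467520
```

Under these assumptions the estimate comes to 52,063,467,520 bytes, the same figure as the `total_size` recorded in `pytorch_model.bin.index.json`. That agreement suggests the weights really are stored in full float32 precision, roughly 52 GB for about 13 billion parameters.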
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,7 @@
 {
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
  "transformers_version": "4.29.2"
 }
--- a/pytorch_model-00001-of-00006.bin
+++ b/pytorch_model-00001-of-00006.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:5362a059f344e14022276b27e7bc0bc1a6bb14f6bf26afdabf526bffce54cfd0
 size 9956543883
--- a/pytorch_model-00002-of-00006.bin
+++ b/pytorch_model-00002-of-00006.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:acea5d185b57e323f3b0d97ce81ec78fbc602b945a76230e80260c4fa8413d9b
 size 9940856385
--- a/pytorch_model-00003-of-00006.bin
+++ b/pytorch_model-00003-of-00006.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7f6a894e848279b57f9d7a14fcdca8e02c9f14bd39324a9a6ab6c708ddf8312a
 size 9940856943
--- a/pytorch_model-00004-of-00006.bin
+++ b/pytorch_model-00004-of-00006.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:12aade76bb9bb9e2c98e3ccdc1dcfd4571b57a15afc077a5bdfb13aea1c0598b
 size 9867415289
--- a/pytorch_model-00005-of-00006.bin
+++ b/pytorch_model-00005-of-00006.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:1f984f23a73ebb6f10fc7b35afc9f87925d0a821dcf9f14eddf386738635a07e
 size 9867456961
--- a/pytorch_model-00006-of-00006.bin
+++ b/pytorch_model-00006-of-00006.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:226eee221f360c205ed4918d5ba0705048c635aea978f056d5acaf6123dec0e3
 size 2490476207
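The `size` fields of the six pointer files can be cross-checked against the `total_size` recorded in `pytorch_model.bin.index.json`. The snippet below is only a sanity check on numbers copied from this commit. Reading the small difference as serialization overhead is an assumption: the index value is taken to count raw tensor bytes, and the rest is presumed to be container and pickle metadata inside each `.bin` file.

```python
# "size" fields from the six Git LFS pointer files above, in shard order.
SHARD_SIZES = [
    9956543883,
    9940856385,
    9940856943,
    9867415289,
    9867456961,
    2490476207,
]

# "total_size" from the metadata block of pytorch_model.bin.index.json.
INDEX_TOTAL_SIZE = 52063467520

on_disk = sum(SHARD_SIZES)
overhead = on_disk - INDEX_TOTAL_SIZE

print(f"on disk:     {on_disk} bytes")
print(f"tensor data: {INDEX_TOTAL_SIZE} bytes")
print(f"difference:  {overhead} bytes ({overhead / on_disk:.6%} of the download)")
```

The shards add up to 52,063,605,668 bytes, 138,148 bytes more than the index total, so a full download needs a little over 52 GB of disk space.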
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@@ -0,0 +1,410 @@
 {
  "metadata": {
    "total_size": 52063467520
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00006-of-00006.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.15.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
    "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
    "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
    "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
    "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
    "model.layers.23.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.23.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.23.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
    "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
    "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
    "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
    "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
    "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
    "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
    "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
    "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.30.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.30.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.30.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
    "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
    "model.layers.31.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.31.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.31.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
    "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
    "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
    "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
    "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
    "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
    "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
    "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.38.input_layernorm.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.38.mlp.down_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.38.mlp.up_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
    "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
    "model.layers.39.input_layernorm.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.39.mlp.down_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.39.mlp.up_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00006.bin",
    "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
    "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
    "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
    "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
    "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
    "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
    "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
    "model.norm.weight": "pytorch_model-00006-of-00006.bin"
  }
 }
--- a/tokenizer.model
+++ b/tokenizer.model
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,33 @@
 {
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/adapter_config.json
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/adapter_config.json
@@ -0,0 +1,17 @@
 {
  "base_model_name_or_path": "/mnt/data1/sheshuaijie/Data/PLM/vicuna-13b",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 32,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
 }
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/adapter_model.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/adapter_model.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e5e1621f48d9ad8feb1d6d31050275f0aafd080c5c07153301fe2f48411f4406
 size 443
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/adapter_config.json
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/adapter_config.json
@@ -0,0 +1,17 @@
 {
  "base_model_name_or_path": "/mnt/data1/sheshuaijie/Data/PLM/vicuna-13b",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 32,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
 }
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/adapter_model.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/adapter_model.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e5e1621f48d9ad8feb1d6d31050275f0aafd080c5c07153301fe2f48411f4406
 size 443
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/optimizer.pt
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/optimizer.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7c1bd9db551f5053d250cdfad0dba63ab63f4d8770467f05a1c7eaa8ed882949
 size 209810181
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/pytorch_model.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/pytorch_model.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:69b558ca414e2e4c64708ff75297722b21126c012ed94cb456e60ba088812b43
 size 104915277
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/rng_state_0.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/rng_state_0.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4ef7df9d45b0ccca9c69f77e005511b666063ab1a038311314a04cfc65d660f7
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/rng_state_1.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/rng_state_1.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:6148b0540f53b460a67b06d568497039eab81579af3b1dcf86260cde4dfa7d0f
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/rng_state_2.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/rng_state_2.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:9169c3e4560897db5cad83367bb48e655eb9c5072a53b2307ccd3fd4d5b3e6e1
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/rng_state_3.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/rng_state_3.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:9825b237af1085d3086638ea705364d338e4efd8605bccbcd424c38224ad9556
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/scaler.pt
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/scaler.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4ceb2c0dcfaa71275aedf3b6024b198fa7f5f75980e818d6f48b3ec93fb208e4
 size 557
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/scheduler.pt
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/scheduler.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:013d6a8410e7e8bcd20ee7e13d0ade381e59c981e59cf7840d165535290bf571
 size 627
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/trainer_state.json
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/trainer_state.json
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/training_args.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1036/training_args.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:bc36676b8e75725a5f1d435df06d955098ecfd2dcf5fc632f668dfc3b7a43333
 size 4027
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/adapter_config.json
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/adapter_config.json
@@ -0,0 +1,17 @@
 {
  "base_model_name_or_path": "/mnt/data1/sheshuaijie/Data/PLM/vicuna-13b",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 32,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
 }
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/adapter_model.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/adapter_model.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e5e1621f48d9ad8feb1d6d31050275f0aafd080c5c07153301fe2f48411f4406
 size 443
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/optimizer.pt
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/optimizer.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:f004630f45896ece26280742f15b28aa38383360da760ef92b5c911f2c3f9aa3
 size 209810181
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/pytorch_model.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/pytorch_model.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:bde1cee0b6d13acb7f6e4f0570a720103f02bfc85709715a1476e8ab898d34de
 size 104915277
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/rng_state_0.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/rng_state_0.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:afe1da91613c11605b6b4152085eca4017e91869c6313d6573a590da7d74380c
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/rng_state_1.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/rng_state_1.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:d484d85179931750298e99a718675725e5543bae46ad569434867192e34748d5
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/rng_state_2.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/rng_state_2.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:676f6e638b772a449173fd3339abbd5bf726e99dc2bd16fcdc317cf65a8c2fd7
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/rng_state_3.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/rng_state_3.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:962120d774cd2704683c32143d6c74751313598654f6ce29854b9bd48267b5bd
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/scaler.pt
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/scaler.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:8ef2b11a4a54f62b6adba79797208808147dd3f66e3935d1a14649ef17b698f2
 size 557
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/scheduler.pt
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/scheduler.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:5e76f15f7c4cc04aa6f245833e60b6a09a99f2e2af4927ea728c672ab303892d
 size 627
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/trainer_state.json
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/trainer_state.json
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/training_args.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1498/training_args.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:bc36676b8e75725a5f1d435df06d955098ecfd2dcf5fc632f668dfc3b7a43333
 size 4027
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/adapter_config.json
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/adapter_config.json
@@ -0,0 +1,17 @@
 {
  "base_model_name_or_path": "/mnt/data1/sheshuaijie/Data/PLM/vicuna-13b",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 32,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
 }
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/adapter_model.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/adapter_model.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e5e1621f48d9ad8feb1d6d31050275f0aafd080c5c07153301fe2f48411f4406
 size 443
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/optimizer.pt
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/optimizer.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:600984ca73a7bcc82d6e506f83e7ea45c58242b907c95801a51ca7cf628a944d
 size 209810181
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/pytorch_model.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/pytorch_model.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:8fc81ad838ec66e35bfeaa2d344f465fa25c26f759c9f5da6603b7dbfba1c707
 size 104915277
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/rng_state_0.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/rng_state_0.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:099ca4487ecb97b228d9c4d856d7cb40e78143486e3926d363eb14dac0abff9c
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/rng_state_1.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/rng_state_1.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:8a5e5185bd3bc1ae8afe44cddd6043bf42ba9dba5e8235b006a13345104d82aa
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/rng_state_2.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/rng_state_2.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a5f76e77983866613863b903be722e030054d9ae8710f3ab58224ede927d3aeb
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/rng_state_3.pth
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/rng_state_3.pth
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:c40c667796940f73ff28232e62ef528a856b2c70a52c68460694468f88035fb8
 size 17655
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/scaler.pt
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/scaler.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e5825b874b4f341241d4fd914ad00c8a60bb6b7ba4b42e2f3651cdc5c47332c4
 size 557
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/scheduler.pt
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/scheduler.pt
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:410b88655bf978bcc67ca9dedbd5d0efe4960796f940d7db489de71cbc92a331
 size 627
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/trainer_state.json
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/trainer_state.json
--- a/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/training_args.bin
+++ b/vicuna-13b_english-cot+auto-cot_0.0002/lora/checkpoint-1505/training_args.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:bc36676b8e75725a5f1d435df06d955098ecfd2dcf5fc632f668dfc3b7a43333
 size 4027