Initialize project; model provided by the ModelHub XC community
Model: Mathoctopus/Parallel_7B Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
157
README.md
Normal file
@@ -0,0 +1,157 @@
---
license: apache-2.0
datasets:
- Mathoctopus/GSM8KInstruct_Parallel
language:
- en
- es
- zh
- de
- ru
- th
- sw
- ja
- fr
- bn
---

# 🐙 Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations

Project Page: [https://mathoctopus.github.io/](https://mathoctopus.github.io/)

Paper: [https://arxiv.org/abs/2310.20246.pdf](https://arxiv.org/abs/2310.20246.pdf)

Code: [https://github.com/microsoft/MathOctopus](https://github.com/microsoft/MathOctopus)

### Introduction

We introduce 🐙 MathOctopus, a series of open-source large language models (LLMs) tailored for multilingual math problem-solving. The MathOctopus models are trained on the 🤗 MGSM8KInstruct dataset, which covers ten languages.
MathOctopus notably outperforms conventional open-source LLMs and surpasses ChatGPT in few-shot scenarios.

### Datasets

#### **MGSM8KInstruct**

| Training Dataset      | En      | Sw      | Zh      | Bn      | De      | Es      | Fr      | Ja      | Ru      | Th      | Overall |
|:----------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MGSM8KInstruct        | 7473    | 7472    | 7466    | 6539    | 7466    | 7470    | 7469    | 7471    | 7361    | 7473    | **73.6K** |

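As a quick sanity check, the per-language counts in the MGSM8KInstruct table can be summed to recover the overall figure. The snippet below is a minimal sketch; the dictionary name `MGSM8K_INSTRUCT_COUNTS` is ours, and the values are copied from the table.

```python
# Per-language training example counts, copied from the MGSM8KInstruct table.
MGSM8K_INSTRUCT_COUNTS = {
    "En": 7473, "Sw": 7472, "Zh": 7466, "Bn": 6539, "De": 7466,
    "Es": 7470, "Fr": 7469, "Ja": 7471, "Ru": 7361, "Th": 7473,
}

# Total number of training examples across all ten languages.
total = sum(MGSM8K_INSTRUCT_COUNTS.values())
print(total)  # 73660, which the table reports as 73.6K

# Language with the fewest training examples.
smallest = min(MGSM8K_INSTRUCT_COUNTS, key=MGSM8K_INSTRUCT_COUNTS.get)
print(smallest)  # Bn
```
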
#### **MSVAMP**

| Test Dataset          | En      | Sw      | Zh      | Bn      | De      | Es      | Fr      | Ja      | Ru      | Th      | Overall |
|:----------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MSVAMP                | 1000    | 1000    | 1000    | 1000    | 1000    | 1000    | 1000    | 1000    | 1000    | 1000    | **10K** |

#### Usage

Our datasets and models are all available on Hugging Face.

🤗 [MGSM8KInstruct_Parallel Dataset](https://huggingface.co/datasets/Mathoctopus/GSM8KInstruct_Parallel)

🤗 [MGSM8KInstruct_Cross Dataset](https://huggingface.co/datasets/Mathoctopus/MGSM8KInstruct_Cross)

🤗 [MSVAMP Dataset](https://huggingface.co/datasets/Mathoctopus/MSVAMP)

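The following is a minimal sketch of how the dataset and this model might be loaded with the `datasets` and `transformers` libraries. The repository IDs come from the links in this README; the helper names `load_training_data` and `generate`, the greedy decoding settings, and the token budget are our assumptions, not settings documented by the authors. The prompt template used during training is not stated here, so consult the project code before relying on the outputs.

```python
# Repository IDs taken from the links in this README.
MODEL_ID = "Mathoctopus/Parallel_7B"
DATASET_ID = "Mathoctopus/GSM8KInstruct_Parallel"


def load_training_data():
    """Download the parallel training set (requires `pip install datasets`)."""
    # Imported lazily so this module can be imported without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Greedy-decode a completion (requires `transformers` and `torch`)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("...")` with a math question downloads the checkpoint on first use, so expect a large download and a correspondingly large memory footprint.
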
## Models

| Base Model: LLaMA | Parallel-Training | Cross-Training |
|----|---------------------------------------------------------------|---------------------------------------------------------------------------|
| 7B-LLaMA 2 | 🐙 [MathOctopus-Parallel-7B](https://huggingface.co/Mathoctopus/Parallel_7B) | 🐙 [MathOctopus-Cross-7B](https://huggingface.co/Mathoctopus/Cross_7B) |
| | 🐙 [MathOctopus-Parallel-xRFT-7B](https://huggingface.co/Mathoctopus/Parallel_xRFT_7B) | 🐙 [MathOctopus-Cross-xRFT-7B](https://huggingface.co/Mathoctopus/Cross_xRFT_7B) |
| 13B-LLaMA 2 | 🐙 [MathOctopus-Parallel-13B](https://huggingface.co/Mathoctopus/Parallel_13B) | 🐙 [MathOctopus-Cross-13B](https://huggingface.co/Mathoctopus/Cross_13B) |
| | 🐙 [MathOctopus-Parallel-xRFT-13B](https://huggingface.co/Mathoctopus/Parallel_xRFT_13B) | 🐙 [MathOctopus-Cross-xRFT-13B] |
| 33B-LLaMA 1 | 🐙 [MathOctopus-Parallel-33B](https://huggingface.co/Mathoctopus/Parallel_33B) | 🐙 [MathOctopus-Cross-33B] |
| 70B-LLaMA 2 | Coming soon! | Coming soon! |

*-Parallel refers to our model trained with the parallel-training strategy.

*-Cross refers to our model trained with the cross-training strategy.

*-xRFT means we train the model with multilingual rejection sampling.

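To make the idea behind rejection sampling concrete, here is a generic sketch of the filtering step: sample several candidate solutions, then keep only the distinct ones whose final answer matches the reference. This is an illustration under our own assumptions, not the authors' xRFT pipeline; the function names, the "last number in the text" answer extraction, and the toy candidates are all hypothetical, and the multilingual handling is not shown.

```python
import re


def extract_final_answer(solution: str):
    """Return the last number in a solution string, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def rejection_sample(candidates, gold_answer, tol=1e-6):
    """Keep distinct candidate solutions whose final answer matches the gold answer."""
    kept, seen = [], set()
    for solution in candidates:
        answer = extract_final_answer(solution)
        if answer is None or abs(answer - gold_answer) > tol:
            continue  # reject: wrong or missing final answer
        if solution in seen:
            continue  # reject: exact duplicate of an accepted path
        seen.add(solution)
        kept.append(solution)
    return kept


# Toy candidates standing in for sampled model outputs.
candidates = [
    "3 + 4 = 7. The answer is 7",
    "3 * 4 = 12. The answer is 12",
    "4 + 3 = 7. The answer is 7",
    "3 + 4 = 7. The answer is 7",
]
accepted = rejection_sample(candidates, gold_answer=7)
print(len(accepted))  # 2
```

The accepted solutions would then be added to the fine-tuning data, so the model sees several correct reasoning paths per problem.
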
### **Overall Results on MGSM**

| 7B Model                        | En      | Sw      | Zh      | Bn      | De      | Es      | Fr      | Ja      | Ru      | Th      | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup>         | 52.0    | 23.6    | 31.6    | 18.8    | 38.0    | 39.2    | 36.4    | 27.2    | 33.6    | 21.6    | 32.2    |
| **xRFT**-MathOctopus<sup>C</sup>| 51.2    | 24.0    | 33.2    | 18.8    | 36.0    | 41.2    | 37.6    | 29.6    | 36.4    | 25.2    | 33.3    |
| MathOctopus<sup>P</sup>-LoRA    | 30.4    | 15.2    | 23.6    | 10.4    | 22.8    | 24.8    | 26.4    | 18.0    | 22.0    | 14.8    | 20.8    |
| MathOctopus<sup>P</sup>         | 52.4    | 39.2    | 38.4    | 28.8    | 44.8    | 42.4    | 43.6    | 36.0    | 39.6    | 34.4    | 40.0    |
| **xRFT**-MathOctopus<sup>P</sup>| 54.8    | 38.4    | 45.2    | 33.2    | 43.6    | 45.2    | 38.0    | 35.6    | 48.4    | 36.4    | 41.9    |
<p></p>

| 13B Model                       | En      | Sw      | Zh      | Bn      | De      | Es      | Fr      | Ja      | Ru      | Th      | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup>         | 56.4    | 27.2    | 39.2    | 24.0    | 47.6    | 49.6    | 47.6    | 40.4    | 42.0    | 24.8    | 39.9    |
| **xRFT**-MathOctopus<sup>C</sup>| 53.6    | 28.0    | 45.2    | 21.2    | 48.0    | 46.4    | 46.0    | 35.2    | 45.6    | 28.8    | 39.8    |
| MathOctopus<sup>P</sup>         | 53.2    | 42.8    | 48.8    | 35.2    | 44.4    | 48.0    | 48.4    | 43.2    | 47.6    | 46.8    | 45.8    |
| **xRFT**-MathOctopus<sup>P</sup>| 51.6    | 46.0    | 51.2    | 42.0    | 49.2    | 53.2    | 49.6    | 39.6    | 47.6    | 46.0    | 47.6    |
<p></p>

| 30-34B Model                    | En      | Sw      | Zh      | Bn      | De      | Es      | Fr      | Ja      | Ru      | Th      | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup>         | 55.6    | 24.4    | 36.0    | 19.2    | 40.4    | 51.2    | 44.4    | 27.2    | 37.2    | 21.6    | 35.7    |
| **xRFT**-MathOctopus<sup>C</sup>| 53.6    | 27.6    | 34.4    | 19.2    | 47.2    | 47.6    | 44.8    | 30.8    | 38.8    | 22.8    | 36.7    |
| MathOctopus<sup>P</sup>         | 56.4    | 46.8    | 52.0    | 35.2    | 47.2    | 53.2    | 48.0    | 39.2    | 45.6    | 41.2    | 46.5    |
| **xRFT**-MathOctopus<sup>P</sup>| 51.6    | 47.2    | 52.4    | 37.6    | 51.2    | 52.8    | 44.4    | 41.6    | 50.0    | 47.6    | 47.6    |

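The Overall column in the MGSM tables appears to be the unweighted mean of the ten per-language accuracies. That is our reading of the numbers rather than something the tables state; the sketch below checks it for two of the 7B rows, using values copied from the 7B table. The names `SCORES` and `overall` are ours.

```python
# MGSM accuracy per language (En, Sw, Zh, Bn, De, Es, Fr, Ja, Ru, Th)
# for two 7B rows, copied from the 7B MGSM table.
SCORES = {
    "MathOctopus-C": [52.0, 23.6, 31.6, 18.8, 38.0, 39.2, 36.4, 27.2, 33.6, 21.6],
    "MathOctopus-P": [52.4, 39.2, 38.4, 28.8, 44.8, 42.4, 43.6, 36.0, 39.6, 34.4],
}


def overall(scores):
    """Unweighted mean over languages, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


for name, scores in SCORES.items():
    print(name, overall(scores))
# MathOctopus-C 32.2
# MathOctopus-P 40.0
```

Because every language contributes equally, gains on lower-resource languages such as Sw, Bn, and Th move the Overall score as much as gains on En.
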
### **Overall Results on MSVAMP**

| 7B Model                        | En      | Sw      | Zh      | Bn      | De      | Es      | Fr      | Ja      | Ru      | Th      | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup>         | 49.2    | 36.6    | 43.6    | 30.2    | 48.6    | 46.8    | 46.4    | 42.5    | 46.7    | 34.0    | 42.5    |
| **xRFT**-MathOctopus<sup>C</sup>| 49.9    | 37.7    | 43.3    | 32.9    | 46.5    | 47.6    | 47.3    | 42.7    | 46.6    | 36.2    | 43.1    |
| MathOctopus<sup>P</sup>-LoRA    | 30.4    | 15.2    | 23.6    | 10.4    | 22.8    | 24.8    | 26.4    | 18.0    | 22.0    | 14.8    | 20.8    |
| MathOctopus<sup>P</sup>         | 46.5    | 40.1    | 42.5    | 29.1    | 43.5    | 45.4    | 46.0    | 42.5    | 45.4    | 35.7    | 41.7    |
| **xRFT**-MathOctopus<sup>P</sup>| 46.8    | 42.3    | 43.2    | 32.8    | 43.1    | 44.5    | 45.3    | 43.2    | 42.1    | 40.5    | 42.4    |
<p></p>

| 13B Model                       | En      | Sw      | Zh      | Bn      | De      | Es      | Fr      | Ja      | Ru      | Th      | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup>         | 56.6    | 40.4    | 49.0    | 30.3    | 50.9    | 54.2    | 54.7    | 46.3    | 52.4    | 35.7    | 47.1    |
| **xRFT**-MathOctopus<sup>C</sup>| 52.9    | 41.9    | 49.2    | 34.1    | 50.5    | 52.8    | 51.5    | 45.8    | 50.2    | 35.7    | 46.5    |
| MathOctopus<sup>P</sup>         | 50.7    | 43.4    | 42.6    | 31.8    | 48.4    | 49.4    | 50.6    | 41.1    | 46.9    | 39.3    | 44.4    |
| **xRFT**-MathOctopus<sup>P</sup>| 44.6    | 43.4    | 46.4    | 34.2    | 47.7    | 48.2    | 49.9    | 43.1    | 48.2    | 39.5    | 44.5    |
<p></p>

| 30-34B Model                    | En      | Sw      | Zh      | Bn      | De      | Es      | Fr      | Ja      | Ru      | Th      | Overall |
|:--------------------------------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| MathOctopus<sup>C</sup>         | 51.5    | 42.1    | 46.2    | 23.2    | 50.5    | 52.1    | 52.9    | 42.2    | 50.5    | 33.4    | 44.5    |
| **xRFT**-MathOctopus<sup>C</sup>| 48.1    | 42.8    | 43.6    | 23.3    | 48.7    | 50.0    | 48.9    | 43.4    | 44.6    | 35.5    | 42.9    |
| MathOctopus<sup>P</sup>         | 56.4    | 46.8    | 52.0    | 35.2    | 47.2    | 53.2    | 48.0    | 39.2    | 45.6    | 41.2    | 46.5    |
| **xRFT**-MathOctopus<sup>P</sup>| 48.0    | 42.3    | 46.1    | 36.2    | 47.5    | 48.5    | 48.3    | 45.8    | 47.2    | 41.2    | 45.1    |

### **MathOctopus in English**

| Models                          | GSM8K   | SVAMP   |
|:--------------------------------|:--------|:--------|
| LLaMA 2-7B                      | 42.4    | 38.3    |
| MathOctopus<sup>P</sup>-7B      | 49.3    | 46.8    |
| MathOctopus<sup>C</sup>-7B      | 50.8    | 49.3    |
| LLaMA 2-13B                     | 51.0    | 50.9    |
| MathOctopus<sup>P</sup>-13B     | 55.5    | 52.1    |
| MathOctopus<sup>C</sup>-13B     | 56.6    | 56.6    |
| LLaMA 1-33B                     | 50.0    | 49.0    |
| MathOctopus<sup>P</sup>-33B     | 56.0    | 52.5    |
| MathOctopus<sup>C</sup>-33B     | 53.7    | 51.5    |

## Intended Uses
These models are trained for research purposes. They are designed to solve multilingual math problems. They can be used in educational software, tutoring systems, or any application where a solution to a math problem is needed.

## Citation
Please cite our paper if you use our data, model or code. Please also kindly cite the original dataset papers.

```
@misc{chen2023breaking,
      title={Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations},
      author={Nuo Chen and Zinan Zheng and Ning Wu and Linjun Shou and Ming Gong and Yangqiu Song and Dongmei Zhang and Jia Li},
      year={2023},
      eprint={2310.20246},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
3
added_tokens.json
Normal file
@@ -0,0 +1,3 @@
{
  "[PAD]": 32000
}
24
config.json
Normal file
@@ -0,0 +1,24 @@
{
  "_name_or_path": "llama-2-7b-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.29.2",
  "use_cache": true,
  "vocab_size": 32001
}
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.29.2"
}
3
pytorch_model-00001-of-00003.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e7393f57174cd9d5aa2a851e3d78000095f6fe2e3e7bb9b82657af77df0be35
size 9878001679
3
pytorch_model-00002-of-00003.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dc524db09501c0dcd169b9198f6cc752965d6f662dd704a0b19509d75c73d1c3
size 9894796639
3
pytorch_model-00003-of-00003.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c64e43281e8eb6c13f112d9cbfc995f9ee269beca369cabc69a0c56a3f32bd5a
size 7181003798
330
pytorch_model.bin.index.json
Normal file
@@ -0,0 +1,330 @@
{
  "metadata": {
    "total_size": 26953699328
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00003-of-00003.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
    "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
|
||||
"model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
|
||||
"model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
|
||||
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.norm.weight": "pytorch_model-00003-of-00003.bin"
}
}
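The mapping above assigns each tensor name to one of three shard files. As a rough sketch of how such a map could be inspected, the snippet below groups tensor names by shard. The `weight_map` sample holds only a handful of entries copied from the listing, and `group_by_shard` is a hypothetical helper, not part of any library. Loading the full mapping from disk (for example from a top-level `weight_map` key, as Hugging Face sharded-checkpoint index files conventionally use) is an assumption, since the top of the file is not visible here.

```python
from collections import defaultdict

# A few entries copied from the listing above; the real map has many more.
weight_map = {
    "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.norm.weight": "pytorch_model-00003-of-00003.bin",
}


def group_by_shard(mapping):
    """Hypothetical helper: return {shard file: sorted list of tensor names}."""
    shards = defaultdict(list)
    for tensor_name, shard_file in mapping.items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


shards = group_by_shard(weight_map)
for shard_file in sorted(shards):
    # Print each shard with the number of sampled tensors it holds.
    print(shard_file, len(shards[shard_file]))
```

The keys in the listing are sorted as strings rather than by layer number, which is why layer 3 appears between layers 29 and 30.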
6
special_tokens_map.json
Normal file
@@ -0,0 +1,6 @@
{
  "bos_token": "</s>",
  "eos_token": "</s>",
  "pad_token": "[PAD]",
  "unk_token": "</s>"
}
3
tokenizer.model
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
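The three lines committed for `tokenizer.model` are a Git LFS pointer rather than the tokenizer model itself: they record the spec version, the SHA-256 object id, and the size in bytes of the real file. A minimal sketch of reading those fields follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, and it assumes each line has the form `key value` as seen above.

```python
def parse_lfs_pointer(text):
    """Hypothetical helper: split a Git LFS pointer into its key/value fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
"""

pointer = parse_lfs_pointer(pointer_text)
algorithm, _, digest = pointer["oid"].partition(":")
size_bytes = int(pointer["size"])
print(algorithm, len(digest), size_bytes)
```

A downloaded file could then be checked against `digest` with `hashlib.sha256` and against `size_bytes` with its on-disk size.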
34
tokenizer_config.json
Normal file
@@ -0,0 +1,34 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "model_max_length": 512,
  "pad_token": null,
  "padding_side": "right",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
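The special tokens declared in `tokenizer_config.json` do not all match those in `special_tokens_map.json` from the same commit. The sketch below compares the two sets using values copied from the files above, with the `AddedToken` objects reduced to their `content` strings. Which file takes precedence when the tokenizer is loaded depends on the tokenizer library and its version; that is not determined here.

```python
import json

# Contents of special_tokens_map.json as committed.
special_tokens_map = json.loads("""
{
  "bos_token": "</s>",
  "eos_token": "</s>",
  "pad_token": "[PAD]",
  "unk_token": "</s>"
}
""")

# Only the token fields of tokenizer_config.json, reduced to their "content".
tokenizer_config_tokens = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "pad_token": None,
    "unk_token": "<unk>",
}

# Collect every token name whose value differs between the two files.
mismatches = {
    name: (special_tokens_map.get(name), tokenizer_config_tokens.get(name))
    for name in sorted(set(special_tokens_map) | set(tokenizer_config_tokens))
    if special_tokens_map.get(name) != tokenizer_config_tokens.get(name)
}

for name, (in_map, in_config) in mismatches.items():
    print(f"{name}: special_tokens_map={in_map!r} tokenizer_config={in_config!r}")
```

Only `eos_token` agrees between the two files; `bos_token`, `pad_token`, and `unk_token` differ, so it may be worth inspecting the loaded tokenizer's special tokens before relying on padding or BOS handling.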
22795
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4997ba0d00c6d22cc70c7b4dc362ddc107a318a01bb12249eb0d993814eb6d72
size 4143