Initialize project; model provided by the ModelHub XC community

Model: TIGER-Lab/MAmmoTH-13B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-02 12:25:41 +08:00
commit e87d804a7f
16 changed files with 666 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

README.md Normal file

@@ -0,0 +1,101 @@
---
license: mit
datasets:
- TIGER-Lab/MathInstruct
language:
- en
---
# 🦣 MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Project Page: [https://tiger-ai-lab.github.io/MAmmoTH/](https://tiger-ai-lab.github.io/MAmmoTH/)
Paper: [https://arxiv.org/pdf/2309.05653.pdf](https://arxiv.org/pdf/2309.05653.pdf)
Code: [https://github.com/TIGER-AI-Lab/MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH)
## Introduction
We introduce 🦣 MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on 🤗 [MathInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), a meticulously curated instruction tuning dataset that is lightweight yet generalizable. MathInstruct is compiled from 13 math rationale datasets, six of which are newly curated by this work. It uniquely focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and ensures extensive coverage of diverse mathematical fields.
| | **Base Model: Llama-2** | **Base Model: Code Llama** |
|-----|---------------------------------------------------------------|--------------------------------------------------------------------------|
| 7B | 🦣 [MAmmoTH-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-7B) | 🦣 [MAmmoTH-Coder-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-7B) |
| 13B | 🦣 [MAmmoTH-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-13B) | 🦣 [MAmmoTH-Coder-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-13B)|
| 34B | - | 🦣 [MAmmoTH-Coder-34B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-34B)|
| 70B | 🦣 [MAmmoTH-70B](https://huggingface.co/TIGER-Lab/MAmmoTH-70B) | - |
## Training Data
The models are trained on the 🤗 [MathInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/MathInstruct), which is compiled from 13 different math rationale datasets. Check out the dataset card for more details.
## Training Procedure
The models are fine-tuned with the MathInstruct dataset using the original Llama-2 and Code Llama models as base models. The training procedure varies for different models based on their sizes. Check out our paper for more details.
## Evaluation
The models are evaluated on open-ended and multiple-choice math problems from several datasets (GSM = GSM8K, NumG = NumGLUE, SVA = SVAMP, Mat = Mathematics, Sim = SimulEq; SAT and MMLU refer to their math subsets). Here are the results:
| **Model** | **Decoding** | **GSM** | **MATH** | **AQuA** | **NumG** | **SVA** | **Mat** | **Sim** | **SAT** | **MMLU** | **AVG** |
|-----------------------|--------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| **MAmmoTH-7B** | CoT | 50.5 | 10.4 | 43.7 | 44.0 | 47.3 | 9.2 | 18.9 | 32.7 | 39.9 | 33.0 |
| | PoT | 51.6 | 28.7 | 43.3 | 52.3 | 65.1 | 41.9 | 48.2 | 39.1 | 44.6 | 46.1 |
| | **Hybrid** | **53.6** | **31.5** | **44.5** | **61.2** | **67.7** | **46.3** | **41.2** | **42.7** | **42.6** | **47.9** |
| **MAmmoTH-Coder-7B** | CoT | 22.4 | 7.9 | 36.2 | 36.0 | 37.0 | 8.2 | 7.2 | 32.7 | 34.6 | 24.7 |
| | PoT | 58.8 | 32.1 | 47.2 | 57.1 | 71.1 | 53.9 | 44.6 | 40.0 | 47.8 | 50.3 |
| | **Hybrid** | **59.4** | **33.4** | **47.2** | **66.4** | **71.4** | **55.4** | **45.9** | **40.5** | **48.3** | **52.0** |
| **MAmmoTH-13B** | CoT | 56.3 | 12.9 | 45.3 | 45.6 | 53.8 | 11.7 | 22.4 | 43.6 | 42.3 | 37.1 |
| | PoT | 61.3 | 32.6 | 48.8 | 59.6 | 72.2 | 48.5 | 40.3 | 46.8 | 45.4 | 50.6 |
| | **Hybrid** | **62.0** | **34.2** | **51.6** | **68.7** | **72.4** | **49.2** | **43.2** | **46.8** | **47.6** | **52.9** |
| **MAmmoTH-Coder-13B** | CoT | 32.1 | 10.2 | 40.6 | 36.2 | 43.0 | 9.6 | 10.1 | 40.9 | 36.6 | 28.8 |
| | PoT | 64.3 | 35.2 | 46.8 | 54.2 | 73.2 | 60.0 | 44.2 | 48.2 | 48.2 | 52.7 |
| | **Hybrid** | **64.7** | **36.3** | **46.9** | **66.8** | **73.7** | **61.5** | **47.1** | **48.6** | **48.3** | **54.9** |
| **MAmmoTH-Coder-34B** | CoT | 34.3 | 11.6 | 39.0 | 36.2 | 44.6 | 10.8 | 10.9 | 46.4 | 42.9 | 30.7 |
| | PoT | 72.3 | 42.8 | 53.8 | 59.6 | 84.0 | 64.7 | 50.6 | 58.6 | 52.7 | 59.9 |
| | **Hybrid** | **72.7** | **43.6** | **54.7** | **71.6** | **84.3** | **65.4** | **51.8** | **60.9** | **53.8** | **62.1** |
| **MAmmoTH-70B** | CoT | 72.4 | 21.1 | 57.9 | 58.9 | 71.6 | 20.0 | 31.9 | 57.3 | 52.1 | 49.2 |
| | PoT | 76.7 | 40.1 | 60.2 | 64.3 | 81.7 | 55.3 | 45.3 | 64.1 | 53.5 | 60.1 |
| | **Hybrid** | **76.9** | **41.8** | **65.0** | **74.4** | **82.4** | **55.6** | **51.4** | **66.4** | **56.7** | **63.4** |
## Usage
You can use the models through Hugging Face's Transformers library: create a text-generation pipeline with the `pipeline` function and the model of your choice, then feed in a math problem to get the solution.
Check our Github repo for more advanced use: [https://github.com/TIGER-AI-Lab/MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH)
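As a minimal sketch of the pipeline usage described above (assuming the `transformers` package, enough GPU memory for a 13B model, and `accelerate` for `device_map="auto"`; the exact whitespace of the official template may differ slightly):

```python
# Prompt template as shown in the "Prompt Format" section of this card.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n### Response:"
)

def solve(question: str, model_id: str = "TIGER-Lab/MAmmoTH-13B") -> str:
    """Build the MAmmoTH prompt and generate a solution with Transformers."""
    # Imported lazily: loading the model is the heavy step.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype="auto",   # use the checkpoint's dtype
        device_map="auto",    # shard across available devices
    )
    prompt = PROMPT_TEMPLATE.format(instruction=question)
    out = pipe(prompt, max_new_tokens=512, return_full_text=False)
    return out[0]["generated_text"]
```

Calling `solve("What is 15% of 240?")` downloads the checkpoint on first use, so run it on a machine with sufficient disk and GPU memory.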
## Prompt Format
If you want to do CoT:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
```
If you want to do PoT:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction} Let's write a program.
### Response:
```
## Intended Uses
These models are trained for research purposes. They are designed to solve general math problems and can be used in educational software, tutoring systems, or any application that needs solutions to math problems. The models can generate both chain-of-thought (CoT) and program-of-thought (PoT) rationales, providing a comprehensive solution to a given math problem.
## Limitations
We have done our best to build math generalist models. However, the models' performance may vary with the complexity and specifics of a given problem, and not every mathematical field can be covered comprehensively.
## Citation
If you use the models, data, or code from this project, please cite the original paper:
```
@article{yue2023mammoth,
title={MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning},
  author={Xiang Yue and Xingwei Qu and Ge Zhang and Yao Fu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},
journal={arXiv preprint arXiv:2309.05653},
year={2023}
}
```

added_tokens.json Normal file

@@ -0,0 +1,3 @@
{
"[PAD]": 32000
}

config.json Normal file

@@ -0,0 +1,26 @@
{
"_name_or_path": "/ML-A100/home/xiangyue/models/Llama-2-13b-hf",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 40,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.29.1",
"use_cache": true,
"vocab_size": 32001
}

configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

generation_config.json Normal file

@@ -0,0 +1,10 @@
{
"bos_token_id": 1,
"do_sample": true,
"eos_token_id": 2,
"max_length": 4096,
"pad_token_id": 0,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.29.1"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:74b5319f0991b558e8316e7b891ea634548f5b53bdf655c05a4dde47de855de8
size 9956566923


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:da20934d63d91d2f83294f716d875d7468ca6524a49ff74bbb86235abde6072b
size 9940859009


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:480130a352b28dc63cfd739eb31eedace9ad932ef043d2608f967d205320151b
size 9940859567


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:715eebf63cc7e267fc12ef4f3e0fc6f34ea331aa28e5a0e569065055c581a796
size 9867417913


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5287dfd8be851682fc2055a3494111d199c7b161e7ca022c77d9db50962af0a9
size 9867459649


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:40191a3ad0b48a0eb66cf5aee5f45d0f9ae56feb43c102af71b50aa46ebefb3e
size 2490497199


@@ -0,0 +1,410 @@
{
"metadata": {
"total_size": 52063508480
},
"weight_map": {
"lm_head.weight": "pytorch_model-00006-of-00006.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00006.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00006.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.input_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.input_layernorm.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00006.bin",
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"model.layers.39.input_layernorm.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00006.bin",
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00006-of-00006.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00006.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00006.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"model.norm.weight": "pytorch_model-00006-of-00006.bin"
}
}
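The index file above follows the standard Hugging Face sharded-checkpoint layout: a `weight_map` object mapping each parameter name to the `.bin` shard that stores it (a single layer's tensors may straddle a shard boundary, as with layer 38 here). A minimal sketch of querying such an index, using a three-entry excerpt of the map above:

```python
from collections import defaultdict

# Excerpt of the "weight_map" above: parameter name -> shard file that holds it.
weight_map = {
    "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00006-of-00006.bin",
    "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00006.bin",
    "model.norm.weight": "pytorch_model-00006-of-00006.bin",
}

def shard_for(param: str, weight_map: dict) -> str:
    """Return the shard file that stores a given parameter."""
    return weight_map[param]

def params_by_shard(weight_map: dict) -> dict:
    """Invert the map: shard file -> list of parameter names it holds."""
    by_shard = defaultdict(list)
    for name, shard in weight_map.items():
        by_shard[shard].append(name)
    return dict(by_shard)

print(shard_for("model.norm.weight", weight_map))  # pytorch_model-00006-of-00006.bin
```

In practice `transformers.AutoModelForCausalLM.from_pretrained` reads this index itself and loads only the shards it needs; the sketch just shows what the mapping encodes.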

24
special_tokens_map.json Normal file

@@ -0,0 +1,24 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": "[PAD]",
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723

35
tokenizer_config.json Normal file

@@ -0,0 +1,35 @@
{
"add_bos_token": true,
"add_eos_token": false,
"bos_token": {
"__type": "AddedToken",
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": false,
"eos_token": {
"__type": "AddedToken",
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"legacy": false,
"model_max_length": 512,
"pad_token": null,
"padding_side": "right",
"sp_model_kwargs": {},
"tokenizer_class": "LlamaTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
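Note how the two tokenizer files divide the work: `tokenizer_config.json` sets `pad_token` to `null`, while `special_tokens_map.json` supplies `"[PAD]"`, which the tokenizer picks up at load time. A minimal consistency sketch over the two files as shown above (the dicts below are excerpts copied from them, not the full contents):

```python
# Excerpts of the two tokenizer files in this commit.
special_tokens_map = {
    "bos_token": {"content": "<s>"},
    "eos_token": {"content": "</s>"},
    "pad_token": "[PAD]",
    "unk_token": {"content": "<unk>"},
}

tokenizer_config = {
    "add_bos_token": True,
    "add_eos_token": False,
    "model_max_length": 512,
    "pad_token": None,
    "padding_side": "right",
    "tokenizer_class": "LlamaTokenizer",
}

def effective_pad_token(cfg: dict, stm: dict) -> str:
    """tokenizer_config.json leaves pad_token null; the value in
    special_tokens_map.json fills the gap when the tokenizer is loaded."""
    return cfg["pad_token"] or stm["pad_token"]

print(effective_pad_token(tokenizer_config, special_tokens_map))  # [PAD]
```

With `add_bos_token=true` and `add_eos_token=false`, encoded sequences start with `<s>` but do not end with `</s>`, and right-padding with `[PAD]` is applied up to `model_max_length=512`; this matches the usual Llama-family fine-tuning setup.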