Initialize project; model provided by the ModelHub XC community

Model: IDEA-CCNL/Ziya-Coding-34B-v1.0
Source: Original Platform
ModelHub XC
2026-06-27 06:56:15 +08:00
commit 5f96eeb5b9
33 changed files with 798 additions and 0 deletions

35
.gitattributes vendored Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
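
As a rough illustration of what these rules do, the sketch below checks which file names would be routed through Git LFS. It approximates gitattributes matching with Python's `fnmatch` applied to the base name, which is an assumption: real gitattributes matching has extra rules (for example for `saved_model/**/*`), so this is a sanity check, not a reimplementation. `LFS_PATTERNS` holds only a subset of the patterns, and `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the base-name patterns listed in the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.model", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does the file's base name match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


print(is_lfs_tracked("pytorch_model-00001-of-00025.bin"))
print(is_lfs_tracked("config.json"))
```

On a real checkout, `git check-attr filter -- <path>` gives the authoritative answer.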

150
README.md Normal file
@@ -0,0 +1,150 @@
---
license: gpl-3.0
language:
- zh
- en
library_name: transformers
pipeline_tag: text-generation
---
# Ziya-Coding-34B-v1.0
# Ziya Series Models
- [Ziya-LLaMA-13B-v1.1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1.1)
- [Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1)
- [Ziya-LLaMA-7B-Reward](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-7B-Reward)
- [Ziya-LLaMA-13B-Pretrain-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1)
- [Ziya-BLIP2-14B-Visual-v1](https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1)
- [Ziya-Writing-LLaMa-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-Writing-LLaMa-13B-v1)
- [Ziya-Coding-15B-v1](https://huggingface.co/IDEA-CCNL/Ziya-Coding-15B-v1)
## Code Repository
**Homepage**: https://github.com/IDEA-CCNL/Ziya-Coding
## Brief Introduction
Generating high-quality code from natural language is one of the most common demands when deploying large models. The Fengshenbang team at the IDEA Research Institute has open-sourced its latest code model, Ziya-Coding-34B-v1.0. It scores 75.5 on HumanEval Pass@1, surpassing GPT-4 (67.0) and setting a new high among known open-source models at the time of release. The Fengshenbang team provides the community with advanced large-model technology and experience, helping to build and customize more strong vertical-domain models and advancing the large-model ecosystem.
For more details, see our WeChat official account articles:
- [A new high: the Ziya large model open-sources the code model Ziya-Coding-34B-v1.0](https://mp.weixin.qq.com/s/Op4Wkiu2J9jwFr_Zj0YSZg)
- [Ziya large model series | Code model ziya-coding released: low-cost fine-tuning teaches it to program in proprietary scenarios](https://mp.weixin.qq.com/s/tWaRF1wL3HM87ZDEawd2UA)
## Software Dependencies
```
pip install torch==1.12.1 tokenizers==0.13.3 transformers==4.31.0
```
## Model Information
In early September, we open-sourced Ziya-Coding-15B-v1, a code model based on StarCoder-15B. The experience gained from training it was carried over to this new version.
For the first stage of fine-tuning, we collected and constructed about 450,000 instruction samples covering almost all code-related tasks: about 100,000 in Chinese and 350,000 in English, to ensure data diversity. When constructing the data, we made full use of high-quality code that had no accompanying instructions, using an LLM to generate the corresponding instructions and thereby obtain more high-quality code instruction data.
During our experiments, we noticed that the difficulty and correctness of code instructions are key to training a code model successfully, so we introduced a second stage of fine-tuning. We used the evol-instruct method to generate a large number of difficult, multi-requirement code instructions, used a code compiler as feedback to keep only code that compiles, and then used an LLM to generate unit tests to further verify correctness. This left 46k samples, on which we fine-tuned the first-stage model with a lower learning rate to obtain Ziya-Coding-34B-v1.0.
### Performance
| Model | HumanEval(pass@1) |
|:----------------------------|:-----------------:|
| **Ziya-Coding-34B-v1.0** | **75.5%** |
| CodeFuse-CodeLlama-34B | 74.4% |
| Phind-CodeLLaMa-34B-v2 | 73.8% |
| WizardCoder-Python-34B-V1.0 | 73.2% |
| GPT-4 | 67.0% |
| PanGu-Coder2 15B | 61.6% |
| WizardCoder-15B-V1.0 | 59.8% |
| CodeLlama-34b-Python | 53.7% |
| Ziya-Coding-15B-v1 | 50.1% |
| CodeLlama-34b | 48.8% |
| GPT-3.5 | 48.1% |
| StarCoder-15B | 33.6% |
The fine-tuning dataset was decontaminated to avoid data leakage. The HumanEval pass@1 scores are the results of greedy generation.
Prompt Format
```python3
"<human>: \nPlease Complete the given function below according to the docstring: \n{prompt}\n<bot>: \n"
```
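
For reference, here is a minimal sketch of how a greedy pass@1 score can be computed. It uses the unbiased pass@k estimator from the HumanEval paper, 1 - C(n-c, k) / C(n, k), which with one greedy sample per problem (n = k = 1) reduces to the fraction of solved problems. The function names and the example results are illustrative, not our evaluation code or data.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def greedy_pass_at_1(results: Sequence[bool]) -> float:
    """Greedy decoding gives one sample per problem, so n = k = 1."""
    return sum(pass_at_k(1, int(ok), 1) for ok in results) / len(results)


# Hypothetical outcome for four problems: three solved, one failed.
print(greedy_pass_at_1([True, True, True, False]))  # 0.75
```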
## <span id="jump"> Usage </span>
```python3
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "IDEA-CCNL/Ziya-Coding-34B-v1.0"
prompt = "写一段快速排序"  # "Write a quicksort"

# device_map="auto" places the weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

# Wrap the request in the <human>/<bot> prompt format
text = f"<human>: \n{prompt}\n<bot>: \n"
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
generate_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.85,
    temperature=1.0,
    repetition_penalty=1.0,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)
# The decoded text contains the prompt followed by the model's reply
output = tokenizer.batch_decode(generate_ids)[0]
print(output)
```
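
The decoded output contains the prompt as well as the reply, so a small post-processing step is often useful. The sketch below is an assumption-laden helper, not part of the model's API: `build_prompt` and `extract_reply` are hypothetical names, and it assumes the decoded text preserves the `<bot>: \n` marker and that the end-of-sequence token is rendered as `</s>`.

```python
HUMAN = "<human>: \n"
BOT = "<bot>: \n"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the <human>/<bot> chat format."""
    return f"{HUMAN}{instruction}\n{BOT}"


def extract_reply(decoded: str, eos: str = "</s>") -> str:
    """Return the text after the first <bot> marker, cut at the EOS token."""
    reply = decoded.split(BOT, 1)[-1]
    return reply.split(eos, 1)[0].strip()


# Simulated decoder output; no model is needed to try the helpers.
decoded = "<s> " + build_prompt("Write a quicksort") + "def quicksort(xs): ...</s>"
print(extract_reply(decoded))  # def quicksort(xs): ...
```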
## Quantization
Thanks to the excellent work of the community, you can use the quantized versions of Ziya-Coding-34B-v1.0 produced by community developers.
- [GPTQ](https://huggingface.co/TheBloke/Ziya-Coding-34B-v1.0-GPTQ)
- [AWQ](https://huggingface.co/TheBloke/Ziya-Coding-34B-v1.0-AWQ)
- [GGUF](https://huggingface.co/TheBloke/Ziya-Coding-34B-v1.0-GGUF)
## Citation
If you use our model in your work, please cite our [paper](https://arxiv.org/abs/2210.08590):
```text
@article{fengshenbang,
author = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
title = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
journal = {CoRR},
volume = {abs/2209.02970},
year = {2022}
}
```
You can also cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
```text
@misc{Fengshenbang-LM,
title={Fengshenbang-LM},
author={IDEA-CCNL},
year={2021},
howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```

26
config.json Normal file
@@ -0,0 +1,26 @@
{
"_name_or_path": "",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 8192,
"initializer_range": 0.02,
"intermediate_size": 22016,
"max_position_embeddings": 16384,
"model_type": "llama",
"num_attention_heads": 64,
"num_hidden_layers": 48,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 1000000,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.32.1",
"use_cache": false,
"vocab_size": 32000
}
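
As a cross-check of these values, the following sketch estimates the parameter count from the config. It assumes the standard Llama layout with grouped-query attention and no bias terms, plus a final norm layer that is not visible in the part of the weight index shown in this commit; `llama_param_count` is a hypothetical helper.

```python
config = {
    "hidden_size": 8192,
    "intermediate_size": 22016,
    "num_attention_heads": 64,
    "num_hidden_layers": 48,
    "num_key_value_heads": 8,
    "vocab_size": 32000,
}


def llama_param_count(cfg: dict) -> int:
    """Count weights assuming a bias-free Llama with untied embeddings."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]
    attn = 2 * h * h + 2 * h * kv_dim       # q_proj, o_proj, k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]  # gate_proj, up_proj, down_proj
    norms = 2 * h                           # input and post-attention norms
    per_layer = attn + mlp + norms
    embeddings = 2 * cfg["vocab_size"] * h  # embed_tokens and untied lm_head
    return cfg["num_hidden_layers"] * per_layer + embeddings + h  # + final norm


params = llama_param_count(config)
print(params)      # 33743970304
print(params * 2)  # 67487940608 bytes at 2 bytes per parameter
```

At 2 bytes per parameter, the estimate equals the `total_size` of 67487940608 recorded in the weight index of this commit.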

6
generation_config.json Normal file
@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.32.1"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:134edecf7be43e75394a03b386727ff50b8725567f17eaa9b676b320775fe172
size 2931856879
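
Each three-line entry of this kind is a Git LFS pointer: a spec version, a SHA-256 object id, and the size in bytes. A minimal sketch for checking a downloaded shard against its pointer could look like the following; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the local file path is up to the reader.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Compare a local file's size and hash with the pointer's values."""
    file = Path(path)
    if file.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file.open("rb") as handle:
        # Read in 1 MiB chunks so multi-gigabyte shards fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:134edecf7be43e75394a03b386727ff50b8725567f17eaa9b676b320775fe172\n"
    "size 2931856879\n"
)
print(pointer["size"])  # 2931856879
```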

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c2bf93f544e25617fd88cd7c4cc0c5b1c8c37d1ddfc16bb23915d3cfd3d73ce5
size 2768312403

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4eb5c9f42003c5571345e53d6f9365360b775257d20a1cb7dc9749f30ea67b4b
size 2768312403

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e831a5a14897686dd521e481ac665f1fea5c858984abb17f799bde5e92fe5a2f
size 2768312403

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2fc038631103ff701c3f40352fd3e913460232a0dd9cfc35cf607620f8458e31
size 2768312403

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:61ffd534a9c5eefc6c2267cd9332aef067ecd4ceb66c0ac72de4118858fffd60
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3369da52740e86ab04188dbeffb6fe218204b3ac3f43434a0f3775832f6a7e86
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:93f141dfccf48b341e12c05b8a604c1dd4be3471d2d489bbbc11acf84690ce41
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:58794a438aa54b6d345f2765d8cd735b1dbb81b63b930fb3eba1d507a72148e5
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5db10c2ad1c0c62e65961874895045cfe56373a70cdc9fec030ec66788149f9e
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:15d4bc7c889fcc4e6caba192d57f3928a29bd325d0233e0cbb3758d0a2099127
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5ee7bad53c0919a2ffff469b276d70ca7f86bf09be50b325f84cdf1d282c691b
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7e3f8623ff39bca26b62f1c1d1c91f343e0a0bde9a1c4f6271426d86eb7723a8
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:50737a32a0e3c524aba1d676705b2ed4fb210ccdeaa16b62644584e60ea7ddff
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81960a1fd8a37242eb6910d8ac8abda736890fac21eb2a5c4a3078303eebe7b3
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0cd7ed6a5467d4fb2111e2c1760f0b6dd03f002c70d1211e3d52a10af1cbcc1e
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3d6dc65a80c5684d8a30293479ca334711909f000add92ef7164dddfc8aefebe
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:faef1759d3ed114a31cc94c260009f81d137c89ef0f475d7a378c0267962f2a8
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:36f7d867168c35581b5575cfd8f75dfb124601342f8041b1b1829f8b1815fbc6
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9971bf946d6b0974b2375ce551ff2ec56ea0898226b3abd8d96f78c2c5f57af8
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0f60aa7fb6e0db1aa29f151928c69adcbb314cedddf4d517011f074deb44f9fd
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d52039b53103e33a934c126f991ff42c33dc15a2c94ea6f0092cd70fae21ec58
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19e70d921f17859abe0b91ace2c50eb6af1238a0cdfb292c8bc53047f7bf1306
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:36f282734e5635af27484204507480e3af7b3b951e46a55d8a2ebe4d59ee2c89
size 2768312467

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:713cc74edf8b0534bff4aef27cbd2573f103d58786efdafa4f1792cb5c837f72
size 885049454

@@ -0,0 +1,442 @@
{
"metadata": {
"total_size": 67487940608
},
"weight_map": {
"lm_head.weight": "pytorch_model-00025-of-00025.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00025.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00007-of-00025.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00008-of-00025.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00009-of-00025.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00010-of-00025.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00011-of-00025.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00012-of-00025.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00013-of-00025.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00014-of-00025.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00015-of-00025.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-00025.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00016-of-00025.bin",
"model.layers.32.input_layernorm.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.33.input_layernorm.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00017-of-00025.bin",
"model.layers.34.input_layernorm.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.35.input_layernorm.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00018-of-00025.bin",
"model.layers.36.input_layernorm.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.37.input_layernorm.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00019-of-00025.bin",
"model.layers.38.input_layernorm.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.39.input_layernorm.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00020-of-00025.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.40.input_layernorm.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.40.mlp.down_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.40.mlp.gate_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.40.mlp.up_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.40.post_attention_layernorm.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.40.self_attn.k_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.40.self_attn.o_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.40.self_attn.q_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.40.self_attn.v_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.41.input_layernorm.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.41.mlp.down_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.41.mlp.gate_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.41.mlp.up_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.41.post_attention_layernorm.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.41.self_attn.k_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.41.self_attn.o_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.41.self_attn.q_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.41.self_attn.v_proj.weight": "pytorch_model-00021-of-00025.bin",
"model.layers.42.input_layernorm.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.42.mlp.down_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.42.mlp.gate_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.42.mlp.up_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.42.post_attention_layernorm.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.42.self_attn.k_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.42.self_attn.o_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.42.self_attn.q_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.42.self_attn.v_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.43.input_layernorm.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.43.mlp.down_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.43.mlp.gate_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.43.mlp.up_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.43.post_attention_layernorm.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.43.self_attn.k_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.43.self_attn.o_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.43.self_attn.q_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.43.self_attn.v_proj.weight": "pytorch_model-00022-of-00025.bin",
"model.layers.44.input_layernorm.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.44.mlp.down_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.44.mlp.gate_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.44.mlp.up_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.44.post_attention_layernorm.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.44.self_attn.k_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.44.self_attn.o_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.44.self_attn.q_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.44.self_attn.v_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.45.input_layernorm.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.45.mlp.down_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.45.mlp.gate_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.45.mlp.up_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.45.post_attention_layernorm.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.45.self_attn.k_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.45.self_attn.o_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.45.self_attn.q_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.45.self_attn.v_proj.weight": "pytorch_model-00023-of-00025.bin",
"model.layers.46.input_layernorm.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.46.mlp.down_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.46.mlp.gate_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.46.mlp.up_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.46.post_attention_layernorm.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.46.self_attn.k_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.46.self_attn.o_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.46.self_attn.q_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.46.self_attn.v_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.47.input_layernorm.weight": "pytorch_model-00025-of-00025.bin",
"model.layers.47.mlp.down_proj.weight": "pytorch_model-00025-of-00025.bin",
"model.layers.47.mlp.gate_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.47.mlp.up_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.47.post_attention_layernorm.weight": "pytorch_model-00025-of-00025.bin",
"model.layers.47.self_attn.k_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.47.self_attn.o_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.47.self_attn.q_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.47.self_attn.v_proj.weight": "pytorch_model-00024-of-00025.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00003-of-00025.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00004-of-00025.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00006-of-00025.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00005-of-00025.bin",
"model.norm.weight": "pytorch_model-00025-of-00025.bin"
}
}

24
special_tokens_map.json Normal file

@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<unk>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}

BIN
tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.

37
tokenizer_config.json Normal file

@@ -0,0 +1,37 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "legacy": null,
  "model_max_length": 4096,
  "pad_token": null,
  "padding_side": "right",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "use_default_system_prompt": true
}