Initialize project; model provided by the ModelHub XC community

Model: IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-08 10:39:25 +08:00
commit 30bde184a1
39 changed files with 805 additions and 0 deletions

34
.gitattributes vendored Normal file
View File

@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
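As a rough illustration of what these rules do, the sketch below checks whether a file name would be routed through Git LFS. It is a minimal approximation: it copies only a handful of the patterns listed above, matches on the base name with Python's `fnmatch`, and does not reproduce full gitattributes matching (for example `saved_model/**/*`). The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# A subset of the basename patterns from the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.model", "*.zip", "*tfevents*"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does the file's base name match any LFS pattern?"""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


for candidate in ["pytorch_model-00001-of-00028.bin", "config.json", "README.md"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under these patterns the weight shards are stored as LFS pointers, while small text files such as `config.json` are committed directly.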

184
README.md Normal file
View File

@@ -0,0 +1,184 @@
---
license: gpl-3.0
language:
- en
- zh
inference: false
---
# Ziya-LLaMA-13B-Pretrain-v1
- Main Page: [Fengshenbang](https://fengshenbang-lm.com/)
- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
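(Because of the LLaMA weight license, we cannot release the complete model weights directly; users need to merge them by following the [usage instructions](#-使用-usage-).)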
# 姜子牙系列模型 Ziya Model Series
- [Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1)
- [Ziya-LLaMA-7B-Reward](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-7B-Reward)
- [Ziya-LLaMA-13B-Pretrain-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1)
- [Ziya-BLIP2-14B-Visual-v1](https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1)
## 简介 Brief Introduction
Ziya-LLaMA-13B-Pretrain-v1 is a large-scale pretrained model with 13 billion parameters, based on LLaMA. We optimized the LLaMA tokenizer for Chinese and continued pretraining LLaMA-13B on 110 billion tokens of Chinese and English data, which further improved its Chinese understanding and generation. The general-purpose model [Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) builds on this model with two further training stages: multi-task supervised fine-tuning (SFT) and learning from human feedback (RM, PPO). Ziya-LLaMA-13B-v1 can perform tasks such as translation, programming, text classification, information extraction, summarization, copywriting, common-sense Q&A, and mathematical calculation.
**Notice to users**: To comply with the license of the LLaMA model released by Meta, we release only the weight delta between the model before and after continual pretraining. The final Ziya-LLaMA-13B-Pretrain-v1 model can be obtained easily with a script (see the steps under Usage).
## 模型分类 Model Taxonomy
| 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
| :----: | :----: | :----: | :----: | :----: | :----: |
| 通用 General | AGI模型 AGI Model | 姜子牙 Ziya | LLaMA | 13B | English & Chinese |
## 模型信息 Model Information
### 继续预训练 Continual Pretraining
The raw data contains both English and Chinese. The English data comes from openwebtext, Books, Wikipedia, and Code; the Chinese data comes from the cleaned Wudao dataset and a self-built Chinese dataset. After deduplication, model scoring, data bucketing, rule-based filtering, sensitive-topic filtering, and data evaluation, we obtained 125 billion tokens of usable data.
To address the low efficiency of the native LLaMA tokenizer when encoding and decoding Chinese, we added more than 7,000 commonly used Chinese characters to the LLaMA SentencePiece vocabulary. After deduplicating against the original LLaMA vocabulary, we obtained a vocabulary of 39,410 tokens, implemented by reusing the LlamaTokenizer in Transformers.
For continual pretraining we used 160 A100 GPUs with 40 GB of memory each, 2.6 million tokens per training batch, and FP16 mixed precision, reaching a throughput of 118 TFLOPS per GPU. This let us train the original LLaMA-13B model on a further 110 billion tokens in 8 days. As far as we know, this is the largest continual pretraining run on LLaMA-13B to date.
During training we ran into machine crashes, bugs in the underlying framework, loss spikes, and other problems, but we kept the run stable through rapid adjustments. We are also releasing the training loss curve so that others can see the kinds of issues that may arise.
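The figures in this section can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: it takes 32,000 as the size of the original LLaMA vocabulary, reads the 2.6M-token figure as the number of tokens per optimizer step, uses a nominal 13 billion parameters, and applies the common `6 * parameters * tokens` rule of thumb for training FLOPs. None of these assumptions are stated by the authors.

```python
LLAMA_VOCAB = 32_000      # original LLaMA vocabulary size
ZIYA_VOCAB = 39_410       # vocabulary size stated in this section
TOKENS = 110e9            # tokens seen during continual pretraining
BATCH_TOKENS = 2.6e6      # assumption: tokens per optimizer step
PARAMS = 13e9             # nominal parameter count
GPUS = 160
FLOPS_PER_GPU = 118e12    # reported throughput per GPU, in FLOP/s

# Tokens added on top of the original vocabulary.
new_tokens = ZIYA_VOCAB - LLAMA_VOCAB

# Approximate number of optimizer steps.
steps = TOKENS / BATCH_TOKENS

# Rule-of-thumb compute (forward + backward) and pure compute time in days.
total_flops = 6 * PARAMS * TOKENS
days = total_flops / (GPUS * FLOPS_PER_GPU) / 86_400

print(f"new tokens: {new_tokens}")
print(f"optimizer steps: {steps:,.0f}")
print(f"estimated compute time: {days:.1f} days")
```

The estimate comes out somewhat below the reported 8 days. That gap is plausible given the crashes and restarts described above and any overhead the rule of thumb ignores, though the source does not break the time down.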
<img src="https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1/resolve/main/loss.png" width=1000 height=600>
### 效果评估 Performance
The following compares Ziya-LLaMA-13B-Pretrain-v1 with the LLaMA model before continual pretraining, evaluated on the public English benchmark [HELM](https://crfm.stanford.edu/helm/latest/) and on our Chinese multiple-choice evaluation sets.
<img src="https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1/resolve/main/ziya_en_eval.png" width=2542 height=1045>
| Model | Mean win rate | MMLU | BoolQ | NarrativeQA | NaturalQuestions (closed-book) | NaturalQuestions (open-book) | QuAC | TruthfulQA | IMDB |
| -------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| LLaMA-13B | 0.500 | 0.424 | 0.718 | 0.440 | 0.349 | 0.591 | 0.318 | 0.326 | 0.487 |
| Ziya-LLaMA-13B-Pretrain-v1 | 0.650 | 0.433 | 0.753 | 0.445 | 0.348 | 0.528 | 0.335 | 0.249 | 0.497 |
<img src="https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1/resolve/main/ziya_zh_eval.png" width=2340 height=1523>
| Model | In-context | C3 | Common sense | Chinese | Math | English | Physics | Chemistry | Biology | History | Politics | Geography |
|-------------------------|------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| LLaMA-13B | 0-shot | 0.4817 | 0.3088 | 0.2674 | 0.2882 | 0.3399 | 0.2581 | 0.2478 | 0.2271 | 0.3380 | 0.3275 | 0.296 |
| Ziya-LLaMA-13B-Pretrain-v1 | 0-shot | 0.5354 | 0.3373 | 0.2925 | 0.3059 | 0.3428 | 0.2903 | 0.2655 | 0.3215 | 0.4190 | 0.4123 | 0.4425 |
| LLaMA-13B | 5-shot | 0.5314 | 0.3586 | 0.2813 | 0.2912 | 0.4476 | 0.2939 | 0.2301 | 0.2330 | 0.3268 | 0.3187 | 0.3103 |
| Ziya-LLaMA-13B-Pretrain-v1 | 5-shot | 0.6037 | 0.4330 | 0.2802 | 0.2912 | 0.4363 | 0.2975 | 0.2802 | 0.3422 | 0.4358 | 0.4357 | 0.4540 |
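To read the Chinese table at a glance, the sketch below computes the per-subject difference between the two 5-shot rows. The numbers are copied from the table; the English subject labels are translations of the column headers, and the variable names are illustrative only.

```python
subjects = ["C3", "Common sense", "Chinese", "Math", "English", "Physics",
            "Chemistry", "Biology", "History", "Politics", "Geography"]
llama_5shot = [0.5314, 0.3586, 0.2813, 0.2912, 0.4476, 0.2939,
               0.2301, 0.2330, 0.3268, 0.3187, 0.3103]
ziya_5shot = [0.6037, 0.4330, 0.2802, 0.2912, 0.4363, 0.2975,
              0.2802, 0.3422, 0.4358, 0.4357, 0.4540]

# Accuracy difference (Ziya minus LLaMA) per subject, rounded to 4 decimals.
deltas = {s: round(z - l, 4) for s, l, z in zip(subjects, llama_5shot, ziya_5shot)}

improved = [s for s, d in deltas.items() if d > 0]
regressed = [s for s, d in deltas.items() if d < 0]

for subject, delta in deltas.items():
    print(f"{subject:>12}: {delta:+.4f}")
print("improved:", len(improved), "regressed:", len(regressed))
```

On these rows most subjects improve, while a few are flat or slightly lower, so the gain is not uniform across subjects.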
## <span id="jump"> 使用 Usage </span>
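Because of the LLaMA weight license, this model may not be used commercially; please comply strictly with LLaMA's usage policy. The same license prevents us from releasing the complete model weights directly. We therefore took the open-source [FastChat tool](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/apply_delta.py) as a starting point, optimized it further, and computed and released the difference between the Ziya-LLaMA-13B-v1 weights and the original LLaMA weights. To obtain the complete Ziya-LLaMA-13B-v1 weights, follow these steps: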
Step 1: Obtain the [LLaMA](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform) weights and convert them to the Hugging Face Transformers format (see the [LLaMA model documentation](https://huggingface.co/docs/transformers/main/en/model_doc/llama#overview)). You can use the conversion [script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py). Skip this step if you already have the Hugging Face weights.
```
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 13B --output_dir /output/path
```
Step 2: Download the Ziya-LLaMA-13B-v1 delta weights from Hugging Face and merge them with the converted LLaMA weights from Step 1, using this script: https://github.com/IDEA-CCNL/Fengshenbang-LM/blob/main/fengshen/utils/apply_delta.py
```
python3 -m apply_delta --base ~/model_weights/llama-13b --target ~/model_weights/Ziya-LLaMA-13B --delta ~/model_weights/Ziya-LLaMA-13B-v1
```
Step 3: Load the model obtained in Step 2 and run inference.
```python3
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Placeholder path: the directory written by apply_delta in Step 2 (its --target).
ckpt = "/path/to/Ziya-LLaMA-13B"
device = torch.device("cuda")

query = "帮我写一份去西安的旅游计划"  # "Help me write a travel plan for a trip to Xi'an"
model = LlamaForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(ckpt)

inputs = query.strip()
input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)
# Sample up to 1024 new tokens with nucleus sampling (top_p=0.85).
generate_ids = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    top_p=0.85,
    temperature=1.0,
    repetition_penalty=1.0,
    eos_token_id=2,
    bos_token_id=1,
    pad_token_id=0,
)
output = tokenizer.batch_decode(generate_ids)[0]
print(output)
```
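The idea behind the delta merge in Step 2 can be pictured with a toy example: each released tensor is the difference between the trained weight and the base weight, so adding it back to the base recovers the trained weight. The sketch below is not the Fengshenbang `apply_delta` script. It uses plain Python lists as stand-ins for tensors, the function and tensor names are hypothetical, and it does not handle the embedding matrices, whose shapes differ between the base model (32,000 rows) and this model (39,424 rows).

```python
from typing import Dict, List

Tensor = List[float]  # stand-in for a real tensor type


def apply_delta(base: Dict[str, Tensor], delta: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """Return target weights with target[name] = base[name] + delta[name]."""
    target: Dict[str, Tensor] = {}
    for name, delta_values in delta.items():
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name}")
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Toy weights: one tensor with three values.
base = {"layer.weight": [1.0, 2.0, 3.0]}
delta = {"layer.weight": [0.5, -1.0, 0.0]}
merged = apply_delta(base, delta)
print(merged)
```

For the real checkpoints, use the script linked in Step 2 rather than this sketch.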
## 微调示例 Finetune Example
Refer to [ziya_finetune](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/ziya_llama)
## 推理量化示例 Inference & Quantization Example
Refer to [ziya_inference](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/ziya_inference)
## 引用 Citation
If you use our models in your work, please cite our [paper](https://arxiv.org/abs/2210.08590):
```text
@article{fengshenbang,
author = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
title = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
journal = {CoRR},
volume = {abs/2209.02970},
year = {2022}
}
```
You can also cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
```text
@misc{Fengshenbang-LM,
title={Fengshenbang-LM},
author={IDEA-CCNL},
year={2021},
howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```

27
config.json Normal file
View File

@@ -0,0 +1,27 @@
{
"_name_or_path": "../Ziya-LLaMA-13B-Pretrain-v1-final",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 2048,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"pad_token_id": 0,
"rms_norm_eps": 1e-06,
"rotary_emb_base": 10000,
"rotary_pct": 1,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.28.0",
"use_cache": true,
"use_parallel_residual": false,
"vocab_size": 39424
}
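The values in this config are enough to estimate the parameter count and the checkpoint size. The sketch below assumes the standard `LlamaForCausalLM` layout (bias-free q/k/v/o projections, a three-matrix MLP, two RMSNorm weights per layer, one final norm, untied input and output embeddings) and 2 bytes per value for FP16, including the per-layer `rotary_emb.inv_freq` buffers. These layout assumptions are not stated in the commit itself.

```python
hidden_size = 5120
intermediate_size = 13824
num_layers = 40
num_heads = 40
vocab_size = 39424

embed = vocab_size * hidden_size                 # model.embed_tokens
lm_head = vocab_size * hidden_size               # untied output projection
attn = 4 * hidden_size * hidden_size             # q, k, v, o projections
mlp = 3 * hidden_size * intermediate_size        # gate, up, down projections
norms = 2 * hidden_size                          # two RMSNorm weights per layer
per_layer = attn + mlp + norms

params = embed + lm_head + num_layers * per_layer + hidden_size  # + final norm

# Rotary inv_freq buffers: head_dim / 2 values per layer (assumed stored in FP16).
head_dim = hidden_size // num_heads
inv_freq = num_layers * (head_dim // 2)

total_bytes = 2 * (params + inv_freq)

# Rows beyond the 39,410-token vocabulary described in the README.
padding_rows = vocab_size - 39410

print(f"parameters: {params:,}")
print(f"estimated FP16 size: {total_bytes:,} bytes")
print(f"extra embedding rows: {padding_rows}")
```

Under these assumptions the estimate equals the `total_size` of 26183777280 recorded in the weight index of this commit. The `vocab_size` of 39424 is 14 rows larger than the 39,410 tokens described in the README and is a multiple of 128, which would be consistent with padding the embedding matrix, although the source does not say so.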

7
generation_config.json Normal file
View File

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.28.0"
}

BIN
loss.png Normal file

Binary file not shown (57 KiB).

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:886ac36238faddc741693d9354dd6f6f784e8647da2ed786664cc121cf49a85e
size 896534991
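Each of these three-line entries is a Git LFS pointer: the repository stores the pointer text, and the real file is fetched by its hash. As a small illustration, the sketch below parses one pointer into its fields. The function name `parse_lfs_pointer` is hypothetical and the parser only handles the three keys shown here.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:886ac36238faddc741693d9354dd6f6f784e8647da2ed786664cc121cf49a85e
size 896534991
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is the byte length of the real file, so it can be used to check a download before comparing the SHA-256 digest.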

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ccbf9305bf2b58b64a3afe65a817d59c1539cea7c8e8ff1f91436aae3d4b4402
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fd0fe8baa77de3e20736a2deae6b6ef10389fa4b9cb5796f6e051afdd7c0306d
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:24bdb18ea5b3777ae6b8218d12317a71619124190d1eb4e7a278285d45058a89
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ad6833da811d00c1be079c42fdbe3d4d0724fe69a06ba0dc99ca7e4d7af69903
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:070ad6557634d6a56231d19fbe1c025e854429cddafc3387bbd0aede3ebd2490
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a297ca5a24e9b95ba7000c3f87e55bd3007b2a91e786b7bac39c40234f332bc7
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0b2c0fa823482f88f163bedd633da8726228aa2d62801d87d7ea0e74d0e6ab74
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f37a26f74476d33302e37c3d97a0b5b1b75ef4c1f0a7ee24090cae884dfe7d7
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:05230862d220261be51f908d5f6d928f1634ff74faa1f21a8dee35f6c0c22803
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b76e62c3f08f01b4b485dd5ede76121c142f9419b04b0a606219d078d19cb0eb
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:145f7053cffe5ef626c31824274698ebe1059962267c6dc4cf3694ae7fafe112
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2ecf6e7c65cb76f0d66c99dd7ed75dc1d09a7d2b3c07b9698ac3e5dbf149f325
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:175a2aefa4ab28481bdda4c869ef4fe49002255f200cc63bd8869c8521e5056b
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47bc8dab76b8e70a6b7f668b3a21b7a7b401944f3914cba50d0171c93571275a
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6cccf711cf507bc936532a55b00874e7ac75225e82d5d662e7fc6a2092d59544
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ff7340c0491825e561e0e85daf2f15fa5953aa70caf08339f9f1407ee2974956
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:86a2bdffe490a5bc29c70b63312ce0676721d411fc201a8d9d0c7cd371a11057
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b407a44a9a874ff4b9cec43ba062cafc95ac41ab8eb7893f8e4409d7742e57f9
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b8ba34b37a059e3f23d9d1845494783b6d0ee0a08cd102086ea6c6f5da042dab
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d060f6cd0e722c658b0716c190d3f894565b1a1cc5f901aa0cd6421c47191e5e
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c6f59541374debbc4ee125ed5c36de4366f561e284bf6588e3888b34a5ce578d
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7e38ac921950194678c350312b7b64d778d0aa28ab6e90121070ead6e6943526
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9737a6237e5d581a740e1e04b29379e588442573175945f58cb329b57689a1d7
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b8841d5ec49b165071c4efe65e3f5a3a38101f4bc6e3ebffc745537af7161ce
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e82e9563a92092957fa9eec87e28498fe805d2ecb3e8f4652f8b5373c41bdc3
size 985707823

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:251827e686292651513f74db403cf39ef237a28ae49bbf120c44cb81933ff97e
size 917528001

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:88348f807061ae3bf8fd7e09ea91c45d7d70622db434383d9759b85e342244c0
size 545291867
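The `size` fields of the pointers above can be added up and compared with the `total_size` recorded in the weight index of this commit. The sketch below assumes that the 28 pointers correspond, in order, to the 28 `pytorch_model-*.bin` shards; the file names are not shown in this view, so that mapping is an assumption.

```python
# Sizes copied from the pointers above: the first shard, 13 alternating pairs, the last shard.
sizes = [896534991] + [985707823, 917528001] * 13 + [545291867]

INDEX_TOTAL_SIZE = 26183777280  # "total_size" from the weight index metadata

on_disk = sum(sizes)
overhead = on_disk - INDEX_TOTAL_SIZE

print(f"shards: {len(sizes)}")
print(f"bytes on disk: {on_disk:,}")
print(f"overhead beyond raw tensor data: {overhead:,} bytes")
```

The files are slightly larger than the raw tensor data, which is what one would expect if each shard carries some serialization overhead.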

View File

@@ -0,0 +1,410 @@
{
"metadata": {
"total_size": 26183777280
},
"weight_map": {
"lm_head.weight": "pytorch_model-00028-of-00028.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00028.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00028.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00028.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00028.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00028.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00028.bin",
"model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00028.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00028.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00028.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00028.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00028.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00028.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00009-of-00028.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00010-of-00028.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00010-of-00028.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00010-of-00028.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00011-of-00028.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00011-of-00028.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00012-of-00028.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00012-of-00028.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00012-of-00028.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00013-of-00028.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00013-of-00028.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00014-of-00028.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00028.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00002-of-00028.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00014-of-00028.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00014-of-00028.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00015-of-00028.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00015-of-00028.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00016-of-00028.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00016-of-00028.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00016-of-00028.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00017-of-00028.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00017-of-00028.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00018-of-00028.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00018-of-00028.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00018-of-00028.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00019-of-00028.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00019-of-00028.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00020-of-00028.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00020-of-00028.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00020-of-00028.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00028.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00003-of-00028.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00021-of-00028.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00021-of-00028.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00022-of-00028.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.32.input_layernorm.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00022-of-00028.bin",
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00022-of-00028.bin",
"model.layers.33.input_layernorm.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00023-of-00028.bin",
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00023-of-00028.bin",
"model.layers.34.input_layernorm.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00024-of-00028.bin",
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.35.input_layernorm.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00024-of-00028.bin",
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00024-of-00028.bin",
"model.layers.36.input_layernorm.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00025-of-00028.bin",
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00025-of-00028.bin",
"model.layers.37.input_layernorm.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00026-of-00028.bin",
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.38.input_layernorm.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00026-of-00028.bin",
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00026-of-00028.bin",
"model.layers.39.input_layernorm.weight": "pytorch_model-00028-of-00028.bin",
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00028-of-00028.bin",
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00028-of-00028.bin",
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00027-of-00028.bin",
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00027-of-00028.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00028.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00028.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00004-of-00028.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00028.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00005-of-00028.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00028.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00028.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00006-of-00028.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00008-of-00028.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00007-of-00028.bin",
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00028.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00007-of-00028.bin",
"model.norm.weight": "pytorch_model-00028-of-00028.bin"
}
}

23
special_tokens_map.json Normal file

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:147fa8ef9267d7760a39e07a15eeadb64ce6560a74fa92f4e69cf206307e876c
size 588649

33
tokenizer_config.json Normal file

@@ -0,0 +1,33 @@
{
"add_bos_token": true,
"add_eos_token": false,
"bos_token": {
"__type": "AddedToken",
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": false,
"eos_token": {
"__type": "AddedToken",
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"model_max_length": 1000000000000000019884624838656,
"pad_token": null,
"sp_model_kwargs": {},
"tokenizer_class": "LlamaTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

BIN
ziya_en_eval.png Normal file

Binary file not shown. Size: 138 KiB

BIN
ziya_zh_eval.png Normal file

Binary file not shown. Size: 448 KiB