初始化项目,由ModelHub XC社区提供模型
Model: cmarkea/bloomz-7b1-mt-sft-chat Source: Original Platform
This commit is contained in:
36
.gitattributes
vendored
Normal file
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
|
|||||||
|
*.7z filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.arrow filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.bin filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.ftz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.gz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.h5 filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.joblib filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.model filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.npy filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.npz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.onnx filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.ot filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.parquet filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pb filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pickle filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pkl filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pt filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pth filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.rar filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||||
|
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tar filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tflite filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tgz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.wasm filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.xz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.zip filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.zst filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
||||||
116
README.md
Normal file
116
README.md
Normal file
@@ -0,0 +1,116 @@
|
|||||||
|
---
|
||||||
|
license: bigscience-bloom-rail-1.0
|
||||||
|
datasets:
|
||||||
|
- ehartford/wizard_vicuna_70k_unfiltered
|
||||||
|
- shahules786/orca-chat
|
||||||
|
- timdettmers/openassistant-guanaco
|
||||||
|
- laion/OIG
|
||||||
|
language:
|
||||||
|
- fr
|
||||||
|
- en
|
||||||
|
library_name: transformers
|
||||||
|
pipeline_tag: text-generation
|
||||||
|
widget:
|
||||||
|
- text: </s>Bonjour, qui es-tu ?<s>
|
||||||
|
- text: </s>Hello, who are you?<s>
|
||||||
|
---
|
||||||
|
|
||||||
|
bloomz-7b1-mt-sft-chat
|
||||||
|
--------------------
|
||||||
|
We introduce the bloomz-7b1-mt-sft-chat model, which is a fine-tuning of a Large Language Model (LLM) [bigscience/bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt). This model is notable for being pre-trained for a chatbot context and undergoing a transposition from float16 to bfloat16. Therefore, this model serves as a solid starting point for fine-tuning towards other more specific tasks.
|
||||||
|
|
||||||
|
The model was trained equally on both French and English data, ensuring maximum efficiency for these two languages (and their interactions). Due to the transition from float16 to bfloat16, we do not guarantee the preservation of the original model's multilingual capabilities. However, fine-tuning can restore reasonable performance on other languages.
|
||||||
|
|
||||||
|
The objective is to pre-train all three models (Bloomz-{560m, 3b, 7b1-mt}-sft-chat) to ensure high-performing, energy-efficient, and fast "foundation" models for inference on "realistic" infrastructures suitable for a business with standard industrial capabilities.
|
||||||
|
|
||||||
|
|
||||||
|
Bloomz, through its license, enables free and flexible industrial use. Its tokenizer has been designed with true multi-lingual context in mind, with a significantly lower token generation per word compared to other LLM models. This capability not only leads to improved performance but also enhanced efficiency during inference by making fewer model calls when generating text with shorter contexts. Here is a table illustrating our points using French as an example, where we tokenized Marcel Proust's longest sentence (823 words):
|
||||||
|
```
|
||||||
|
Sans honneur que précaire, sans liberté que provisoire, [...], et de façon qu’à eux-mêmes il ne leur paraisse pas un vice.
|
||||||
|
```
|
||||||
|
|
||||||
|
| model | GPT 3.5 | Boris | Flan-T5 | LLaMA | Dolly | MPT | Falcon | Bloomz |
|
||||||
|
|:--------------:|:-------:|:-----:|:-------:|:-----:|:-----:|:---:|:------:|:------:|
|
||||||
|
| tokens per word | 2.3 | 2.3 | 2 | 1.9 | 1.9 | 1.9 | 1.8 | 1.4 |
|
||||||
|
|
||||||
|
For comparison, with a specialized French tokenizer like [CamemBERT](https://huggingface.co/camembert/camembert-base) or [DistilCamemBERT](cmarkea/distilcamembert-base), we have 1.5 tokens per word. In addition to its positive impact on inference time and resource consumption, there has already been [shown that there is a direct relationship](https://arxiv.org/abs/2012.15613) between the number of tokens per word required for modeling and the predictive performance of the model.
|
||||||
|
|
||||||
|
Dataset
|
||||||
|
-------
|
||||||
|
After analyzing a substantial set of modelings, we have observed that the most effective pre-training for zero-shot use cases is pre-training for chatbot contexts. This study was conducted internally, focusing specifically on the French context. As a result, we trained the model on a dataset comprising 0.9 billion tokens. This dataset consists of interactions between an individual and a third party. To balance the French and English data, we utilized the Google Translate API.
|
||||||
|
|
||||||
|
Training
|
||||||
|
--------
|
||||||
|
Here is the table summarizing the architecture used for training, along with the training time and average inference speed per token on the target architecture in tokens per second:
|
||||||
|
|
||||||
|
| model | Architecture | Training time (h) | Inference speed (tokens per second) |
|
||||||
|
|:----------------------:|:-------------:|:-----------------:|:-----------------------------------:|
|
||||||
|
| [bloomz-560m-sft-chat](https://huggingface.co/cmarkea/bloomz-560m-sft-chat) | 1 x A100 40GB | 41 | 29 |
|
||||||
|
| [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat) | 1 x A100 40GB | 140 | 13 |
|
||||||
|
| [bloomz-7b1-mt-sft-chat](https://huggingface.co/cmarkea/bloomz-7b1-mt-sft-chat) | 4 x A100 40GB | 268 | 8 |
|
||||||
|
|
||||||
|
| Hyperparameter | Value |
|
||||||
|
|:---------------------:|:----------:|
|
||||||
|
| label smoothing | 0.05 |
|
||||||
|
| optimizer | AdamW |
|
||||||
|
| betas | 0.9, 0.999 |
|
||||||
|
| learning rate | 5e-6 |
|
||||||
|
| anneal strategy | cos |
|
||||||
|
| div factor | 100 |
|
||||||
|
| final div factor | 0.1 |
|
||||||
|
| batch size | 2 |
|
||||||
|
| gradient accumulation | 200 |
|
||||||
|
| max length | 2048 |
|
||||||
|
|
||||||
|
Experimentations
|
||||||
|
----------------
|
||||||
|
Since the model is trained only on English and French corpora, the performance of the model cannot be guaranteed in other languages. This degradation in performance in other languages is also due to the change in the model's data type from float16 to bfloat16. The conversation example below illustrates this point:
|
||||||
|
|
||||||
|
| French | English | Spanish |
|
||||||
|
|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
|
||||||
|
| **human:** C'est quoi la première guerre mondiale ?<br>**bot:** La Première Guerre mondiale a eu lieu entre 1914 et 1918. C'était une guerre entre les puissances européennes, principalement la France, l'Allemagne et la Grande-Bretagne.<br>**human:** Y a-t-il eu également d’autres pays impliqués ?<br>**bot:** Oui, il y avait également des pays comme l'Italie, les États-Unis, la Russie et la Turquie.<br>**human:** Comment cela s'est-il fini et quels ont été ses impacts ?<br>**bot:** La Première Guerre mondiale s'est terminée par un accord de paix appelé le Traité de Versailles. Il a mis fin à la guerre et a établi un nouvel ordre mondial.<br>**human:** Combien y a-t-il eu de mort ?<br>**bot:** Il y a eu environ 10 millions de morts, principalement des soldats.<br>**human:** Et combien de mort civile ?<br>**bot:** Il y a eu environ 5 millions de morts civils, principalement des civils.<br>**human:** Tu es certain de ce chiffre, ça me parait peu...<br>**bot:** Je suis sûr de ce chiffre. | **human:** What is the first world war?<br>**bot:** The first world war was a war that took place between 1914 and 1918.<br>**human:** Which countries participated in this war?<br>**bot:** he countries that participated in the first world war were the United Kingdom, France, Belgium, the Netherlands, Italy, Austria-Hungary, Serbia, Montenegro, Bulgaria, Greece, Romania, Albania, and the Ottoman Empire.<br>**human:** How did it end and what were its impacts?<br>**bot:** The first world war ended with the Treaty of Versailles, which was signed on 28 June 1919. The Treaty of Versailles was a peace treaty that ended the war and established the international system that we know today.<br>**human:** How many deaths?<br>**bot:** The first world war resulted in approximately 50 million deaths, with approximately 30 million of those deaths occurring in the European Theater.<br>**human:** That seems a bit much to me, are you sure of these numbers?<br>**bot:** Yes, I am. | **human:** ¿Qué es la Primera Guerra Mundial?<br>**bot:** La Primera Guerra Mundial fue un conflicto bélico que ocurrió entre 1914 y 1918.<br>**human:** ¿En qué países participamos?<br>**bot:** La Primera Guerra Mundial involucró a muchos países, incluidos Alemania, Austria-Hungría, Bélgica, Bulgaria, Checoslovaquia, Francia, Gran Bretaña, Italia, Japón, Luxemburgo, Noruega, los Países Bajos, Portugal, Serbia, España, Turquía y Estados Unidos.<br>**human:** ¿Cuántas muertes?<br>**bot:** La Primera Guerra Mundial dejó más de 10 millones de muertes.<br>**human:** ¿Y cuántas muertes de civiles?<br>**bot:** La Primera Guerra Mundial dejó más de 5 millones de muertes de civiles.<br>**human:** ¿Estás seguro? Eso no me parece mucho.<br>**bot:** No, no estoy seguro. |
|
||||||
|
|
||||||
|
Evaluating the performance of generative models can be challenging, and quantitative measures may not always provide the most relevant insights for this type of approach. In this regard, we conducted a qualitative performance study. It involves assessing the relevance of model responses to a pool of French questions using blind evaluation. The metric used for evaluation is the average of two criteria: response accuracy and completeness, similar to what the [SSA metric](https://arxiv.org/abs/2001.09977) aims to capture. Please note that this measure lacks rigorous scientific validity due to the limited number of questions in the test dataset and the fact that it involves only a few individuals with similar socio-demographic characteristics. The prompts take the same format for all models:
|
||||||
|
```
|
||||||
|
[Instruction]
|
||||||
|
Question : [Question]
|
||||||
|
Réponse :
|
||||||
|
```
|
||||||
|
As a result, the prompts do not exploit the structures of chatbot models to ensure fairness, and the evaluation quantifies performance in a purely instruction-based approach. The figure below illustrates the results. The higher a model is positioned in the top-left corner with a small circle radius, the better the model; conversely, if a model is towards the bottom-right with a large circle, it performs less favorably.
|
||||||
|

|
||||||
|
We observe that across all models, the performance gain is logarithmic in relation to the increase in model parameters. However, for models that undergo multiple pre-trainings (vanilla, instruction, and chat), models pre-trained on instruction and chat perform significantly better in zero-shot contexts, with a notable improvement for chat-based approaches. The models we have trained demonstrate promising efficiency in this test compared to the number of parameters, indicating cost-effectiveness in a production context.
|
||||||
|
|
||||||
|
How to use bloomz-7b1-mt-sft-chat
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
There are no specific instructions for using these models in a normal causal inference context. However, to leverage the chatbot capability of the model, an individual's prompt should be preceded by the EOS token (</s>), and the generated part should be preceded by the BOS token (<s>). The structure takes the following form:
|
||||||
|
```
|
||||||
|
</s>[human prompt 1]<s>[bot answer 1]</s>[human prompt 2]<s>
|
||||||
|
```
|
||||||
|
For example, to load the model using the HuggingFace pipeline interface:
|
||||||
|
```python
|
||||||
|
from transformers import pipeline
|
||||||
|
|
||||||
|
model = pipeline("text-generation", "cmarkea/bloomz-7b1-mt-sft-chat")
|
||||||
|
result = model("</s>C'est quoi le deep learning ?<s>", max_new_tokens=512)
|
||||||
|
|
||||||
|
result
|
||||||
|
[{'generated_text': "</s>C'est quoi le deep learning ?<s>Le deep learning
|
||||||
|
est un type de machine learning qui utilise des réseaux de neurones
|
||||||
|
artificiels pour apprendre à partir de données. Il est utilisé dans de
|
||||||
|
nombreux domaines, notamment la reconnaissance d'images, la
|
||||||
|
reconnaissance vocale, la traduction automatique et la reconnaissance
|
||||||
|
de l'écriture."}]
|
||||||
|
```
|
||||||
|
|
||||||
|
Citation
|
||||||
|
--------
|
||||||
|
|
||||||
|
```bibtex
|
||||||
|
@online{DeBloomzChat,
|
||||||
|
AUTHOR = {Cyrile Delestre},
|
||||||
|
URL = {https://huggingface.co/cmarkea/bloomz-7b1-mt-sft-chat},
|
||||||
|
YEAR = {2023},
|
||||||
|
KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
|
||||||
|
}
|
||||||
|
```
|
||||||
32
config.json
Normal file
32
config.json
Normal file
@@ -0,0 +1,32 @@
|
|||||||
|
{
|
||||||
|
"_name_or_path": "cmarkea/bloomz-7b1-mt-sft-chat",
|
||||||
|
"apply_residual_connection_post_layernorm": false,
|
||||||
|
"architectures": [
|
||||||
|
"BloomForCausalLM"
|
||||||
|
],
|
||||||
|
"attention_dropout": 0.0,
|
||||||
|
"attention_softmax_in_fp32": true,
|
||||||
|
"bias_dropout_fusion": true,
|
||||||
|
"bos_token_id": 1,
|
||||||
|
"eos_token_id": 2,
|
||||||
|
"hidden_dropout": 0.0,
|
||||||
|
"hidden_size": 4096,
|
||||||
|
"initializer_range": 0.02,
|
||||||
|
"layer_norm_epsilon": 1e-05,
|
||||||
|
"masked_softmax_fusion": true,
|
||||||
|
"model_type": "bloom",
|
||||||
|
"n_head": 32,
|
||||||
|
"n_inner": null,
|
||||||
|
"n_layer": 30,
|
||||||
|
"offset_alibi": 100,
|
||||||
|
"pad_token_id": 3,
|
||||||
|
"pretraining_tp": 4,
|
||||||
|
"skip_bias_add": true,
|
||||||
|
"skip_bias_add_qkv": false,
|
||||||
|
"slow_but_exact": false,
|
||||||
|
"torch_dtype": "bfloat16",
|
||||||
|
"transformers_version": "4.32.0",
|
||||||
|
"unk_token_id": 0,
|
||||||
|
"use_cache": true,
|
||||||
|
"vocab_size": 250880
|
||||||
|
}
|
||||||
7
generation_config.json
Normal file
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
|
|||||||
|
{
|
||||||
|
"_from_model_config": true,
|
||||||
|
"bos_token_id": 1,
|
||||||
|
"eos_token_id": 2,
|
||||||
|
"pad_token_id": 3,
|
||||||
|
"transformers_version": "4.32.0"
|
||||||
|
}
|
||||||
3
model-00001-of-00002.safetensors
Normal file
3
model-00001-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:2bb6aefe5e9a7338a8500b3be066bb3926ba6927b51ede9685f9b3bc056058d7
|
||||||
|
size 9976221672
|
||||||
3
model-00002-of-00002.safetensors
Normal file
3
model-00002-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:18b32ca58093a07cc7431598af52507b07ae1549b03baed225da376bd7496eab
|
||||||
|
size 4161853848
|
||||||
372
model.safetensors.index.json
Normal file
372
model.safetensors.index.json
Normal file
@@ -0,0 +1,372 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"total_size": 14138032128
|
||||||
|
},
|
||||||
|
"weight_map": {
|
||||||
|
"transformer.h.0.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.0.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.1.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.10.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.11.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.12.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.13.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.14.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.15.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.16.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.17.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.18.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.19.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.19.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.19.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.2.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.20.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.20.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.21.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.22.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.23.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.24.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.25.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.26.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.27.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.28.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.input_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.mlp.dense_4h_to_h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.mlp.dense_h_to_4h.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.post_attention_layernorm.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.self_attention.dense.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.self_attention.dense.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.self_attention.query_key_value.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.29.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.h.3.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.3.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.4.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.5.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.6.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.7.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.8.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.input_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.mlp.dense_4h_to_h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.mlp.dense_h_to_4h.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.post_attention_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.self_attention.dense.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.self_attention.dense.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.self_attention.query_key_value.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.h.9.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.ln_f.bias": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.ln_f.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"transformer.word_embeddings.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.word_embeddings_layernorm.bias": "model-00001-of-00002.safetensors",
|
||||||
|
"transformer.word_embeddings_layernorm.weight": "model-00001-of-00002.safetensors"
|
||||||
|
}
|
||||||
|
}
|
||||||
3
pytorch_model-00001-of-00002.bin
Normal file
3
pytorch_model-00001-of-00002.bin
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:c8f93c1e51d79da0d7efae3e3d557688e9bd230f26bfbfbc483c2951fb736c80
|
||||||
|
size 9976277014
|
||||||
3
pytorch_model-00002-of-00002.bin
Normal file
3
pytorch_model-00002-of-00002.bin
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:a8f3cdd90e703e72690704c24e2b6c3b9bf9e5ce4b90c94df3601cbf33e32668
|
||||||
|
size 4161880991
|
||||||
373
pytorch_model.bin.index.json
Normal file
373
pytorch_model.bin.index.json
Normal file
@@ -0,0 +1,373 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"total_size": 14138032128
|
||||||
|
},
|
||||||
|
"weight_map": {
|
||||||
|
"lm_head.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.0.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.1.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.10.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.11.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.12.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.13.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.14.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.15.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.16.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.17.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.18.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.19.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.19.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.19.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.2.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.20.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.20.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.21.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.22.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.23.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.24.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.25.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.26.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.27.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.28.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.mlp.dense_4h_to_h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.mlp.dense_h_to_4h.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.post_attention_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.self_attention.dense.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.self_attention.query_key_value.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.29.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.h.3.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.3.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.4.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.5.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.6.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.7.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.8.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.mlp.dense_4h_to_h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.mlp.dense_h_to_4h.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.post_attention_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.self_attention.dense.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.self_attention.query_key_value.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.h.9.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.ln_f.bias": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.ln_f.weight": "pytorch_model-00002-of-00002.bin",
|
||||||
|
"transformer.word_embeddings.weight": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.word_embeddings_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||||
|
"transformer.word_embeddings_layernorm.weight": "pytorch_model-00001-of-00002.bin"
|
||||||
|
}
|
||||||
|
}
|
||||||
6
special_tokens_map.json
Normal file
6
special_tokens_map.json
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
{
|
||||||
|
"bos_token": "<s>",
|
||||||
|
"eos_token": "</s>",
|
||||||
|
"pad_token": "<pad>",
|
||||||
|
"unk_token": "<unk>"
|
||||||
|
}
|
||||||
3
tokenizer.json
Normal file
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:17a208233d2ee8d8c83b23bc214df737c44806a1919f444e89b31e586cd956ba
|
||||||
|
size 14500471
|
||||||
10
tokenizer_config.json
Normal file
10
tokenizer_config.json
Normal file
@@ -0,0 +1,10 @@
|
|||||||
|
{
|
||||||
|
"add_prefix_space": false,
|
||||||
|
"bos_token": "<s>",
|
||||||
|
"clean_up_tokenization_spaces": false,
|
||||||
|
"eos_token": "</s>",
|
||||||
|
"model_max_length": 1000000000000000019884624838656,
|
||||||
|
"pad_token": "<pad>",
|
||||||
|
"tokenizer_class": "BloomTokenizer",
|
||||||
|
"unk_token": "<unk>"
|
||||||
|
}
|
||||||
Reference in New Issue
Block a user