Initialize project; model provided by the ModelHub XC community

Model: Writer/palmyra-med-20b
Source: Original Platform
ModelHub XC
2026-06-08 18:32:37 +08:00
commit 6f9ed48d98
16 changed files with 151182 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

184
README.md Normal file

@@ -0,0 +1,184 @@
---
license: apache-2.0
language:
- en
tags:
- medical
- palmyra
---
**DEPRECATED MODEL NOTICE**
==========================
Please note that this model is no longer maintained or supported by our team. We strongly advise against using it in production or for any critical applications.
Instead, we recommend using our latest and greatest models, which can be found at:
https://huggingface.co/collections/Writer/palmyra-writer-license-66476fa8156169f8720a2c89
==========================
# Palmyra-med-20b
## Model description
**Palmyra-Med-20b** is a 20 billion parameter Large Language Model that has been uptrained on
**Palmyra-Large** with a specialized custom-curated medical dataset.
The main objective of this model is to enhance performance in tasks related to medical dialogue
and question-answering.
- **Developed by:** [https://writer.com/](https://writer.com/);
- **Model type:** Causal decoder-only;
- **Language(s) (NLP):** English;
- **License:** Apache 2.0;
- **Finetuned from model:** [Palmyra-Large](https://huggingface.co/Writer/palmyra-large).
### Model Source
[Palmyra-Med: Instruction-Based Fine-Tuning of LLMs Enhancing Medical Domain Performance](https://dev.writer.com/docs/palmyra-med-instruction-based-fine-tuning-of-llms-enhancing-medical-domain-performance)
## Uses
### Out-of-Scope Use
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
## Bias, Risks, and Limitations
Palmyra-Med-20B was trained mostly on English data and will not generalize appropriately to other languages. Furthermore, because it was trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
### Recommendations
We recommend that users of Palmyra-Med-20B develop guardrails and take appropriate precautions for any production use.
## Usage
The model is compatible with the Hugging Face `AutoModelForCausalLM` class and can easily be run on a single 40GB A100.
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Writer/palmyra-med-20b"

# Load the slow tokenizer and the fp16 weights; device_map="auto" places layers on the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "Can you explain in simple terms how vaccines help our body fight diseases?"
# Chat-style template: a system preamble, the USER turn, and an open ASSISTANT turn.
input_text = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: {prompt} "
    "ASSISTANT:"
)
model_inputs = tokenizer(input_text.format(prompt=prompt), return_tensors="pt").to(
    "cuda"
)

gen_conf = {
    "temperature": 0.7,
    "repetition_penalty": 1.0,
    "max_new_tokens": 512,
    "do_sample": True,
}
out_tokens = model.generate(**model_inputs, **gen_conf)

# Drop the prompt tokens so that only the newly generated answer is decoded.
response_ids = out_tokens[0][len(model_inputs.input_ids[0]) :]
output = tokenizer.decode(response_ids, skip_special_tokens=True)
print(output)
## output ##
# Vaccines stimulate the production of antibodies by the body's immune system.
# Antibodies are proteins produced by B lymphocytes in response to foreign substances,such as viruses and bacteria.
# The antibodies produced by the immune system can bind to and neutralize the pathogens, preventing them from invading and damaging the host cells.
# Vaccines work by introducing antigens, which are components of the pathogen, into the body.
# The immune system then produces antibodies against the antigens, which can recognize and neutralize the pathogen if it enters the body in the future.
# The use of vaccines has led to a significant reduction in the incidence and severity of many diseases, including measles, mumps, rubella, and polio.
```
It can also be used with text-generation-inference:
```sh
model=Writer/palmyra-med-20b
volume=$PWD/data
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference --model-id $model
```
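Once the container is running, the server can be queried over HTTP. The following is a minimal client sketch using only the Python standard library. It assumes the server is reachable at `http://localhost:8080` (the port mapped in the command above) and that it exposes text-generation-inference's `/generate` route; the helper names `build_payload` and `generate` are illustrative and not part of any library.

```python
import json
import urllib.request

# Same chat-style template as in the transformers example above.
PROMPT_TEMPLATE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: {prompt} "
    "ASSISTANT:"
)


def build_payload(prompt, max_new_tokens=512, temperature=0.7):
    """Build the JSON body for a /generate request (sampling settings mirror gen_conf)."""
    return {
        "inputs": PROMPT_TEMPLATE.format(prompt=prompt),
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "repetition_penalty": 1.0,
            "do_sample": True,
        },
    }


def generate(prompt, base_url="http://localhost:8080"):
    """POST the prompt to the server and return the generated text (base_url is an assumption)."""
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]


# Example call, to be run only while the container is up:
# print(generate("Can you explain in simple terms how vaccines help our body fight diseases?"))
```

Keeping the payload construction separate from the network call makes it possible to inspect the exact prompt before sending it.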
## Dataset
For the fine-tuning of our LLMs, we used a custom-curated medical dataset that combines data from
two publicly available sources: PubMedQA (Jin et al. 2019) and MedQA (Zhang et al. 2018). The
PubMedQA dataset, which originated from the PubMed abstract database, consists of biomedical
articles accompanied by corresponding question-answer pairs. In contrast, the MedQA dataset
features medical questions and answers that are designed to assess the reasoning capabilities of
medical question-answering systems.
We prepared our custom dataset by merging and processing data from the aforementioned sources,
maintaining the dataset mixture ratios detailed in Table 1. These ratios were consistent for fine-tuning
both Palmyra-20b and Palmyra-40b models. Upon fine-tuning the models with this dataset, we refer
to the resulting models as Palmyra-Med-20b and Palmyra-Med-40b, respectively.
| Dataset | Ratio | Count |
| -----------|----------- | ----------- |
| PubMedQA | 75% | 150,000 |
| MedQA | 25% | 10,178 |
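One way to read the table is that the Ratio column gives per-example sampling weights rather than raw shares of the merged data, since the counts alone would give PubMedQA a share of roughly 94%. The source does not spell out the mixing procedure, so the sketch below is only an illustration under that assumption; `raw_shares` and `sample_sources` are hypothetical helpers.

```python
import random

# Counts and ratios as listed in the table above.
COUNTS = {"PubMedQA": 150_000, "MedQA": 10_178}
RATIOS = {"PubMedQA": 0.75, "MedQA": 0.25}


def raw_shares(counts):
    """Share of each source if the two datasets were simply concatenated."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def sample_sources(ratios, k, seed=0):
    """Draw k source names with probabilities given by the mixture ratios (assumed procedure)."""
    rng = random.Random(seed)
    names = list(ratios)
    weights = [ratios[name] for name in names]
    return rng.choices(names, weights=weights, k=k)


shares = raw_shares(COUNTS)
draws = sample_sources(RATIOS, k=10_000)
pubmedqa_fraction = draws.count("PubMedQA") / len(draws)

print({name: round(share, 3) for name, share in shares.items()})
print(round(pubmedqa_fraction, 2))
```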
## Evaluation
We present the findings of our experiments, beginning with the evaluation outcomes of
the fine-tuned models and followed by a discussion of the base models' performance on each of the
evaluation datasets. Additionally, we report the progressive improvement of the Palmyra-Med-40b
model throughout the training process on the PubMedQA dataset.
| Model           | PubMedQA | MedQA |
| --------------- | -------- | ----- |
| Palmyra-20b     | 49.8     | 31.2  |
| Palmyra-40b     | 64.8     | 43.1  |
| Palmyra-Med-20b | 75.6     | 44.6  |
| Palmyra-Med-40b | 81.1     | 72.4  |
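The gain from fine-tuning can be read off the table by subtracting each base model's score from that of its fine-tuned counterpart. The short sketch below does that arithmetic using only the figures reported above; the dictionary layout and the `gains` helper are illustrative choices.

```python
# Scores copied from the evaluation table: (PubMedQA, MedQA).
SCORES = {
    "Palmyra-20b": (49.8, 31.2),
    "Palmyra-40b": (64.8, 43.1),
    "Palmyra-Med-20b": (75.6, 44.6),
    "Palmyra-Med-40b": (81.1, 72.4),
}


def gains(base, tuned):
    """Absolute improvement of the fine-tuned model over its base, per benchmark."""
    return tuple(round(t - b, 1) for b, t in zip(SCORES[base], SCORES[tuned]))


gain_20b = gains("Palmyra-20b", "Palmyra-Med-20b")
gain_40b = gains("Palmyra-40b", "Palmyra-Med-40b")
print(gain_20b)  # (25.8, 13.4)
print(gain_40b)  # (16.3, 29.3)
```

On these numbers the 20b model gains more on PubMedQA, while the 40b model gains more on MedQA.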
## Limitation
The model may not operate efficiently beyond the confines of the healthcare field.
Since it has not been subjected to practical scenarios, its real-time efficacy and precision remain undetermined.
Under no circumstances should it replace the advice of a medical professional, and it must be regarded solely as a tool for research purposes.
## Citation and Related Information
To cite this model:
```
@misc{Palmyra-Med-20B,
author = {Writer Engineering team},
title = {{Palmyra-Large Parameter Autoregressive Language Model}},
howpublished = {\url{https://dev.writer.com}},
year = 2023,
month = March
}
```
## Contact
Hello@writer.com
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Writer__palmyra-med-20b)
| Metric | Value |
|-----------------------|---------------------------|
| Avg. | 40.02 |
| ARC (25-shot) | 46.93 |
| HellaSwag (10-shot) | 73.51 |
| MMLU (5-shot) | 44.34 |
| TruthfulQA (0-shot) | 35.47 |
| Winogrande (5-shot) | 65.35 |
| GSM8K (5-shot) | 2.65 |
| DROP (3-shot) | 11.88 |

4
added_tokens.json Normal file

@@ -0,0 +1,4 @@
{
"<s>": 50258,
"[PAD]": 50257
}
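These two entries extend the tokenizer's base vocabulary. Assuming the base vocabulary is the standard GPT-2 one with 50,257 entries (ids 0 to 50256, which is consistent with the `bos_token_id` and `eos_token_id` of 50256 in `config.json`), the added ids line up with the `vocab_size` of 50259 declared there. A quick sanity check under that assumption:

```python
import json

# Contents of added_tokens.json as shown above.
added_tokens = json.loads('{"<s>": 50258, "[PAD]": 50257}')

BASE_VOCAB_SIZE = 50257  # assumption: standard GPT-2 byte-level BPE vocabulary
CONFIG_VOCAB_SIZE = 50259  # "vocab_size" from config.json

# Added ids should continue directly after the base vocabulary, without gaps.
expected_ids = list(range(BASE_VOCAB_SIZE, BASE_VOCAB_SIZE + len(added_tokens)))
ids_are_contiguous = sorted(added_tokens.values()) == expected_ids
total_vocab = BASE_VOCAB_SIZE + len(added_tokens)

print(ids_are_contiguous, total_vocab == CONFIG_VOCAB_SIZE)
```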

32
config.json Normal file

@@ -0,0 +1,32 @@
{
"_name_or_path": "palmyra-med-20b",
"activation_function": "gelu",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.008165,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_embd": 6144,
"n_head": 48,
"n_inner": 24576,
"n_layer": 44,
"n_positions": 2048,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torch_dtype": "float16",
"transformers_version": "4.30.0.dev0",
"use_cache": false,
"vocab_size": 50259
}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 50256,
"eos_token_id": 50256,
"transformers_version": "4.30.0.dev0"
}

50001
merges.txt Normal file

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bd9c141fb89248be2fcfd895873a97008e0329468117237cf2243dce9c4484f0
size 9930640129


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:804c94e524af7927aae4c222670cdbedda9fb4077f53f14440f6c319d0f74b06
size 9967469263


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:650f860fa5d24ec87ef17394e82735350c1563f8b18ff3521955c8d4990ad21a
size 9967469263


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e07108b56d29d59adf796cb0e12b15af3de0138e318adf6ac3f6a8207d8861f
size 9967469263


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4f32273b9d70dbd6a61f3fb9bb32d41c73c26b9e1a524fa2e5cf508223b27c7
size 679603843
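Each of the five stubs above is a Git LFS pointer: a small text file recording the spec version, the SHA-256 of the real object, and its size in bytes. The sketch below parses one pointer and totals the five sizes. That these pointers correspond to the five `pytorch_model-0000N-of-00005.bin` shards is an assumption, since the file names are not shown in this view, and `parse_lfs_pointer` is a hypothetical helper.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:a4f32273b9d70dbd6a61f3fb9bb32d41c73c26b9e1a524fa2e5cf508223b27c7\n"
    "size 679603843\n"
)

# Sizes of the five pointers, in the order they appear above.
shard_sizes = [9930640129, 9967469263, 9967469263, 9967469263, 679603843]
total_bytes = sum(shard_sizes)

print(pointer["algorithm"], pointer["size"])  # sha256 679603843
print(total_bytes)                            # 40512651761
```

The total is 184,817 bytes larger than the `total_size` of 40512466944 in the index that follows, which would be consistent with per-file serialization overhead on top of the raw tensor data.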


@@ -0,0 +1,540 @@
{
"metadata": {
"total_size": 40512466944
},
"weight_map": {
"lm_head.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.0.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.1.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.10.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.10.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.11.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.12.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.13.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.14.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.15.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.16.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.17.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.18.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.19.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.2.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.2.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.20.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.attn.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.attn.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.ln_2.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.ln_2.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.mlp.c_fc.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.mlp.c_fc.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.mlp.c_proj.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.20.mlp.c_proj.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.attn.c_attn.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.attn.c_attn.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.ln_1.bias": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.ln_1.weight": "pytorch_model-00002-of-00005.bin",
"transformer.h.21.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.21.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.22.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.23.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.24.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.25.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.26.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.27.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.28.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.29.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.3.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.3.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.30.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.30.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.attn.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.attn.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.ln_2.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.ln_2.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.mlp.c_fc.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.mlp.c_fc.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.mlp.c_proj.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.31.mlp.c_proj.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.attn.c_attn.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.ln_1.bias": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.ln_1.weight": "pytorch_model-00003-of-00005.bin",
"transformer.h.32.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.32.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.33.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.34.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.35.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.36.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.37.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.38.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.39.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.4.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.4.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.40.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.40.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.41.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.attn.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.ln_2.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.ln_2.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.mlp.c_fc.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.mlp.c_fc.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.mlp.c_proj.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.42.mlp.c_proj.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.attn.c_attn.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.attn.c_proj.bias": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.attn.c_proj.weight": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.ln_1.bias": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.ln_1.weight": "pytorch_model-00004-of-00005.bin",
"transformer.h.43.ln_2.bias": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.ln_2.weight": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.mlp.c_fc.bias": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.mlp.c_fc.weight": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.mlp.c_proj.bias": "pytorch_model-00005-of-00005.bin",
"transformer.h.43.mlp.c_proj.weight": "pytorch_model-00005-of-00005.bin",
"transformer.h.5.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.5.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.6.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.7.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.8.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.attn.c_attn.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.attn.c_attn.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.attn.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.attn.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.ln_1.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.ln_1.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.ln_2.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.ln_2.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.mlp.c_fc.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.mlp.c_fc.weight": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.mlp.c_proj.bias": "pytorch_model-00001-of-00005.bin",
"transformer.h.9.mlp.c_proj.weight": "pytorch_model-00001-of-00005.bin",
"transformer.ln_f.bias": "pytorch_model-00005-of-00005.bin",
"transformer.ln_f.weight": "pytorch_model-00005-of-00005.bin",
"transformer.wpe.weight": "pytorch_model-00001-of-00005.bin",
"transformer.wte.weight": "pytorch_model-00001-of-00005.bin"
}
}
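
The weight map above assigns each tensor name to one of five shard files, and a few layers (for example `transformer.h.32` and `transformer.h.43`) straddle a shard boundary. The following is a minimal Python sketch, not part of the commit, that groups tensor names by shard and lists the layers whose tensors are split across files. It assumes the index keeps its entries under a top-level `weight_map` key, as the nesting above suggests. The excerpt is copied from entries shown above, and `group_by_shard` and `layers_split_across_shards` are hypothetical helper names.

```python
import json
from collections import defaultdict

# Small excerpt copied from the weight map above; the real index has many more entries.
# The top-level "weight_map" key is an assumption about the part of the file not shown here.
index_excerpt = json.loads("""
{
  "weight_map": {
    "transformer.h.32.attn.c_attn.weight": "pytorch_model-00003-of-00005.bin",
    "transformer.h.32.attn.c_proj.weight": "pytorch_model-00004-of-00005.bin",
    "transformer.h.43.attn.c_attn.weight": "pytorch_model-00004-of-00005.bin",
    "transformer.h.43.attn.c_proj.weight": "pytorch_model-00005-of-00005.bin",
    "transformer.ln_f.weight": "pytorch_model-00005-of-00005.bin",
    "transformer.wte.weight": "pytorch_model-00001-of-00005.bin"
  }
}
""")


def group_by_shard(weight_map):
    """Map each shard file name to the sorted list of tensor names it holds."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def layers_split_across_shards(weight_map):
    """Return the transformer.h.<N> layer indices stored in more than one shard."""
    shards_per_layer = defaultdict(set)
    for name, shard in weight_map.items():
        parts = name.split(".")
        if len(parts) > 2 and parts[0] == "transformer" and parts[1] == "h":
            shards_per_layer[int(parts[2])].add(shard)
    return sorted(layer for layer, used in shards_per_layer.items() if len(used) > 1)


grouped = group_by_shard(index_excerpt["weight_map"])
split_layers = layers_split_across_shards(index_excerpt["weight_map"])

for shard, names in grouped.items():
    print(shard, len(names))
print("layers split across shards:", split_layers)
```

Parsing the layer index as an integer avoids the lexicographic ordering used in the file itself, where `transformer.h.3` sorts before `transformer.h.30`.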

7
special_tokens_map.json Normal file

@@ -0,0 +1,7 @@
{
"bos_token": "<|endoftext|>",
"eos_token": "<|endoftext|>",
"pad_token": "[PAD]",
"sep_token": "<s>",
"unk_token": "<|endoftext|>"
}
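
In the map above, three roles (`bos_token`, `eos_token`, `unk_token`) point at the same string, `<|endoftext|>`. A short Python sketch, assuming only the JSON shown above, makes that sharing explicit by inverting the map; `roles_by_token` and `shared` are illustrative names, not anything defined in the repository.

```python
import json
from collections import defaultdict

# Contents copied verbatim from special_tokens_map.json as shown above.
special_tokens_map = json.loads("""
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "[PAD]",
  "sep_token": "<s>",
  "unk_token": "<|endoftext|>"
}
""")

# Invert the mapping: token string -> list of roles that use it.
roles_by_token = defaultdict(list)
for role, token in special_tokens_map.items():
    roles_by_token[token].append(role)

# Keep only the token strings that serve more than one role.
shared = {token: roles for token, roles in roles_by_token.items() if len(roles) > 1}
print(shared)
```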

100323
tokenizer.json Normal file

File diff suppressed because it is too large

34
tokenizer_config.json Normal file

@@ -0,0 +1,34 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"bos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": true,
"eos_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"errors": "replace",
"model_max_length": 2048,
"pad_token": null,
"padding_side": "right",
"tokenizer_class": "GPT2Tokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<|endoftext|>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}
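
The two tokenizer files in this commit do not fully agree: `tokenizer_config.json` sets `pad_token` to `null`, while `special_tokens_map.json` sets it to `[PAD]`. The Python sketch below, which is an illustration and not part of the commit, compares the roles that appear in both files. It uses trimmed copies of the JSON shown above (only the fields needed for the comparison), and `token_content` is a hypothetical helper that unwraps the `AddedToken` objects. How a particular tokenizer library reconciles the two files at load time is not shown in this diff.

```python
import json

# Trimmed copy of tokenizer_config.json from above (only fields used here).
tokenizer_config = json.loads("""
{
  "bos_token": {"__type": "AddedToken", "content": "<|endoftext|>"},
  "eos_token": {"__type": "AddedToken", "content": "<|endoftext|>"},
  "pad_token": null,
  "unk_token": {"__type": "AddedToken", "content": "<|endoftext|>"},
  "model_max_length": 2048,
  "padding_side": "right"
}
""")

# Copy of special_tokens_map.json from above.
special_tokens_map = json.loads("""
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "[PAD]",
  "sep_token": "<s>",
  "unk_token": "<|endoftext|>"
}
""")


def token_content(value):
    """Return the token string for a plain string, an AddedToken-style dict, or None."""
    if isinstance(value, dict):
        return value.get("content")
    return value


# Compare only the roles present in both files; sep_token is absent from the config.
mismatches = {}
for role, mapped in special_tokens_map.items():
    if role not in tokenizer_config:
        continue
    configured = token_content(tokenizer_config[role])
    if configured != mapped:
        mismatches[role] = (configured, mapped)

print(mismatches)
```

A check like this can be useful before batching inputs, since padding requires a defined pad token.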

1
vocab.json Normal file

File diff suppressed because one or more lines are too long