Initialize project; model provided by the ModelHub XC community

Model: TehVenom/GPT-J-Pyg_PPO-6B-Dev-V8p4
Source: Original Platform
ModelHub XC
2026-06-08 08:44:17 +08:00
commit 59c5f3ad87
17 changed files with 152331 additions and 0 deletions

34
.gitattributes vendored Normal file

@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
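
As a rough illustration of what these rules do, the sketch below checks which file names would be routed through Git LFS. It approximates gitattributes matching with Python's `fnmatchcase` on the base name, which is an assumption: real gitattributes patterns have their own semantics, the `saved_model/**/*` rule is not handled, and `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names covering only a subset of the list above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the patterns listed above; directory rules such as
# saved_model/**/* are deliberately left out of this approximation.
LFS_PATTERNS = ["*.bin", "*.pt", "*.safetensors", "*.tar.*", "*tfevents*", "*.zip"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the file's base name matches any LFS pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["pytorch_model-00001-of-00006.bin", "config.json", "merges.txt"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under this approximation the weight shards match `*.bin`, while small text files such as `config.json` stay as regular git blobs.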

78
README.md Normal file

@@ -0,0 +1,78 @@
---
license: bigscience-openrail-m
language:
- en
---
GPT-J-Pyg_PPO-6B [GPT-J Pygmalion Dev V8p4 + GPT-J PPO_HH]
GPT-J-Pyg_PPO-6B is an experimental model containing a parameter-wise 40/60 blend (weighted average PPO_HH:Pygmalion) of the weights of ppo_hh_gpt-j and Pygmalion-6b Dev V8p4.
-Intended Merge Value-
As with fine-tuning, merging weights does not add information but transforms it, so it is important to consider trade-offs.
Pyg_PPO combines ppo_hh_gpt-j and Pygmalion-6b; the two models are blended with the intent to carry over the strengths of both. The datasets of both are linked below to help explore which datasets, in what quantity and configuration, have the largest impact on the usefulness of a model without the expense of fine-tuning. The blend was computed in FP32 and saved in FP16.
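
The parameter-wise blend described above can be sketched as follows. This is a minimal illustration, not the merge script credited below: it assumes each model is a flat mapping from parameter name to a list of floats, uses Python floats as a stand-in for FP32 arithmetic, and rounds the result to half precision with `struct`. The names `blend_state_dicts` and `to_fp16` are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def blend_state_dicts(a, b, weight_a=0.4):
    """Parameter-wise weighted average of two {name: [floats]} mappings.

    weight_a is the share of model `a` (0.4 for PPO_HH in the 40/60 blend);
    model `b` receives the remaining 1 - weight_a.
    """
    if a.keys() != b.keys():
        raise ValueError("models must share the same parameter names")
    merged = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        # Average in higher precision first, then round once on output.
        merged[name] = [
            to_fp16(weight_a * x + (1.0 - weight_a) * y)
            for x, y in zip(a[name], b[name])
        ]
    return merged


ppo_hh = {"w": [1.0, 0.0]}      # toy stand-in for ppo_hh_gpt-j weights
pygmalion = {"w": [0.0, 1.0]}   # toy stand-in for Pygmalion-6b weights
print(blend_state_dicts(ppo_hh, pygmalion))
```

Averaging before rounding means each output value is rounded to FP16 only once, which is the reason for blending in a wider type than the one used for storage.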
-Intended Use-
Research purposes only, intended for responsible use.
Express a conversation in natural language, and Pyg_PPO will continue it.
Try starting with a two-line prompt such as:
```
Bot: "Hello, how are you?"
You: "I am doing just fine, thank you."
```
Or any other topic, and the model will carry on in this back-and-forth format.
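
A small helper can assemble prompts in the two-speaker format shown above. This is a sketch under assumptions: `build_prompt` is a hypothetical name, and ending the prompt with an open quote for the next speaker is one possible way to cue the model, not something the card prescribes.

```python
def build_prompt(turns, next_speaker="Bot"):
    """Format (speaker, text) pairs as quoted dialogue lines and leave
    an open line for next_speaker to complete."""
    lines = [f'{speaker}: "{text}"' for speaker, text in turns]
    lines.append(f'{next_speaker}: "')
    return "\n".join(lines)


history = [
    ("Bot", "Hello, how are you?"),
    ("You", "I am doing just fine, thank you."),
]
print(build_prompt(history))
```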
It can also be used as a base to merge with other creative, technical, or adventure-themed models of the same class (GPT-J & 6b NeoX) and parameter size (6b), to experiment with the morphology of model weights based on the value added by instruct.
The merge was tested using KoboldAI with Nucleus Sampling Top-P set to 0.9, Temperature at 0.6, and Repetition Penalty at 1.1; extra samplers were disabled.
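
To make the sampler settings above concrete, here is a plain-Python sketch of temperature scaling, nucleus (Top-P) truncation, and a repetition penalty applied to a list of logits. It is not KoboldAI's implementation: the penalty rule (divide positive logits, multiply negative ones) follows the commonly used CTRL-style formulation and is an assumption, as are all function names.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty=1.1):
    """Make already-generated token ids less likely (CTRL-style rule, assumed)."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def top_p_probs(logits, temperature=0.6, top_p=0.9):
    """Temperature-scaled softmax, truncated to the smallest set of tokens
    whose cumulative probability reaches top_p, then renormalized."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_next(logits, seen_ids, rng=random):
    """Draw one token id using the settings quoted above."""
    dist = top_p_probs(apply_repetition_penalty(logits, seen_ids))
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


print(sample_next([2.0, 1.5, 0.3, -1.0], seen_ids=[0]))
```

A temperature below 1 sharpens the distribution before truncation, so with 0.6 the nucleus usually contains fewer tokens than it would at temperature 1.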
-Credits To-
Core Model:
https://huggingface.co/EleutherAI/gpt-j-6B
Author:
https://www.eleuther.ai/
Model1; 40% ppo_hh_gpt-j:
https://huggingface.co/reciprocate/ppo_hh_gpt-j
Author Repo:
https://huggingface.co/reciprocate
Related; CarperAI:
https://huggingface.co/CarperAI
ppo_hh_gpt-j was trained with Proximal Policy Optimization on a variant of the Helpful Harmless assistant-themed dataset; the specific datasets used are unknown. Datasets listed in the author's repo include:
https://huggingface.co/datasets/reciprocate/summarize_eval_ilql
https://huggingface.co/datasets/reciprocate/hh_eval_ilql
PPO explained:
https://paperswithcode.com/method/ppo
Potential HH-type datasets utilized:
https://huggingface.co/HuggingFaceH4
https://huggingface.co/datasets/Anthropic/hh-rlhf
Model2; 60% Pygmalion-6b:
https://huggingface.co/PygmalionAI/pygmalion-6b
Author Repo:
https://huggingface.co/PygmalionAI
Weight merge Script credit to Concedo:
https://huggingface.co/concedo
Model card template credit to Digitous:
https://huggingface.co/digitous/GPT-R

145
added_tokens.json Normal file

@@ -0,0 +1,145 @@
{
"<|extratoken_100|>": 50356,
"<|extratoken_101|>": 50357,
"<|extratoken_102|>": 50358,
"<|extratoken_103|>": 50359,
"<|extratoken_104|>": 50360,
"<|extratoken_105|>": 50361,
"<|extratoken_106|>": 50362,
"<|extratoken_107|>": 50363,
"<|extratoken_108|>": 50364,
"<|extratoken_109|>": 50365,
"<|extratoken_10|>": 50266,
"<|extratoken_110|>": 50366,
"<|extratoken_111|>": 50367,
"<|extratoken_112|>": 50368,
"<|extratoken_113|>": 50369,
"<|extratoken_114|>": 50370,
"<|extratoken_115|>": 50371,
"<|extratoken_116|>": 50372,
"<|extratoken_117|>": 50373,
"<|extratoken_118|>": 50374,
"<|extratoken_119|>": 50375,
"<|extratoken_11|>": 50267,
"<|extratoken_120|>": 50376,
"<|extratoken_121|>": 50377,
"<|extratoken_122|>": 50378,
"<|extratoken_123|>": 50379,
"<|extratoken_124|>": 50380,
"<|extratoken_125|>": 50381,
"<|extratoken_126|>": 50382,
"<|extratoken_127|>": 50383,
"<|extratoken_128|>": 50384,
"<|extratoken_129|>": 50385,
"<|extratoken_12|>": 50268,
"<|extratoken_130|>": 50386,
"<|extratoken_131|>": 50387,
"<|extratoken_132|>": 50388,
"<|extratoken_133|>": 50389,
"<|extratoken_134|>": 50390,
"<|extratoken_135|>": 50391,
"<|extratoken_136|>": 50392,
"<|extratoken_137|>": 50393,
"<|extratoken_138|>": 50394,
"<|extratoken_139|>": 50395,
"<|extratoken_13|>": 50269,
"<|extratoken_140|>": 50396,
"<|extratoken_141|>": 50397,
"<|extratoken_142|>": 50398,
"<|extratoken_143|>": 50399,
"<|extratoken_14|>": 50270,
"<|extratoken_15|>": 50271,
"<|extratoken_16|>": 50272,
"<|extratoken_17|>": 50273,
"<|extratoken_18|>": 50274,
"<|extratoken_19|>": 50275,
"<|extratoken_1|>": 50257,
"<|extratoken_20|>": 50276,
"<|extratoken_21|>": 50277,
"<|extratoken_22|>": 50278,
"<|extratoken_23|>": 50279,
"<|extratoken_24|>": 50280,
"<|extratoken_25|>": 50281,
"<|extratoken_26|>": 50282,
"<|extratoken_27|>": 50283,
"<|extratoken_28|>": 50284,
"<|extratoken_29|>": 50285,
"<|extratoken_2|>": 50258,
"<|extratoken_30|>": 50286,
"<|extratoken_31|>": 50287,
"<|extratoken_32|>": 50288,
"<|extratoken_33|>": 50289,
"<|extratoken_34|>": 50290,
"<|extratoken_35|>": 50291,
"<|extratoken_36|>": 50292,
"<|extratoken_37|>": 50293,
"<|extratoken_38|>": 50294,
"<|extratoken_39|>": 50295,
"<|extratoken_3|>": 50259,
"<|extratoken_40|>": 50296,
"<|extratoken_41|>": 50297,
"<|extratoken_42|>": 50298,
"<|extratoken_43|>": 50299,
"<|extratoken_44|>": 50300,
"<|extratoken_45|>": 50301,
"<|extratoken_46|>": 50302,
"<|extratoken_47|>": 50303,
"<|extratoken_48|>": 50304,
"<|extratoken_49|>": 50305,
"<|extratoken_4|>": 50260,
"<|extratoken_50|>": 50306,
"<|extratoken_51|>": 50307,
"<|extratoken_52|>": 50308,
"<|extratoken_53|>": 50309,
"<|extratoken_54|>": 50310,
"<|extratoken_55|>": 50311,
"<|extratoken_56|>": 50312,
"<|extratoken_57|>": 50313,
"<|extratoken_58|>": 50314,
"<|extratoken_59|>": 50315,
"<|extratoken_5|>": 50261,
"<|extratoken_60|>": 50316,
"<|extratoken_61|>": 50317,
"<|extratoken_62|>": 50318,
"<|extratoken_63|>": 50319,
"<|extratoken_64|>": 50320,
"<|extratoken_65|>": 50321,
"<|extratoken_66|>": 50322,
"<|extratoken_67|>": 50323,
"<|extratoken_68|>": 50324,
"<|extratoken_69|>": 50325,
"<|extratoken_6|>": 50262,
"<|extratoken_70|>": 50326,
"<|extratoken_71|>": 50327,
"<|extratoken_72|>": 50328,
"<|extratoken_73|>": 50329,
"<|extratoken_74|>": 50330,
"<|extratoken_75|>": 50331,
"<|extratoken_76|>": 50332,
"<|extratoken_77|>": 50333,
"<|extratoken_78|>": 50334,
"<|extratoken_79|>": 50335,
"<|extratoken_7|>": 50263,
"<|extratoken_80|>": 50336,
"<|extratoken_81|>": 50337,
"<|extratoken_82|>": 50338,
"<|extratoken_83|>": 50339,
"<|extratoken_84|>": 50340,
"<|extratoken_85|>": 50341,
"<|extratoken_86|>": 50342,
"<|extratoken_87|>": 50343,
"<|extratoken_88|>": 50344,
"<|extratoken_89|>": 50345,
"<|extratoken_8|>": 50264,
"<|extratoken_90|>": 50346,
"<|extratoken_91|>": 50347,
"<|extratoken_92|>": 50348,
"<|extratoken_93|>": 50349,
"<|extratoken_94|>": 50350,
"<|extratoken_95|>": 50351,
"<|extratoken_96|>": 50352,
"<|extratoken_97|>": 50353,
"<|extratoken_98|>": 50354,
"<|extratoken_99|>": 50355,
"<|extratoken_9|>": 50265
}
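
The mapping above follows a simple rule: `<|extratoken_N|>` gets id 50256 + N, so the 143 extra tokens extend the 50257-entry GPT-2 vocabulary to 50400 ids. The sketch below rebuilds the mapping under that rule; `make_added_tokens` is a hypothetical helper, and the guess that the file was written with sorted keys is only an inference from the lexicographic order of the entries.

```python
import json

BASE_VOCAB_SIZE = 50257   # GPT-2 BPE vocabulary, ids 0..50256
NUM_EXTRA_TOKENS = 143    # number of entries in the mapping above


def make_added_tokens(base=BASE_VOCAB_SIZE, count=NUM_EXTRA_TOKENS):
    """Rebuild the mapping <|extratoken_N|> -> base + N - 1 for N = 1..count."""
    return {f"<|extratoken_{n}|>": base + n - 1 for n in range(1, count + 1)}


added = make_added_tokens()
# Sorting keys as strings puts extratoken_100 before extratoken_10,
# the same order the file above uses.
text = json.dumps(added, indent=2, sort_keys=True)
print(text.splitlines()[1])
print(max(added.values()) + 1)  # total number of token ids
```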

53
config.json Normal file

@@ -0,0 +1,53 @@
{
"_name_or_path": "TehVenom_GPT-J-Pyg_PPO-6B-Dev-V8p4",
"activation_function": "gelu_new",
"architectures": [
"GPTJForCausalLM"
],
"attn_pdrop": 0.0,
"badwordsids": [
[
10619,
62,
19238,
62,
35,
12576,
7730
]
],
"bos_token_id": 50256,
"embd_pdrop": 0.0,
"eos_token_id": 50256,
"gradient_checkpointing": false,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gptj",
"n_embd": 4096,
"n_head": 16,
"n_inner": null,
"n_layer": 28,
"n_positions": 2048,
"resid_pdrop": 0.0,
"rotary_dim": 64,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50,
"temperature": 0.5
}
},
"tie_word_embeddings": false,
"tokenizer_class": "GPT2Tokenizer",
"torch_dtype": "float16",
"transformers_version": "4.28.0.dev0",
"use_cache": true,
"vocab_size": 50400,
"welcome": "You are currently running ((ppo_hh-GPT-J[40%] + Pygmalion-6b V8p4_Dev [60%]), \na mix of the models reciprocate/ppo_hh_gpt-j, and PygmalionAI/pygmalion-6b V8p4_Dev at a ratio of 40:60"
}
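
The shape fields in this config are enough to estimate the model size. The sketch below counts parameters from `n_embd`, `n_layer`, and `vocab_size`, using the tensor names that appear in the shard index of this commit (bias-free q/k/v/out projections, `fc_in`/`fc_out` with biases, one layer norm per block, an untied `lm_head` with bias). Two points are assumptions: a null `n_inner` is taken to mean 4 * `n_embd`, and the byte estimate counts each layer's causal-mask buffer at one bit per element plus a two-byte `masked_bias`. That accounting is a guess, though it reproduces the `total_size` reported in the shard index.

```python
config = {"n_embd": 4096, "n_head": 16, "n_inner": None, "n_layer": 28,
          "n_positions": 2048, "vocab_size": 50400}


def gptj_param_count(cfg):
    """Count trainable parameters implied by the config fields."""
    d, layers, vocab = cfg["n_embd"], cfg["n_layer"], cfg["vocab_size"]
    inner = cfg["n_inner"] or 4 * d           # assumed fallback for null n_inner
    attn = 4 * d * d                          # q, k, v, out projections, no biases
    mlp = d * inner + inner + inner * d + d   # fc_in and fc_out with biases
    layer_norm = 2 * d                        # ln_1 weight and bias
    per_layer = attn + mlp + layer_norm
    embed = vocab * d                         # token embedding matrix
    head = vocab * d + vocab                  # untied lm_head with bias
    return embed + layers * per_layer + 2 * d + head  # 2 * d: final layer norm


def checkpoint_bytes(cfg, bytes_per_param=2):
    """FP16 parameter bytes plus the assumed per-layer attention buffers."""
    mask_bytes = cfg["n_positions"] ** 2 // 8  # boolean mask at 1 bit per element
    buffers = cfg["n_layer"] * (mask_bytes + bytes_per_param)
    return gptj_param_count(cfg) * bytes_per_param + buffers


print(config["n_embd"] // config["n_head"])  # per-head dimension
print(gptj_param_count(config))              # 6050882784
print(checkpoint_bytes(config))              # 12116445688
```

With a per-head dimension of 256 and `rotary_dim` set to 64, rotary position embeddings cover only the first quarter of each head's dimensions.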

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 50256,
"eos_token_id": 50256,
"transformers_version": "4.28.0.dev0"
}

50001
merges.txt Normal file

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b7a1fe694e81e5f4a9bb4a364f98eee896154531aa25356839624cb50f3e5989
size 2111833645


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d1f0ffed483ce0dda480b25892d648d4cee66efb007ab3e6263140d81af17ad0
size 2101653291


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:731634f2cf65ae70df65ea5a6106f7d20f216ee0c18fc8635a5a91bcd422baf4
size 2034543815


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ab1713e800e16c2930554fa695c174d80e988bb5c097bbe11287ab54efe8a56c
size 2034543815


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:21834b813f84e2d8a2d666008f15d8a4cd86970d72803b9e5303fac38dba7e14
size 2034543815


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4409bafa25b50a4a4f2dae51159a39dd99022114fe7f3465d56481812045c204
size 1902199207
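
Each of the six entries above is a Git LFS pointer: a three-line text file recording the spec version, the SHA-256 of the real object, and its size in bytes. The sketch below parses one pointer and sums the six sizes shown above; `parse_lfs_pointer` is a hypothetical helper that assumes well-formed `key value` lines.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4409bafa25b50a4a4f2dae51159a39dd99022114fe7f3465d56481812045c204
size 1902199207
"""
info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])

# Sizes of the six pointers listed above, in order of appearance.
sizes = [2111833645, 2101653291, 2034543815, 2034543815, 2034543815, 1902199207]
print(sum(sizes) / 1e9)  # combined download size in GB (decimal)
```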


@@ -0,0 +1,348 @@
{
"metadata": {
"total_size": 12116445688.0
},
"weight_map": {
"lm_head.bias": "pytorch_model-00006-of-00006.bin",
"lm_head.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.0.attn.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.attn.masked_bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.attn.out_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.ln_1.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.mlp.fc_in.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.mlp.fc_in.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.mlp.fc_out.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.0.mlp.fc_out.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.attn.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.attn.masked_bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.attn.out_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.ln_1.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.mlp.fc_in.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.mlp.fc_in.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.mlp.fc_out.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.1.mlp.fc_out.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.10.attn.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.attn.masked_bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.attn.out_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.ln_1.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.ln_1.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.mlp.fc_in.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.mlp.fc_in.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.mlp.fc_out.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.10.mlp.fc_out.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.attn.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.attn.masked_bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.attn.out_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.ln_1.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.ln_1.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.mlp.fc_in.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.mlp.fc_in.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.mlp.fc_out.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.11.mlp.fc_out.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.attn.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.attn.masked_bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.attn.out_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.ln_1.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.ln_1.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.mlp.fc_in.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.mlp.fc_in.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.mlp.fc_out.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.12.mlp.fc_out.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.attn.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.attn.masked_bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.attn.out_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.ln_1.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.ln_1.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.mlp.fc_in.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.mlp.fc_in.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.mlp.fc_out.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.13.mlp.fc_out.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.14.attn.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.14.attn.k_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.14.attn.masked_bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.14.attn.out_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.14.attn.q_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.14.attn.v_proj.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.14.ln_1.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.14.ln_1.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.14.mlp.fc_in.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.14.mlp.fc_in.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.14.mlp.fc_out.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.14.mlp.fc_out.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.attn.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.attn.masked_bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.attn.out_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.ln_1.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.ln_1.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.mlp.fc_in.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.mlp.fc_in.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.mlp.fc_out.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.15.mlp.fc_out.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.attn.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.attn.masked_bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.attn.out_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.ln_1.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.ln_1.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.mlp.fc_in.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.mlp.fc_in.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.mlp.fc_out.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.16.mlp.fc_out.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.attn.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.attn.masked_bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.attn.out_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.ln_1.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.ln_1.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.mlp.fc_in.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.mlp.fc_in.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.mlp.fc_out.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.17.mlp.fc_out.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.attn.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.attn.masked_bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.attn.out_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.ln_1.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.ln_1.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.mlp.fc_in.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.mlp.fc_in.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.mlp.fc_out.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.18.mlp.fc_out.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.19.attn.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.19.attn.k_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.19.attn.masked_bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.19.attn.out_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.19.attn.q_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.19.attn.v_proj.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.19.ln_1.bias": "pytorch_model-00004-of-00006.bin",
"transformer.h.19.ln_1.weight": "pytorch_model-00004-of-00006.bin",
"transformer.h.19.mlp.fc_in.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.19.mlp.fc_in.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.19.mlp.fc_out.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.19.mlp.fc_out.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.2.attn.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.attn.masked_bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.attn.out_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.ln_1.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.mlp.fc_in.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.mlp.fc_in.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.mlp.fc_out.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.2.mlp.fc_out.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.20.attn.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.attn.masked_bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.attn.out_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.ln_1.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.ln_1.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.mlp.fc_in.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.mlp.fc_in.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.mlp.fc_out.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.20.mlp.fc_out.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.attn.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.attn.masked_bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.attn.out_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.ln_1.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.ln_1.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.mlp.fc_in.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.mlp.fc_in.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.mlp.fc_out.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.21.mlp.fc_out.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.attn.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.attn.masked_bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.attn.out_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.ln_1.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.ln_1.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.mlp.fc_in.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.mlp.fc_in.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.mlp.fc_out.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.22.mlp.fc_out.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.attn.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.attn.masked_bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.attn.out_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.ln_1.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.ln_1.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.mlp.fc_in.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.mlp.fc_in.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.mlp.fc_out.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.23.mlp.fc_out.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.24.attn.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.24.attn.k_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.24.attn.masked_bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.24.attn.out_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.24.attn.q_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.24.attn.v_proj.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.24.ln_1.bias": "pytorch_model-00005-of-00006.bin",
"transformer.h.24.ln_1.weight": "pytorch_model-00005-of-00006.bin",
"transformer.h.24.mlp.fc_in.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.24.mlp.fc_in.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.24.mlp.fc_out.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.24.mlp.fc_out.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.attn.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.attn.k_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.attn.masked_bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.attn.out_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.attn.q_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.attn.v_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.ln_1.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.ln_1.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.mlp.fc_in.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.mlp.fc_in.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.mlp.fc_out.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.25.mlp.fc_out.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.attn.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.attn.k_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.attn.masked_bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.attn.out_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.attn.q_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.attn.v_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.ln_1.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.ln_1.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.mlp.fc_in.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.mlp.fc_in.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.mlp.fc_out.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.26.mlp.fc_out.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.attn.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.attn.k_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.attn.masked_bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.attn.out_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.attn.q_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.attn.v_proj.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.ln_1.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.ln_1.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.mlp.fc_in.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.mlp.fc_in.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.mlp.fc_out.bias": "pytorch_model-00006-of-00006.bin",
"transformer.h.27.mlp.fc_out.weight": "pytorch_model-00006-of-00006.bin",
"transformer.h.3.attn.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.attn.masked_bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.attn.out_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.attn.q_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.ln_1.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.mlp.fc_in.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.mlp.fc_in.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.mlp.fc_out.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.3.mlp.fc_out.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.4.attn.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.4.attn.k_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.4.attn.masked_bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.4.attn.out_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.4.attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.4.attn.v_proj.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.4.ln_1.bias": "pytorch_model-00001-of-00006.bin",
"transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00006.bin",
"transformer.h.4.mlp.fc_in.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.4.mlp.fc_in.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.4.mlp.fc_out.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.4.mlp.fc_out.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.attn.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.attn.masked_bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.attn.out_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.ln_1.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.ln_1.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.mlp.fc_in.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.mlp.fc_in.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.mlp.fc_out.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.5.mlp.fc_out.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.attn.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.attn.masked_bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.attn.out_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.ln_1.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.ln_1.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.mlp.fc_in.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.mlp.fc_in.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.mlp.fc_out.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.6.mlp.fc_out.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.attn.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.attn.masked_bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.attn.out_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.ln_1.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.ln_1.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.mlp.fc_in.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.mlp.fc_in.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.mlp.fc_out.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.7.mlp.fc_out.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.attn.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.attn.masked_bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.attn.out_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.ln_1.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.ln_1.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.mlp.fc_in.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.mlp.fc_in.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.mlp.fc_out.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.8.mlp.fc_out.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.9.attn.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.9.attn.k_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.9.attn.masked_bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.9.attn.out_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.9.attn.q_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.9.attn.v_proj.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.9.ln_1.bias": "pytorch_model-00002-of-00006.bin",
"transformer.h.9.ln_1.weight": "pytorch_model-00002-of-00006.bin",
"transformer.h.9.mlp.fc_in.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.9.mlp.fc_in.weight": "pytorch_model-00003-of-00006.bin",
"transformer.h.9.mlp.fc_out.bias": "pytorch_model-00003-of-00006.bin",
"transformer.h.9.mlp.fc_out.weight": "pytorch_model-00003-of-00006.bin",
"transformer.ln_f.bias": "pytorch_model-00006-of-00006.bin",
"transformer.ln_f.weight": "pytorch_model-00006-of-00006.bin",
"transformer.wte.weight": "pytorch_model-00001-of-00006.bin"
}
}

24
special_tokens_map.json Normal file

@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|endoftext|>",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}

101591
tokenizer.json Normal file

File diff suppressed because it is too large

32
tokenizer_config.json Normal file

@@ -0,0 +1,32 @@
{
  "add_prefix_space": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "__type": "AddedToken",
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "errors": "replace",
  "model_max_length": 1024,
  "name_or_path": "pygmalion-6b",
  "special_tokens_map_file": null,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}

1
vocab.json Normal file

File diff suppressed because one or more lines are too long