Initialize project; model provided by the ModelHub XC community
Model: NousResearch/Redmond-Hermes-Coder
Source: Original Platform
39
.gitattributes
vendored
Normal file
@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
pytorch_model-00001-of-00004.bin filter=lfs diff=lfs merge=lfs -text
pytorch_model-00002-of-00004.bin filter=lfs diff=lfs merge=lfs -text
pytorch_model-00003-of-00004.bin filter=lfs diff=lfs merge=lfs -text
pytorch_model-00004-of-00004.bin filter=lfs diff=lfs merge=lfs -text
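The patterns above send large binary files through Git LFS. That is why the `pytorch_model-*.bin` shards later in this commit show up as small pointer files instead of weights. The sketch below approximates which repository paths these rules would catch. Keep in mind:

- It uses Python's `fnmatch` as a stand-in for Git's own gitattributes matching, so edge cases can differ (for example, `**` matching zero directories).
- `LFS_PATTERNS` is a hand-picked subset of the list, not the whole file.
- `is_lfs_tracked` is an illustrative name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hand-picked subset of the .gitattributes patterns above (not the full list).
LFS_PATTERNS = (
    "*.bin", "*.safetensors", "*.pt", "*.pth", "*.ckpt", "*.h5",
    "*.onnx", "*.msgpack", "*.tar.*", "*.zip", "*tfevents*", "saved_model/**/*",
)


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check of whether `path` falls under an LFS pattern.

    Patterns without a slash are compared against the file name only;
    patterns containing a slash are compared against the full relative path.
    This mimics, but does not exactly reproduce, Git's matching rules.
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


for f in ["pytorch_model-00001-of-00004.bin", "config.json", "merges.txt"]:
    print(f, is_lfs_tracked(f))
```

Only the weight shards match. The JSON configs and `merges.txt` stay as ordinary Git blobs, which matches how they appear in this commit.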
109
README.md
Normal file
@@ -0,0 +1,109 @@
---
license: gpl
language:
- en
tags:
- starcoder
- wizardcoder
- code
- self-instruct
- distillation
---
# Model Card: Redmond-Hermes-Coder 15B

## Model Description

Redmond-Hermes-Coder 15B is a state-of-the-art language model fine-tuned on over 300,000 instructions. It was fine-tuned by Nous Research: Teknium and Karan4D led the fine-tuning process and dataset curation, Redmond AI sponsored the compute, and several other contributors took part.

This model was fine-tuned from WizardCoder, which is itself built on the StarCoder base model.

The model is strong at code, but it comes with a tradeoff. It is far better at code than the original Nous-Hermes built on LLaMA, yet it scores lower than WizardCoder on pure code benchmarks such as HumanEval.

It scores 39% on HumanEval, compared with 57% for WizardCoder. This is a preliminary experiment, and we are exploring improvements now.

However, it does seem better than WizardCoder on a variety of non-code tasks, including writing.
## Model Training

The model was trained almost entirely on synthetic GPT-4 outputs. This includes data from diverse sources: GPTeacher (the General, Roleplay v1 & v2, and Code Instruct datasets), Nous Instruct & PDACTL (unpublished), CodeAlpaca, Evol_Instruct Uncensored, GPT4-LLM, and Unnatural Instructions.

Additional data came from Camel-AI's Biology, Physics, Chemistry, and Math datasets, Airoboros' (v1) GPT-4 dataset, and more from CodeAlpaca. In total, the data comprised over 300,000 instructions.
## Collaborators

The model fine-tuning and the datasets were a collaborative effort by members of Nous Research, including Teknium, Karan4D, and Huemin Art, supported by generous compute grants from Redmond AI.

Huge thanks to all the dataset creators who generously share their datasets openly.

Among the dataset contributors, GPTeacher was made available by Teknium, WizardLM by nlpxucan, and the Nous Research Instruct Dataset by Karan4D and HueminArt.
The GPT4-LLM and Unnatural Instructions datasets were provided by Microsoft, the Airoboros dataset by jondurbin, the Camel-AI datasets by Camel-AI, and the CodeAlpaca dataset by Sahil 2801.
If anyone was left out, please open a thread in the community tab.
## Prompt Format

The model follows the Alpaca prompt format:

```
### Instruction:

### Response:
```

or, when the instruction comes with additional input:

```
### Instruction:

### Input:

### Response:
```
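Both templates can be assembled programmatically. This minimal sketch makes a few assumptions:

- The instruction (and optional input) go on the empty line under each header.
- Generation begins right after `### Response:`.
- No system preamble is added. The card does not show the preamble that the original Alpaca template uses.

`build_prompt` is an illustrative name, not part of any released tooling.

```python
from typing import Optional


def build_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Assemble an Alpaca-style prompt in the layout shown above.

    Assumption: each section's text sits on the line under its header,
    sections are separated by a blank line, and the model continues
    writing after the final "### Response:" line.
    """
    parts = [f"### Instruction:\n{instruction.strip()}"]
    if input_text:  # second variant: add an "### Input:" section
        parts.append(f"### Input:\n{input_text.strip()}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)


print(build_prompt("Explain what this function does.", "def f(x): return x * 2"))
```

Leaving `input_text` empty yields the first, instruction-only template.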
## Resources for Applied Use Cases

For an example of a back-and-forth chatbot using Hugging Face Transformers and Discord, see: https://github.com/teknium1/alpaca-discord

For an example of a roleplaying Discord bot, see: https://github.com/teknium1/alpaca-roleplay-discordbot
## Future Plans

The model is currently being uploaded in FP16 format, and there are plans to release GGML and GPTQ 4-bit quantized versions. The team is also working on a full benchmark, similar to the one done for GPT4-x-Vicuna, and will reach out about getting the model included in GPT4All.
## Benchmark Results

```
HumanEval: 39%

|                      Task                      |Version|       Metric        |Value |   |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|arc_challenge                                   |      0|acc                  |0.2858|±  |0.0132|
|                                                |       |acc_norm             |0.3148|±  |0.0136|
|arc_easy                                        |      0|acc                  |0.5349|±  |0.0102|
|                                                |       |acc_norm             |0.5097|±  |0.0103|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5158|±  |0.0364|
|bigbench_date_understanding                     |      0|multiple_choice_grade|0.5230|±  |0.0260|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3295|±  |0.0293|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.1003|±  |0.0159|
|                                                |       |exact_str_match      |0.0000|±  |0.0000|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2260|±  |0.0187|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.1957|±  |0.0150|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.3733|±  |0.0280|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.3200|±  |0.0209|
|bigbench_navigate                               |      0|multiple_choice_grade|0.4830|±  |0.0158|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.4150|±  |0.0110|
|bigbench_ruin_names                             |      0|multiple_choice_grade|0.2143|±  |0.0194|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2926|±  |0.0144|
|bigbench_snarks                                 |      0|multiple_choice_grade|0.5249|±  |0.0372|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.4817|±  |0.0159|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.2700|±  |0.0140|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.1864|±  |0.0110|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1349|±  |0.0082|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.3733|±  |0.0280|
|boolq                                           |      1|acc                  |0.5498|±  |0.0087|
|hellaswag                                       |      0|acc                  |0.3814|±  |0.0048|
|                                                |       |acc_norm             |0.4677|±  |0.0050|
|openbookqa                                      |      0|acc                  |0.1960|±  |0.0178|
|                                                |       |acc_norm             |0.3100|±  |0.0207|
|piqa                                            |      0|acc                  |0.6600|±  |0.0111|
|                                                |       |acc_norm             |0.6610|±  |0.0110|
|winogrande                                      |      0|acc                  |0.5343|±  |0.0140|
```
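The Stderr column gives a rough sense of how far apart two scores must be before the difference means much. This sketch uses the usual normal approximation, which is an assumption since the card does not say how the errors were computed. Under it, a 95% interval is value ± 1.96 × stderr. The helper names are illustrative.

```python
from typing import Tuple


def approx_ci95(value: float, stderr: float) -> Tuple[float, float]:
    """Normal-approximation 95% interval: value +/- 1.96 * stderr."""
    margin = 1.96 * stderr
    return value - margin, value + margin


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True when two intervals share any point (a crude 'not clearly different' check)."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values copied from the benchmark table.
arc_c = approx_ci95(0.2858, 0.0132)       # arc_challenge acc
arc_c_norm = approx_ci95(0.3148, 0.0136)  # arc_challenge acc_norm
print(f"arc_challenge acc:      {arc_c[0]:.4f} to {arc_c[1]:.4f}")
print(f"arc_challenge acc_norm: {arc_c_norm[0]:.4f} to {arc_c_norm[1]:.4f}")
print("overlap:", intervals_overlap(arc_c, arc_c_norm))
```

Overlap is a coarse heuristic, not a significance test. Scores on the same examples (such as acc vs acc_norm) are correlated, so a paired comparison would be more informative.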
## Model Usage

The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.

Compute was provided by our project sponsor, Redmond AI. Thank you!
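Below is a minimal loading-and-generation sketch with Hugging Face Transformers. It assumes:

- the Hub id shown at the top of this page (`NousResearch/Redmond-Hermes-Coder`);
- a GPU with room for the fp16 weights (about 31.6 GB according to the checkpoint index);
- the `accelerate` package, which `device_map="auto"` requires.

The `generate` helper and its defaults are illustrative, not an official API. `config.json` in this repository sets `"use_cache": false`, so the sketch passes `use_cache=True` explicitly to make decoding use the KV cache regardless of that setting.

```python
"""Sketch: load Redmond-Hermes-Coder with Transformers and run one prompt."""

MODEL_ID = "NousResearch/Redmond-Hermes-Coder"  # Hub id from the page header (assumed)

# Instruction-only variant of the Alpaca format from the Prompt Format section.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def generate(instruction: str, max_new_tokens: int = 256) -> str:
    """Load the model, fill the prompt template, and return only the new text.

    Heavy imports live inside the function so this module can be imported
    without torch/transformers installed. device_map="auto" needs `accelerate`.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )

    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding for a reproducible first test
        use_cache=True,   # config.json ships with use_cache=false
    )
    # Drop the echoed prompt tokens and decode only the continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `generate("Write a Python function that checks whether a string is a palindrome.")` downloads the weights on first use. It also reloads them on every call, so in practice you would load the model and tokenizer once and reuse them.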
3
added_tokens.json
Normal file
@@ -0,0 +1,3 @@
{
  "[PAD]": 49152
}
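`added_tokens.json` appends a single `[PAD]` token at id 49152. StarCoder's base tokenizer has 49,152 entries (ids 0 to 49151), so this extra token is what brings `vocab_size` in `config.json` to 49153.

The stdlib sketch below is a small consistency check you could run on a local checkout. The rule it enforces (added ids must be the last ids below `vocab_size`) is an assumption about how the token was appended, not something the repository states. `check_added_tokens` is an illustrative name.

```python
import json
from typing import Dict, List

# Values copied from this commit: added_tokens.json and config.json's vocab_size.
ADDED_TOKENS: Dict[str, int] = json.loads('{"[PAD]": 49152}')
VOCAB_SIZE = 49153


def check_added_tokens(added: Dict[str, int], vocab_size: int) -> List[str]:
    """Return a list of problems; an empty list means the ids look consistent."""
    problems = []
    for token, idx in added.items():
        if idx >= vocab_size:
            problems.append(f"{token!r} has id {idx}, outside vocab_size {vocab_size}")
    # Assumption: added tokens occupy the final ids of the vocabulary.
    ids = sorted(added.values())
    expected = list(range(vocab_size - len(ids), vocab_size))
    if ids != expected:
        problems.append(f"added ids {ids} are not the last {len(ids)} ids {expected}")
    return problems


print(check_added_tokens(ADDED_TOKENS, VOCAB_SIZE) or "ok")
```

Run against the values in this commit, it prints `ok`.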
39
config.json
Normal file
@@ -0,0 +1,39 @@
{
  "_name_or_path": "./hermeswizardcoder-step3800/",
  "activation_function": "gelu",
  "architectures": [
    "GPTBigCodeForCausalLM"
  ],
  "attention_softmax_in_fp32": true,
  "attn_pdrop": 0.1,
  "bos_token_id": 0,
  "embd_pdrop": 0.1,
  "eos_token_id": 0,
  "inference_runner": 0,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "max_batch_size": null,
  "max_sequence_length": null,
  "model_type": "gpt_bigcode",
  "multi_query": true,
  "n_embd": 6144,
  "n_head": 48,
  "n_inner": 24576,
  "n_layer": 40,
  "n_positions": 8192,
  "pad_key_length": true,
  "pre_allocate_kv_cache": false,
  "resid_pdrop": 0.1,
  "scale_attention_softmax_in_fp32": true,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "torch_dtype": "float16",
  "transformers_version": "4.29.2",
  "use_cache": false,
  "validate_runner_input": true,
  "vocab_size": 49153
}
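The shape fields above are enough to reconstruct the parameter count, which makes a useful cross-check against the sharded checkpoint. The sketch assumes the standard GPTBigCode layout in Transformers:

- learned position embeddings (`wpe`);
- biased linear layers;
- multi-query attention, where keys and values share a single head of size `n_embd // n_head`, so `c_attn` projects to `n_embd + 2 * head_dim`.

The function name is illustrative.

```python
def gpt_bigcode_param_count(n_embd, n_head, n_inner, n_layer, n_positions, vocab_size,
                            multi_query=True, count_lm_head=True):
    """Count parameters of a GPTBigCode-style decoder from its config fields."""
    head_dim = n_embd // n_head
    # Multi-query attention: full-width queries, one shared key head and value head.
    kv_dim = head_dim if multi_query else n_embd
    qkv_out = n_embd + 2 * kv_dim
    per_layer = (
        2 * n_embd                      # ln_1 weight + bias
        + n_embd * qkv_out + qkv_out    # attn.c_attn
        + n_embd * n_embd + n_embd      # attn.c_proj
        + 2 * n_embd                    # ln_2 weight + bias
        + n_embd * n_inner + n_inner    # mlp.c_fc
        + n_inner * n_embd + n_embd     # mlp.c_proj
    )
    total = (
        vocab_size * n_embd             # wte (token embeddings)
        + n_positions * n_embd          # wpe (position embeddings)
        + n_layer * per_layer
        + 2 * n_embd                    # ln_f weight + bias
    )
    if count_lm_head:
        total += vocab_size * n_embd    # lm_head, listed separately in the index
    return total


params = gpt_bigcode_param_count(
    n_embd=6144, n_head=48, n_inner=24576, n_layer=40, n_positions=8192, vocab_size=49153
)
print(f"{params:,} parameters, {params * 2:,} bytes in float16")
```

Counting `lm_head.weight` separately gives 15,819,458,560 parameters, or 31,638,917,120 bytes at two bytes each. That matches `total_size` in `pytorch_model.bin.index.json` exactly. Without the output head the count is 15,517,462,528, in line with the "15B" in the model name. GPTBigCode ties `lm_head` to the token embeddings by default, so the head is saved in the checkpoint without adding unique weights.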
1
configuration.json
Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.29.2"
}
48892
merges.txt
Normal file
File diff suppressed because it is too large
3
pytorch_model-00001-of-00004.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4af05737ba28721d6999be329b3cf0aee859983975c4911db3235b87653afa13
size 9957995189
3
pytorch_model-00002-of-00004.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:926f98a133e87848b4ff8b5f9e5d6733fd144776df6e008b21276e5f24455994
size 9857381671
3
pytorch_model-00003-of-00004.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:92e986d197fc82fb5e834d9f02abaa9c65cc45a6a7681a12505108664ff3ebc3
size 9857381671
3
pytorch_model-00004-of-00004.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d54966bd5baa095ac39ea10d70f01b6ae37448f94f8850f977be8d6c30243a96
size 1966320293
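Each `.bin` entry in this commit is a Git LFS pointer: a three-line text stub recording the spec version, the SHA-256 of the real file, and its size in bytes. The weights themselves are fetched separately, for example with `git lfs pull`.

Below is a minimal stdlib sketch for parsing a pointer and verifying a downloaded shard against it. The function names are illustrative.

```python
import hashlib
from typing import Dict

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a pointer file into a dict and sanity-check it."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError("expected an oid of the form sha256:<64 hex digits>")
    return fields


def verify_file(path: str, fields: Dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Stream the file through SHA-256 and compare size and digest to the pointer."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
            size += len(chunk)
    return size == int(fields["size"]) and fields["oid"] == "sha256:" + digest.hexdigest()


POINTER_1 = """version https://git-lfs.github.com/spec/v1
oid sha256:4af05737ba28721d6999be329b3cf0aee859983975c4911db3235b87653afa13
size 9957995189
"""
# Sizes from the four shard pointers in this commit.
SHARD_SIZES = [9957995189, 9857381671, 9857381671, 1966320293]

print(parse_lfs_pointer(POINTER_1)["size"], sum(SHARD_SIZES))
```

The four shard sizes add up to 31,639,078,824 bytes, which is 161,704 bytes more than the index's `total_size` of 31,638,917,120. The index counts tensor bytes only, so the gap is presumably serialization overhead in the `.bin` files.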
492
pytorch_model.bin.index.json
Normal file
@@ -0,0 +1,492 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"total_size": 31638917120
|
||||||
|
},
|
||||||
|
"weight_map": {
|
||||||
|
"lm_head.weight": "pytorch_model-00004-of-00004.bin",
|
||||||
|
"transformer.h.0.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.ln_1.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.ln_1.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.ln_2.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.ln_2.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.0.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.ln_1.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.ln_1.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.ln_2.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.ln_2.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.1.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.ln_1.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.ln_1.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.ln_2.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.ln_2.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.10.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.ln_1.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.ln_1.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.ln_2.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.ln_2.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.11.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.12.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.12.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.12.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.12.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.12.ln_1.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.12.ln_1.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.12.ln_2.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.12.ln_2.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.12.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.12.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.12.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.12.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.13.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.14.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.15.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.16.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.17.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.18.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.19.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.2.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.ln_1.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.ln_1.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.ln_2.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.ln_2.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.2.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
|
||||||
|
"transformer.h.20.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.20.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.21.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.22.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.23.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.mlp.c_fc.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.mlp.c_fc.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.mlp.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.24.mlp.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.25.attn.c_attn.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.25.attn.c_attn.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.25.attn.c_proj.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.25.attn.c_proj.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.25.ln_1.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.25.ln_1.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.25.ln_2.bias": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.25.ln_2.weight": "pytorch_model-00002-of-00004.bin",
|
||||||
|
"transformer.h.25.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
|
||||||
|
"transformer.h.25.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
|
||||||
|
"transformer.h.25.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
|
||||||
|
"transformer.h.25.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
|
||||||
|
"transformer.h.26.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.26.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.27.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.28.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.29.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.3.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.ln_1.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.ln_1.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.ln_2.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.ln_2.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.3.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.30.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.30.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.31.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.32.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.33.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.34.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.35.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.36.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.mlp.c_fc.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.mlp.c_fc.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.mlp.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.37.mlp.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.38.attn.c_attn.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.38.attn.c_attn.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.38.attn.c_proj.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.38.attn.c_proj.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.38.ln_1.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.38.ln_1.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.38.ln_2.bias": "pytorch_model-00003-of-00004.bin",
"transformer.h.38.ln_2.weight": "pytorch_model-00003-of-00004.bin",
"transformer.h.38.mlp.c_fc.bias": "pytorch_model-00004-of-00004.bin",
"transformer.h.38.mlp.c_fc.weight": "pytorch_model-00004-of-00004.bin",
"transformer.h.38.mlp.c_proj.bias": "pytorch_model-00004-of-00004.bin",
"transformer.h.38.mlp.c_proj.weight": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.attn.c_attn.bias": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.attn.c_attn.weight": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.attn.c_proj.bias": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.attn.c_proj.weight": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.ln_1.bias": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.ln_1.weight": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.ln_2.bias": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.ln_2.weight": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.mlp.c_fc.bias": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.mlp.c_fc.weight": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.mlp.c_proj.bias": "pytorch_model-00004-of-00004.bin",
"transformer.h.39.mlp.c_proj.weight": "pytorch_model-00004-of-00004.bin",
"transformer.h.4.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.ln_1.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.ln_1.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.ln_2.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.ln_2.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.4.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.ln_1.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.ln_1.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.ln_2.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.ln_2.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.5.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.ln_1.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.ln_1.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.ln_2.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.ln_2.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.6.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.ln_1.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.ln_1.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.ln_2.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.ln_2.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.7.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.ln_1.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.ln_1.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.ln_2.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.ln_2.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.8.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.attn.c_attn.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.attn.c_attn.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.attn.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.attn.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.ln_1.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.ln_1.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.ln_2.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.ln_2.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.mlp.c_fc.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.mlp.c_fc.weight": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.mlp.c_proj.bias": "pytorch_model-00001-of-00004.bin",
"transformer.h.9.mlp.c_proj.weight": "pytorch_model-00001-of-00004.bin",
"transformer.ln_f.bias": "pytorch_model-00004-of-00004.bin",
"transformer.ln_f.weight": "pytorch_model-00004-of-00004.bin",
"transformer.wpe.weight": "pytorch_model-00001-of-00004.bin",
"transformer.wte.weight": "pytorch_model-00001-of-00004.bin"
}
}
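The index above maps every parameter name to one of four shard files. Its keys are sorted as strings, so block `h.3` sits between `h.29` and `h.30`, and some blocks straddle a shard boundary: in this part of the index, block 25 keeps its attention and layer-norm weights in shard 2 but its MLP weights in shard 3, and block 38 splits the same way between shards 3 and 4. The minimal Python sketch below groups parameters by block in numeric order and reports the split blocks. It assumes the file is the usual Hugging Face `pytorch_model.bin.index.json` with a top-level `"weight_map"` object; that file name and all helper names are assumptions, not taken from this section.

```python
import json
import re
from collections import defaultdict

# Matches per-block parameter names such as "transformer.h.25.mlp.c_fc.bias".
LAYER_RE = re.compile(r"^transformer\.h\.(\d+)\.")


def load_weight_map(path):
    """Read the "weight_map" object from a sharded-checkpoint index file (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["weight_map"]


def shards_per_layer(weight_map):
    """Map each transformer block index to the sorted list of shard files it uses."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        m = LAYER_RE.match(name)
        if m:  # skips non-block tensors such as transformer.ln_f / wte / wpe
            layers[int(m.group(1))].add(shard)
    # Sort blocks numerically, so 3 precedes 25 and 30 (the JSON keys sort as strings).
    return {layer: sorted(shards) for layer, shards in sorted(layers.items())}


def split_layers(weight_map):
    """Return only the blocks whose tensors are spread over more than one shard."""
    return {k: v for k, v in shards_per_layer(weight_map).items() if len(v) > 1}
```

Applied to the whole index, `split_layers(load_weight_map("pytorch_model.bin.index.json"))` would include blocks 25 and 38; any split blocks in the part of the index outside this section would be reported as well.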
27
special_tokens_map.json
Normal file
@@ -0,0 +1,27 @@
{
  "additional_special_tokens": [
    "<|endoftext|>",
    "<fim_prefix>",
    "<fim_middle>",
    "<fim_suffix>",
    "<fim_pad>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<empty_output>",
    "<commit_before>",
    "<commit_msg>",
    "<commit_after>",
    "<reponame>"
  ],
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "[PAD]",
  "unk_token": "<|endoftext|>"
}
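The additional special tokens match the StarCoder tokenizer's set, including `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` for fill-in-the-middle prompting. A sketch of a prefix-suffix-middle (PSM) prompt builder follows. It assumes StarCoder's PSM ordering and that generation stops at `<|endoftext|>`, the configured `eos_token`; these files do not say whether this fine-tuned checkpoint still handles infilling well, so treat the format as something to verify. The function names are hypothetical.

```python
# Special tokens copied from special_tokens_map.json above.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"
EOS = "<|endoftext|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a PSM infilling prompt (assumed StarCoder-style ordering).

    The model is expected to generate the missing middle after <fim_middle>.
    """
    for tok in (FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE, EOS):
        if tok in prefix or tok in suffix:
            # Raw special-token text in user input would corrupt the prompt layout.
            raise ValueError(f"input already contains special token {tok!r}")
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(generated: str) -> str:
    """Cut a completion at the first end-of-text token, if one is present."""
    return generated.split(EOS, 1)[0]
```

For example, `build_fim_prompt("def add(a, b):\n    return ", "\n")` yields a prompt ending in `<fim_middle>`, and the text the model produces before `<|endoftext|>` is the candidate middle.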
98266
tokenizer.json
Normal file
File diff suppressed because it is too large
32
tokenizer_config.json
Normal file
@@ -0,0 +1,32 @@
{
  "add_prefix_space": false,
  "additional_special_tokens": [
    "<|endoftext|>",
    "<fim_prefix>",
    "<fim_middle>",
    "<fim_suffix>",
    "<fim_pad>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<empty_output>",
    "<commit_before>",
    "<commit_msg>",
    "<commit_after>",
    "<reponame>"
  ],
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 2048,
  "padding_side": "right",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
}
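`tokenizer_config.json` sets `model_max_length` to 2048 and `padding_side` to `"right"`. The plain-Python sketch below shows what those two settings mean for a batch of token ids: each sequence is truncated to 2048 tokens, shorter ones get padding appended at the end, and an attention mask marks real tokens. The pad id is a parameter because the id assigned to `"[PAD]"` is not visible in this section, and keeping the leading tokens on truncation is an assumption of the sketch; a Hugging Face tokenizer called with padding and truncation enabled handles this itself.

```python
from typing import List, Tuple

MODEL_MAX_LENGTH = 2048  # "model_max_length" in tokenizer_config.json


def pad_right(batch: List[List[int]], pad_id: int,
              max_length: int = MODEL_MAX_LENGTH) -> Tuple[List[List[int]], List[List[int]]]:
    """Truncate each sequence to max_length, then right-pad to the longest one.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens and 0 for
    padding. pad_id is caller-supplied (assumption: the "[PAD]" id is not shown here).
    """
    truncated = [seq[:max_length] for seq in batch]  # keeps the leading tokens
    width = max((len(seq) for seq in truncated), default=0)
    input_ids, attention_mask = [], []
    for seq in truncated:
        n_pad = width - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)  # padding_side == "right"
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask
```

With right padding, real tokens always start at position 0, which is why the mask is a run of ones followed by zeros.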
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long