Initialize project; model provided by the ModelHub XC community
Model: AI-ModelScope/falcon-7b-instruct Source: Original Platform
This commit is contained in:
34
.gitattributes
vendored
Normal file
@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
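As a rough illustration of what these rules cover, the sketch below checks a few file names from this commit against a subset of the patterns. It uses Python's `fnmatch`, which only approximates gitattributes matching (for example, it does not implement `**` directory semantics), and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# A subset of the patterns listed in .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.mlmodel", "*.h5", "*.zip", "*tfevents*"]


def is_lfs_tracked(filename: str) -> bool:
    """Return True if the file name matches any LFS pattern (approximation)."""
    return any(fnmatch(filename, pattern) for pattern in LFS_PATTERNS)


for name in ["pytorch_model-00001-of-00002.bin", "config.json", "ms_wrapper.py"]:
    print(name, is_lfs_tracked(name))
```

Files that match are stored as small LFS pointer files in the repository, while the small JSON and Python files are committed as regular text.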
109
README.md
Normal file
@@ -0,0 +1,109 @@
---
tasks:
- text-generation
language:
- en

license: Apache License 2.0
---

# ✨ Falcon-7B-Instruct

**Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by [TII](https://www.tii.ae) on top of [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) and fine-tuned on a mixture of chat/instruct datasets. It is made available under the Apache 2.0 license.**

*Paper coming soon 😊.*

## Why use Falcon-7B-Instruct?

* **You are looking for a ready-to-use chat/instruct model based on [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).**
* **Falcon-7B is a strong base model, outperforming comparable open-source models** (e.g., [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1), etc.), thanks to being trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
* **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).

💬 **This is an instruct model, which may not be ideal for further fine-tuning.** If you are interested in building your own instruct/chat model, we recommend starting from [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).

🔥 **Looking for an even more powerful model?** [Falcon-40B-Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) is Falcon-7B-Instruct's big brother!

```python
from modelscope.utils.constant import Tasks
from modelscope.pipelines import pipeline

# Build the ModelScope text-generation pipeline for this model revision.
pipe = pipeline(task=Tasks.text_generation, model='AI-ModelScope/falcon-7b-instruct', model_revision='v1.0.1', device='cuda')
query = "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:"
result = pipe(query)
print(result)
```

💥 **Falcon LLMs require PyTorch 2.0 for use with `transformers`!**
# Model Card for Falcon-7B-Instruct

## Model Details

### Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae);
- **Model type:** Causal decoder-only;
- **Language(s) (NLP):** English and French;
- **License:** Apache 2.0;
- **Finetuned from model:** [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).

### Model Source

- **Paper:** *coming soon*.

## Uses

### Direct Use

Falcon-7B-Instruct has been fine-tuned on a mixture of instruct and chat datasets.

### Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use case that may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

Falcon-7B-Instruct was trained mostly on English data and will not generalize appropriately to other languages. Furthermore, because it was trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

### Recommendations

We recommend that users of Falcon-7B-Instruct develop guardrails and take appropriate precautions for any production use.

## Training Details

### Training Data

Falcon-7B-Instruct was fine-tuned on a 250M-token mixture of instruct/chat datasets.

| **Data source** | **Fraction** | **Tokens** | **Description** |
|--------------------|--------------|------------|-----------------------------------|
| [Baize](https://github.com/project-baize/baize-chatbot) | 65% | 164M | chat |
| [GPT4All](https://github.com/nomic-ai/gpt4all) | 25% | 62M | instruct |
| [GPTeacher](https://github.com/teknium1/GPTeacher) | 5% | 11M | instruct |
| [RefinedWeb-English](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) | 5% | 13M | massive web crawl |

The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b) tokenizer.
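
The fractions and token counts in the table are rounded, so they do not line up exactly. The short check below, a sketch that uses only the numbers from the table and the 250M total, compares the listed token counts with what each fraction would imply.

```python
TOTAL_TOKENS_M = 250  # total fine-tuning tokens in millions, from the text above

# name -> (fraction, tokens in millions as listed in the table)
mixture = {
    "Baize": (0.65, 164),
    "GPT4All": (0.25, 62),
    "GPTeacher": (0.05, 11),
    "RefinedWeb-English": (0.05, 13),
}

listed_total = sum(tokens for _, tokens in mixture.values())
fraction_total = sum(frac for frac, _ in mixture.values())

for name, (frac, tokens) in mixture.items():
    implied = frac * TOTAL_TOKENS_M
    print(f"{name}: listed {tokens}M, fraction implies {implied:.1f}M")
print(f"listed total: {listed_total}M")
```

The listed counts sum to the stated 250M, while the individual fractions (for example 65% of 250M is 162.5M against 164M listed) are approximations.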

## Evaluation

*Paper coming soon.*

See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) for early results.

Note that this model variant is not optimized for NLP benchmarks.

## Technical Specifications

For more information about pretraining, see [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).

## License

Falcon-7B-Instruct is made available under the Apache 2.0 license.

## Contact

falconllm@tii.ae
28
config.json
Normal file
@@ -0,0 +1,28 @@
{
  "alibi": false,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "RWForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_RW.RWConfig",
    "AutoModelForCausalLM": "modelling_RW.RWForCausalLM"
  },
  "bias": false,
  "bos_token_id": 11,
  "eos_token_id": 11,
  "hidden_dropout": 0.0,
  "hidden_size": 4544,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "RefinedWebModel",
  "multi_query": true,
  "n_head": 71,
  "n_layer": 32,
  "parallel_attn": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.27.4",
  "use_cache": true,
  "vocab_size": 65024
}
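The values in `config.json` are enough to derive the per-head dimension and to see why `multi_query` matters for inference memory. The sketch below assumes that multi-query attention keeps a single key/value head per layer (as described by Shazeer, 2019) and that the cache is stored in bfloat16 at 2 bytes per value; the function name is illustrative and not part of this repository.

```python
hidden_size = 4544
n_head = 71
n_layer = 32
BYTES_PER_VALUE = 2  # bfloat16, per "torch_dtype"

head_dim = hidden_size // n_head  # same formula as RWConfig.head_dim


def kv_cache_bytes_per_token(kv_heads: int) -> int:
    """Bytes needed to cache keys and values for one token across all layers."""
    return 2 * n_layer * kv_heads * head_dim * BYTES_PER_VALUE


multi_head = kv_cache_bytes_per_token(n_head)  # one K/V pair per query head
multi_query = kv_cache_bytes_per_token(1)      # a single shared K/V head

print(head_dim, multi_head, multi_query)
```

Under these assumptions the head dimension is 64, and the per-token cache shrinks by a factor equal to the number of heads (71) when keys and values are shared.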
11
configuration.json
Normal file
@@ -0,0 +1,11 @@
{
  "framework": "pytorch",
  "task": "text-generation",
  "model": {
    "type": "falcon-7b-instruct"
  },
  "pipeline": {
    "type": "falcon-7b-instruct-text-generation-pipe"
  },
  "allow_remote": true
}
79
configuration_RW.py
Normal file
@@ -0,0 +1,79 @@
# coding=utf-8
# Copyright 2022 the Big Science Workshop and HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Bloom configuration"""
from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging


logger = logging.get_logger(__name__)


class RWConfig(PretrainedConfig):
    model_type = "RefinedWebModel"
    keys_to_ignore_at_inference = ["past_key_values"]
    # Map the standard transformers attribute names onto the names used here.
    attribute_map = {
        "num_hidden_layers": "n_layer",
        "num_attention_heads": "n_head",
    }

    def __init__(
        self,
        vocab_size=250880,
        hidden_size=64,
        n_layer=2,
        n_head=8,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        use_cache=True,
        bos_token_id=1,
        eos_token_id=2,
        apply_residual_connection_post_layernorm=False,
        hidden_dropout=0.0,
        attention_dropout=0.0,
        multi_query=False,
        alibi=False,
        bias=False,
        parallel_attn=False,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        # Backward compatibility with n_embed kwarg
        n_embed = kwargs.pop("n_embed", None)
        self.hidden_size = hidden_size if n_embed is None else n_embed
        self.n_layer = n_layer
        self.n_head = n_head
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.use_cache = use_cache
        self.apply_residual_connection_post_layernorm = apply_residual_connection_post_layernorm
        self.hidden_dropout = hidden_dropout
        self.attention_dropout = attention_dropout

        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id
        self.multi_query = multi_query
        self.alibi = alibi
        self.bias = bias
        self.parallel_attn = parallel_attn

        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

    @property
    def head_dim(self):
        # Per-head width; integer division of the hidden size by the head count.
        return self.hidden_size // self.n_head

    @property
    def rotary(self):
        # Rotary position embeddings are used whenever ALiBi is disabled.
        return not self.alibi
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b12b1d5cab8d237975a831477e3cf5997eef5e932636a0654ef1695b04eb9412
size 396524
@@ -0,0 +1,18 @@
{
  "fileFormatVersion": "1.0.0",
  "itemInfoEntries": {
    "A51073A0-8381-4006-98E6-894FAB63FB3C": {
      "author": "com.apple.CoreML",
      "description": "CoreML Model Weights",
      "name": "weights",
      "path": "com.apple.CoreML/weights"
    },
    "F0BB8952-F8A2-4E8B-ABF3-9C72B4FC8816": {
      "author": "com.apple.CoreML",
      "description": "CoreML Model Specification",
      "name": "model.mlmodel",
      "path": "com.apple.CoreML/model.mlmodel"
    }
  },
  "rootModelIdentifier": "F0BB8952-F8A2-4E8B-ABF3-9C72B4FC8816"
}
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.27.4"
}
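Note that the token ids in `generation_config.json` (`bos_token_id: 1`, `eos_token_id: 2`) differ from those in `config.json` (both `11`). A small consistency check such as the following can surface that kind of mismatch. The dictionaries are copied by hand from the two files in this commit, and the helper name `find_mismatches` is hypothetical.

```python
model_config = {"bos_token_id": 11, "eos_token_id": 11}     # from config.json
generation_config = {"bos_token_id": 1, "eos_token_id": 2}  # from generation_config.json


def find_mismatches(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for shared keys whose values differ."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


mismatches = find_mismatches(model_config, generation_config)
for key in sorted(mismatches):
    print(key, mismatches[key])
```

`ms_wrapper.py` passes `eos_token_id=self.tokenizer.eos_token_id` explicitly when generating, which should take precedence over the default here for that call.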
1100
modelling_RW.py
Normal file
File diff suppressed because it is too large
75
ms_wrapper.py
Normal file
@@ -0,0 +1,75 @@
import os
from typing import Any, Dict, Union

import torch
import transformers
from modelscope.models.base import Model, TorchModel
from modelscope.models.builder import MODELS
from modelscope.pipelines.base import Pipeline
from modelscope.pipelines.builder import PIPELINES
from modelscope.utils.constant import Tasks
from modelscope.utils.logger import get_logger
from transformers import AutoModelForCausalLM, AutoTokenizer

# Default to the first GPU unless the caller has already chosen devices.
if 'CUDA_VISIBLE_DEVICES' not in os.environ:
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'


@PIPELINES.register_module(
    Tasks.text_generation,
    module_name='falcon-7b-instruct-text-generation-pipe')
class falcon7binstructTextGenerationPipeline(Pipeline):
    def __init__(self, model: Union[Model, str], *args, **kwargs):
        # Accept either a model directory (str) or an already built model.
        model = falcon7binstructTextGeneration(model) if isinstance(
            model, str) else model
        super().__init__(model=model, **kwargs)

    def preprocess(self, inputs, **preprocess_params) -> Dict[str, Any]:
        # The prompt is passed through unchanged.
        return inputs

    # define the forward pass
    def forward(self, inputs: Dict, **forward_params) -> Dict[str, Any]:
        return self.model(inputs)

    # format the outputs from pipeline
    def postprocess(self, input, **kwargs) -> Dict[str, Any]:
        return input


@MODELS.register_module(Tasks.text_generation,
                        module_name='falcon-7b-instruct')
class falcon7binstructTextGeneration(TorchModel):
    def __init__(self, model_dir=None, *args, **kwargs):
        super().__init__(model_dir, *args, **kwargs)
        self.logger = get_logger()
        # loading tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        # trust_remote_code=True lets transformers load modelling_RW.py from model_dir.
        self.pipeline = transformers.pipeline(
            "text-generation",
            model=model_dir,
            tokenizer=self.tokenizer,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
            device_map="auto",
        )

    def forward(self, input: Dict) -> Dict[str, Any]:
        output = {}
        res = self.infer(input)
        output['text'] = res
        return output

    def quantize(self, bits: int):
        # NOTE: self.model does not appear to be assigned anywhere in this class,
        # so this method would likely fail as written.
        self.model = self.model.quantize(bits)
        return self

    def infer(self, input):
        # Sample up to 200 tokens (prompt included) with top-k sampling.
        sequences = self.pipeline(
            input,
            max_length=200,
            do_sample=True,
            top_k=10,
            num_return_sequences=1,
            eos_token_id=self.tokenizer.eos_token_id,
        )
        return sequences
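To make the data flow in `ms_wrapper.py` concrete, here is a dependency-free sketch of the shape of the value the wrapper returns. `fake_text_generation` is a hypothetical stand-in for the `transformers` text-generation pipeline, which returns a list of dicts with a `generated_text` key; the real model call is not reproduced, and the generated continuation is invented for illustration.

```python
from typing import Any, Dict, List


def fake_text_generation(prompt: str, **generate_kwargs) -> List[Dict[str, str]]:
    """Hypothetical stand-in for transformers.pipeline('text-generation', ...)."""
    return [{"generated_text": prompt + " Hello, Daniel!"}]


def forward(prompt: str) -> Dict[str, Any]:
    """Mirror falcon7binstructTextGeneration.forward: wrap the result under 'text'."""
    sequences = fake_text_generation(
        prompt, max_length=200, do_sample=True, top_k=10, num_return_sequences=1
    )
    return {"text": sequences}


result = forward("Daniel: Hello, Girafatron!\nGirafatron:")
print(result["text"][0]["generated_text"])
```

Since `preprocess` and `postprocess` pass their inputs through unchanged, the value printed by the README example should have this `{'text': [...]}` shape.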
3
pytorch_model-00001-of-00002.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:66acf4bebb68593952a51575cb02dbf258a606e236c6b82b6b60c3b1e9089e66
size 9951028193
3
pytorch_model-00002-of-00002.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1de823c84b1c8b9889ac2a6c670ec6002a71776abd42cdf51bb3acd4c9938b29
size 4483421659
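Each `.bin` entry above is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. A minimal parser, with a hypothetical function name, might look like this:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1de823c84b1c8b9889ac2a6c670ec6002a71776abd42cdf51bb3acd4c9938b29
size 4483421659
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is what lets a client report download progress before fetching the actual object.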
203
pytorch_model.bin.index.json
Normal file
@@ -0,0 +1,203 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_size": 14434379520
|
||||
},
|
||||
"weight_map": {
|
||||
"lm_head.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.0.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.0.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.0.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.0.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.0.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.0.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.1.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.1.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.1.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.1.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.1.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.1.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.10.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.10.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.10.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.10.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.10.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.10.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.11.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.11.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.11.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.11.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.11.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.11.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.12.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.12.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.12.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.12.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.12.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.12.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.13.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.13.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.13.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.13.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.13.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.13.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.14.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.14.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.14.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.14.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.14.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.14.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.15.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.15.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.15.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.15.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.15.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.15.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.16.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.16.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.16.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.16.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.16.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.16.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.17.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.17.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.17.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.17.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.17.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.17.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.18.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.18.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.18.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.18.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.18.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.18.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.19.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.19.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.19.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.19.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.19.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.19.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.2.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.2.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.2.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.2.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.2.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.2.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.20.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.20.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.20.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.20.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.20.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.20.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.21.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.21.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.21.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.21.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.21.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.21.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.22.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.22.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.22.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.22.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.22.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.22.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.23.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.23.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.23.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.23.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.23.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.23.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.24.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.24.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.24.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.24.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.24.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.24.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.25.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.25.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.25.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.25.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.25.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.25.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.26.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.26.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.26.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.26.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.26.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.26.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.27.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.27.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.27.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.27.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.27.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.27.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.28.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.28.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.28.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.28.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.28.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.28.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.29.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.29.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.29.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.29.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.29.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.29.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.3.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.3.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.3.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.3.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.3.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.3.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.30.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.30.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.30.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.30.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.30.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.30.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.31.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.31.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.31.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.31.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.31.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.31.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.h.4.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.4.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.4.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.4.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.4.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.4.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.5.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.5.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.5.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.5.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.5.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.5.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.6.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.6.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.6.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.6.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.6.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.6.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.7.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.7.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.7.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.7.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.7.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.7.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.8.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.8.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.8.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.8.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.8.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.8.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.9.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.9.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.9.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.9.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.9.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.h.9.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
|
||||
"transformer.ln_f.bias": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.ln_f.weight": "pytorch_model-00002-of-00002.bin",
|
||||
"transformer.word_embeddings.weight": "pytorch_model-00001-of-00002.bin"
|
||||
}
|
||||
}
|
||||
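As a sanity check, `total_size` can be reproduced from `config.json` and the tensor names in the weight map. The sketch assumes bias-free linear layers (`"bias": false`), a 4x MLP expansion (suggested by the `dense_h_to_4h` name), a fused multi-query `query_key_value` projection of width `hidden_size + 2 * head_dim`, and 2 bytes per bfloat16 value. These shapes are inferred, since `modelling_RW.py` is not shown in this diff.

```python
hidden, n_head, n_layer, vocab = 4544, 71, 32, 65024
head_dim = hidden // n_head

per_layer = (
    2 * hidden                          # input_layernorm weight + bias
    + hidden * (hidden + 2 * head_dim)  # self_attention.query_key_value
    + hidden * hidden                   # self_attention.dense
    + hidden * 4 * hidden               # mlp.dense_h_to_4h
    + 4 * hidden * hidden               # mlp.dense_4h_to_h
)
total_params = (
    vocab * hidden          # transformer.word_embeddings
    + n_layer * per_layer
    + 2 * hidden            # transformer.ln_f weight + bias
    + vocab * hidden        # lm_head (listed separately in the weight map)
)
total_bytes = 2 * total_params  # bfloat16

print(total_params, total_bytes)
```

Under these assumptions the result is 14,434,379,520 bytes, matching `total_size`. The two shard files sum to slightly more (14,434,449,852 bytes), presumably because of serialization overhead.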
16
special_tokens_map.json
Normal file
@@ -0,0 +1,16 @@
{
  "additional_special_tokens": [
    ">>TITLE<<",
    ">>ABSTRACT<<",
    ">>INTRODUCTION<<",
    ">>SUMMARY<<",
    ">>COMMENT<<",
    ">>ANSWER<<",
    ">>QUESTION<<",
    ">>DOMAIN<<",
    ">>PREFIX<<",
    ">>SUFFIX<<",
    ">>MIDDLE<<"
  ],
  "eos_token": "<|endoftext|>"
}
129970
tokenizer.json
Normal file
File diff suppressed because it is too large
8
tokenizer_config.json
Normal file
@@ -0,0 +1,8 @@
{
  "add_prefix_space": false,
  "eos_token": "<|endoftext|>",
  "model_max_length": 2048,
  "name_or_path": "tiiuae/falcon_tokenizer",
  "special_tokens_map_file": null,
  "tokenizer_class": "PreTrainedTokenizerFast"
}