初始化项目,由ModelHub XC社区提供模型

Model: tiiuae/falcon-7b-instruct
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-16 01:41:37 +08:00
commit 7b7136f232
20 changed files with 132201 additions and 0 deletions

39
.gitattributes vendored Normal file
View File

@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
pytorch_model-00001-of-00002.bin filter=lfs diff=lfs merge=lfs -text
pytorch_model-00002-of-00002.bin filter=lfs diff=lfs merge=lfs -text
coreml/text-generation/falcon-7b-64-float32.mlpackage/Data/com.apple.CoreML/weights/weight.bin filter=lfs diff=lfs merge=lfs -text

234
README.md Normal file
View File

@@ -0,0 +1,234 @@
---
datasets:
- tiiuae/falcon-refinedweb
language:
- en
inference: true
new_version: tiiuae/falcon-11B
widget:
- text: "Hey Falcon! Any recommendations for my holidays in Abu Dhabi?"
example_title: "Abu Dhabi Trip"
- text: "What's the Everett interpretation of quantum mechanics?"
example_title: "Q/A: Quantum & Answers"
- text: "Give me a list of the top 10 dive sites you would recommend around the world."
example_title: "Diving Top 10"
- text: "Can you tell me more about deep-water soloing?"
example_title: "Extreme sports"
- text: "Can you write a short tweet about the Apache 2.0 release of our latest AI model, Falcon LLM?"
example_title: "Twitter Helper"
- text: "What are the responsabilities of a Chief Llama Officer?"
example_title: "Trendy Jobs"
license: apache-2.0
---
# ✨ Falcon-7B-Instruct
**Falcon-7B-Instruct is a 7B parameters causal decoder-only model built by [TII](https://www.tii.ae) based on [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) and finetuned on a mixture of chat/instruct datasets. It is made available under the Apache 2.0 license.**
*Paper coming soon 😊.*
🤗 To get started with Falcon (inference, finetuning, quantization, etc.), we recommend reading [this great blogpost fron HF](https://huggingface.co/blog/falcon)!
## Why use Falcon-7B-Instruct?
* **You are looking for a ready-to-use chat/instruct model based on [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).**
* **Falcon-7B is a strong base model, outperforming comparable open-source models** (e.g., [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1) etc.), thanks to being trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
* **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).
💬 **This is an instruct model, which may not be ideal for further finetuning.** If you are interested in building your own instruct/chat model, we recommend starting from [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).
🔥 **Looking for an even more powerful model?** [Falcon-40B-Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) is Falcon-7B-Instruct's big brother!
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
model = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto",
)
sequences = pipeline(
"Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
max_length=200,
do_sample=True,
top_k=10,
num_return_sequences=1,
eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
print(f"Result: {seq['generated_text']}")
```
💥 **Falcon LLMs require PyTorch 2.0 for use with `transformers`!**
For fast inference with Falcon, check-out [Text Generation Inference](https://github.com/huggingface/text-generation-inference)! Read more in this [blogpost]((https://huggingface.co/blog/falcon).
You will need **at least 16GB of memory** to swiftly run inference with Falcon-7B-Instruct.
# Model Card for Falcon-7B-Instruct
## Model Details
### Model Description
- **Developed by:** [https://www.tii.ae](https://www.tii.ae);
- **Model type:** Causal decoder-only;
- **Language(s) (NLP):** English and French;
- **License:** Apache 2.0;
- **Finetuned from model:** [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).
### Model Source
- **Paper:** *coming soon*.
## Uses
### Direct Use
Falcon-7B-Instruct has been finetuned on a mixture of instruct and chat datasets.
### Out-of-Scope Use
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
## Bias, Risks, and Limitations
Falcon-7B-Instruct is mostly trained on English data, and will not generalize appropriately to other languages. Furthermore, as it is trained on a large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
### Recommendations
We recommend users of Falcon-7B-Instruct to develop guardrails and to take appropriate precautions for any production use.
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
model = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto",
)
sequences = pipeline(
"Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
max_length=200,
do_sample=True,
top_k=10,
num_return_sequences=1,
eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
print(f"Result: {seq['generated_text']}")
```
## Training Details
### Training Data
Falcon-7B-Instruct was finetuned on a 250M tokens mixture of instruct/chat datasets.
| **Data source** | **Fraction** | **Tokens** | **Description** |
|--------------------|--------------|------------|-----------------------------------|
| [Bai ze](https://github.com/project-baize/baize-chatbot) | 65% | 164M | chat |
| [GPT4All](https://github.com/nomic-ai/gpt4all) | 25% | 62M | instruct |
| [GPTeacher](https://github.com/teknium1/GPTeacher) | 5% | 11M | instruct |
| [RefinedWeb-English](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) | 5% | 13M | massive web crawl |
The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b) tokenizer.
## Evaluation
*Paper coming soon.*
See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) for early results.
Note that this model variant is not optimized for NLP benchmarks.
## Technical Specifications
For more information about pretraining, see [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).
### Model Architecture and Objective
Falcon-7B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
The architecture is broadly adapted from the GPT-3 paper ([Brown et al., 2020](https://arxiv.org/abs/2005.14165)), with the following differences:
* **Positionnal embeddings:** rotary ([Su et al., 2021](https://arxiv.org/abs/2104.09864));
* **Attention:** multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)) and FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135));
* **Decoder-block:** parallel attention/MLP with a single layer norm.
| **Hyperparameter** | **Value** | **Comment** |
|--------------------|-----------|----------------------------------------|
| Layers | 32 | |
| `d_model` | 4544 | Increased to compensate for multiquery |
| `head_dim` | 64 | Reduced to optimise for FlashAttention |
| Vocabulary | 65024 | |
| Sequence length | 2048 | |
### Compute Infrastructure
#### Hardware
Falcon-7B-Instruct was trained on AWS SageMaker, on 32 A100 40GB GPUs in P4d instances.
#### Software
Falcon-7B-Instruct was trained a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.)
## Citation
*Paper coming soon* 😊. In the meanwhile, you can use the following information to cite:
```
@article{falcon40b,
title={{Falcon-40B}: an open large language model with state-of-the-art performance},
author={Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme},
year={2023}
}
```
To learn more about the pretraining dataset, see the 📓 [RefinedWeb paper](https://arxiv.org/abs/2306.01116).
```
@article{refinedweb,
title={The {R}efined{W}eb dataset for {F}alcon {LLM}: outperforming curated corpora with web data, and web data only},
author={Guilherme Penedo and Quentin Malartic and Daniel Hesslow and Ruxandra Cojocaru and Alessandro Cappelli and Hamza Alobeidli and Baptiste Pannier and Ebtesam Almazrouei and Julien Launay},
journal={arXiv preprint arXiv:2306.01116},
eprint={2306.01116},
eprinttype = {arXiv},
url={https://arxiv.org/abs/2306.01116},
year={2023}
}
```
## License
Falcon-7B-Instruct is made available under the Apache 2.0 license.
## Contact
falconllm@tii.ae

33
config.json Normal file
View File

@@ -0,0 +1,33 @@
{
"alibi": false,
"apply_residual_connection_post_layernorm": false,
"architectures": [
"FalconForCausalLM"
],
"attention_dropout": 0.0,
"auto_map": {
"AutoConfig": "configuration_falcon.FalconConfig",
"AutoModel": "modeling_falcon.FalconModel",
"AutoModelForSequenceClassification": "modeling_falcon.FalconForSequenceClassification",
"AutoModelForTokenClassification": "modeling_falcon.FalconForTokenClassification",
"AutoModelForQuestionAnswering": "modeling_falcon.FalconForQuestionAnswering",
"AutoModelForCausalLM": "modeling_falcon.FalconForCausalLM"
},
"bias": false,
"bos_token_id": 11,
"eos_token_id": 11,
"hidden_dropout": 0.0,
"hidden_size": 4544,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "falcon",
"multi_query": true,
"new_decoder_architecture": false,
"num_attention_heads": 71,
"num_hidden_layers": 32,
"parallel_attn": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.27.4",
"use_cache": true,
"vocab_size": 65024
}

1
configuration.json Normal file
View File

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

152
configuration_falcon.py Normal file
View File

@@ -0,0 +1,152 @@
# coding=utf-8
# Copyright 2023 the Falcon authors and HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Falcon configuration"""
from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging
logger = logging.get_logger(__name__)
FALCON_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"tiiuae/falcon-40b": "https://huggingface.co/tiiuae/falcon-40b/resolve/main/config.json",
"tiiuae/falcon-7b": "https://huggingface.co/tiiuae/falcon-7b/resolve/main/config.json",
}
class FalconConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`FalconModel`]. It is used to instantiate a Falcon
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the
[tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 65024):
Vocabulary size of the Falcon model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`FalconModel`]
hidden_size (`int`, *optional*, defaults to 4544):
Dimension of the hidden representations.
num_hidden_layers (`int`, *optional*, defaults to 32):
Number of hidden layers in the Transformer decoder.
num_attention_heads (`int`, *optional*, defaults to 71):
Number of attention heads for each attention layer in the Transformer encoder.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
use_cache (`bool`, *optional*, defaults to `True`):
Whether the model should return the last key/values attentions (not used by all models). Only relevant if
`config.is_decoder=True`.
layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
The epsilon used by the layer normalization layers.
hidden_dropout (`float`, *optional*, defaults to 0.0):
The dropout probability for MLP layers.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout probability for attention layers.
num_kv_heads (`int`, *optional*):
Number of key-value heads to use per attention layer. If unset, defaults to the same value as
`num_attention_heads`.
alibi (`bool`, *optional*, defaults to `False`):
Whether to use ALiBi positional biases during self-attention.
new_decoder_architecture (`bool`, *optional*, defaults to `False`):
Whether to use the new (Falcon-40B) decoder architecture. If `True`, the `multi_query` and `parallel_attn`
arguments are ignored, as the new decoder always uses parallel attention.
multi_query (`bool`, *optional*, defaults to `True`):
Whether to use multi-query attention in the decoder. Ignored when `new_decoder_architecture` is `True`.
parallel_attn (`bool`, *optional*, defaults to `True`):
Whether to compute attention in parallel with the feedforward layer. If False, they are consecutive
instead, as in the original Transformer architecture. Ignored when `new_decoder_architecture` is `True`.
bias (`bool`, *optional*, defaults to `False`):
Whether to use bias on Linear layers.
bos_token_id (`int`, *optional*, defaults to 11):
The id of the "beginning-of-sequence" token.
eos_token_id (`int`, *optional*, defaults to 11):
The id of the "end-of-sequence" token.
Example:
```python
>>> from transformers import FalconModel, FalconConfig
>>> # Initializing a small (2-layer) Falcon configuration
>>> configuration = FalconConfig(num_hidden_layers=2)
>>> # Initializing a model from the small configuration
>>> model = FalconModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "falcon"
keys_to_ignore_at_inference = ["past_key_values"]
def __init__(
self,
vocab_size=65024,
hidden_size=4544,
num_hidden_layers=32,
num_attention_heads=71,
layer_norm_epsilon=1e-5,
initializer_range=0.02,
use_cache=True,
hidden_dropout=0.0,
attention_dropout=0.0,
num_kv_heads=None,
alibi=False,
new_decoder_architecture=False,
multi_query=True,
parallel_attn=True,
bias=False,
bos_token_id=11,
eos_token_id=11,
**kwargs,
):
logger.warning_once(
"\nWARNING: You are currently loading Falcon using legacy code contained in the model repository. Falcon has now been fully ported into the Hugging Face transformers library. "
"For the most up-to-date and high-performance version of the Falcon model code, please update to the latest version of transformers and then load the model "
"without the trust_remote_code=True argument.\n"
)
self.vocab_size = vocab_size
# Backward compatibility with n_embed kwarg
n_embed = kwargs.pop("n_embed", None)
self.hidden_size = hidden_size if n_embed is None else n_embed
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.layer_norm_epsilon = layer_norm_epsilon
self.initializer_range = initializer_range
self.use_cache = use_cache
self.hidden_dropout = hidden_dropout
self.attention_dropout = attention_dropout
self.bos_token_id = bos_token_id
self.eos_token_id = eos_token_id
self.num_kv_heads = num_attention_heads if num_kv_heads is None else num_kv_heads
self.alibi = alibi
self.new_decoder_architecture = new_decoder_architecture
self.multi_query = multi_query # Ignored when new_decoder_architecture is True
self.parallel_attn = parallel_attn
self.bias = bias
super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
@property
def head_dim(self):
return self.hidden_size // self.num_attention_heads
@property
def rotary(self):
return not self.alibi

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b12b1d5cab8d237975a831477e3cf5997eef5e932636a0654ef1695b04eb9412
size 396524

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bc5be03ba082315bb432dbe51f3f80f9d6b1ee1ca05c40d072bad8b0f91c4f0f
size 27693883200

View File

@@ -0,0 +1,18 @@
{
"fileFormatVersion": "1.0.0",
"itemInfoEntries": {
"A51073A0-8381-4006-98E6-894FAB63FB3C": {
"author": "com.apple.CoreML",
"description": "CoreML Model Weights",
"name": "weights",
"path": "com.apple.CoreML/weights"
},
"F0BB8952-F8A2-4E8B-ABF3-9C72B4FC8816": {
"author": "com.apple.CoreML",
"description": "CoreML Model Specification",
"name": "model.mlmodel",
"path": "com.apple.CoreML/model.mlmodel"
}
},
"rootModelIdentifier": "F0BB8952-F8A2-4E8B-ABF3-9C72B4FC8816"
}

6
generation_config.json Normal file
View File

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 11,
"eos_token_id": 11,
"transformers_version": "4.33.0.dev0"
}

33
handler.py Normal file
View File

@@ -0,0 +1,33 @@
import torch
from typing import Any, Dict
from transformers import AutoModelForCausalLM, AutoTokenizer
class EndpointHandler:
def __init__(self, path=""):
# load model and tokenizer from path
self.tokenizer = AutoTokenizer.from_pretrained(path)
self.model = AutoModelForCausalLM.from_pretrained(
path, device_map="auto", torch_dtype=torch.float16, trust_remote_code=True
)
self.device = "cuda" if torch.cuda.is_available() else "cpu"
def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
# process input
inputs = data.pop("inputs", data)
parameters = data.pop("parameters", None)
# preprocess
inputs = self.tokenizer(inputs, return_tensors="pt").to(self.device)
# pass inputs with all kwargs in data
if parameters is not None:
outputs = self.model.generate(**inputs, **parameters)
else:
outputs = self.model.generate(**inputs)
# postprocess the prediction
prediction = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
return [{"generated_text": prediction}]

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b4515a8e32f81ab4fef65e10891fdb188bc45c4195620d447bc3263e132d9de5
size 9950994832

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d68a175bf005c2dd7fdd9956227b5691802c8d5a5b03897ed107e591017b33ea
size 4483408144

View File

@@ -0,0 +1,203 @@
{
"metadata": {
"total_size": 14434379520
},
"weight_map": {
"lm_head.weight": "model-00002-of-00002.safetensors",
"transformer.h.0.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.0.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.0.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.0.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.0.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.1.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.1.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.1.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.1.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.1.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.10.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.10.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.10.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.10.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.10.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.11.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.11.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.11.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.11.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.11.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.12.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.12.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.12.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.12.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.12.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.13.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.13.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.13.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.13.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.13.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.14.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.14.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.14.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.14.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.14.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.15.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.15.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.15.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.15.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.15.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.16.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.16.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.16.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.16.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.16.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.17.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.17.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.17.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.17.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.17.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.18.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.18.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.18.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.18.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.18.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.19.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.19.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.19.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.19.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.19.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.2.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.2.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.2.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.2.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.2.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.20.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.20.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.20.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.20.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.20.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.21.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.21.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.21.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.21.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.21.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.22.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.22.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.22.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.22.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.22.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.23.input_layernorm.bias": "model-00002-of-00002.safetensors",
"transformer.h.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
"transformer.h.23.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.23.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
"transformer.h.23.self_attention.dense.weight": "model-00002-of-00002.safetensors",
"transformer.h.23.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
"transformer.h.24.input_layernorm.bias": "model-00002-of-00002.safetensors",
"transformer.h.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
"transformer.h.24.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.24.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
"transformer.h.24.self_attention.dense.weight": "model-00002-of-00002.safetensors",
"transformer.h.24.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
"transformer.h.25.input_layernorm.bias": "model-00002-of-00002.safetensors",
"transformer.h.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
"transformer.h.25.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.25.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
"transformer.h.25.self_attention.dense.weight": "model-00002-of-00002.safetensors",
"transformer.h.25.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
"transformer.h.26.input_layernorm.bias": "model-00002-of-00002.safetensors",
"transformer.h.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
"transformer.h.26.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.26.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
"transformer.h.26.self_attention.dense.weight": "model-00002-of-00002.safetensors",
"transformer.h.26.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
"transformer.h.27.input_layernorm.bias": "model-00002-of-00002.safetensors",
"transformer.h.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
"transformer.h.27.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.27.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
"transformer.h.27.self_attention.dense.weight": "model-00002-of-00002.safetensors",
"transformer.h.27.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
"transformer.h.28.input_layernorm.bias": "model-00002-of-00002.safetensors",
"transformer.h.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
"transformer.h.28.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.28.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
"transformer.h.28.self_attention.dense.weight": "model-00002-of-00002.safetensors",
"transformer.h.28.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
"transformer.h.29.input_layernorm.bias": "model-00002-of-00002.safetensors",
"transformer.h.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
"transformer.h.29.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.29.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
"transformer.h.29.self_attention.dense.weight": "model-00002-of-00002.safetensors",
"transformer.h.29.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
"transformer.h.3.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.3.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.3.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.3.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.3.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.30.input_layernorm.bias": "model-00002-of-00002.safetensors",
"transformer.h.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
"transformer.h.30.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.30.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
"transformer.h.30.self_attention.dense.weight": "model-00002-of-00002.safetensors",
"transformer.h.30.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
"transformer.h.31.input_layernorm.bias": "model-00002-of-00002.safetensors",
"transformer.h.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
"transformer.h.31.mlp.dense_4h_to_h.weight": "model-00002-of-00002.safetensors",
"transformer.h.31.mlp.dense_h_to_4h.weight": "model-00002-of-00002.safetensors",
"transformer.h.31.self_attention.dense.weight": "model-00002-of-00002.safetensors",
"transformer.h.31.self_attention.query_key_value.weight": "model-00002-of-00002.safetensors",
"transformer.h.4.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.4.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.4.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.4.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.4.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.5.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.5.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.5.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.5.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.5.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.6.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.6.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.6.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.6.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.6.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.7.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.7.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.7.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.7.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.7.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.8.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.8.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.8.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.8.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.8.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.h.9.input_layernorm.bias": "model-00001-of-00002.safetensors",
"transformer.h.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
"transformer.h.9.mlp.dense_4h_to_h.weight": "model-00001-of-00002.safetensors",
"transformer.h.9.mlp.dense_h_to_4h.weight": "model-00001-of-00002.safetensors",
"transformer.h.9.self_attention.dense.weight": "model-00001-of-00002.safetensors",
"transformer.h.9.self_attention.query_key_value.weight": "model-00001-of-00002.safetensors",
"transformer.ln_f.bias": "model-00002-of-00002.safetensors",
"transformer.ln_f.weight": "model-00002-of-00002.safetensors",
"transformer.word_embeddings.weight": "model-00001-of-00002.safetensors"
}
}

1262
modeling_falcon.py Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:66acf4bebb68593952a51575cb02dbf258a606e236c6b82b6b60c3b1e9089e66
size 9951028193

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1de823c84b1c8b9889ac2a6c670ec6002a71776abd42cdf51bb3acd4c9938b29
size 4483421659

View File

@@ -0,0 +1,203 @@
{
"metadata": {
"total_size": 14434379520
},
"weight_map": {
"lm_head.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.0.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.0.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.1.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.10.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.11.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.12.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.13.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.14.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.15.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.16.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.17.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.18.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.19.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.2.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.20.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.21.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.22.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.22.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.22.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.22.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.22.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.22.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.23.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.23.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.24.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.25.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.26.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.27.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.28.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.29.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.3.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.3.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.30.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.30.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.31.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
"transformer.h.4.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.4.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.5.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.6.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.7.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.8.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
"transformer.h.9.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
"transformer.ln_f.bias": "pytorch_model-00002-of-00002.bin",
"transformer.ln_f.weight": "pytorch_model-00002-of-00002.bin",
"transformer.word_embeddings.weight": "pytorch_model-00001-of-00002.bin"
}
}

16
special_tokens_map.json Normal file
View File

@@ -0,0 +1,16 @@
{
"additional_special_tokens": [
">>TITLE<<",
">>ABSTRACT<<",
">>INTRODUCTION<<",
">>SUMMARY<<",
">>COMMENT<<",
">>ANSWER<<",
">>QUESTION<<",
">>DOMAIN<<",
">>PREFIX<<",
">>SUFFIX<<",
">>MIDDLE<<"
],
"eos_token": "<|endoftext|>"
}

129970
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

13
tokenizer_config.json Normal file
View File

@@ -0,0 +1,13 @@
{
"add_prefix_space": false,
"eos_token": "<|endoftext|>",
"model_input_names": [
"input_ids",
"attention_mask"
],
"model_max_length": 2048,
"name_or_path": "tiiuae/falcon_tokenizer",
"special_tokens_map_file": null,
"tokenizer_class": "PreTrainedTokenizerFast",
"chat_template": "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = '' %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 %}{{ system_message.strip() }}{% endif %}{% if message['role'] == 'user' %}{{ '\n\nUser: ' + message['content'].strip().replace('\r\n', '\n').replace('\n\n', '\n') }}{% elif message['role'] == 'assistant' %}{{ '\n\nAssistant: ' + message['content'].strip().replace('\r\n', '\n').replace('\n\n', '\n') }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '\n\nAssistant:' }}{% endif %}"
}