Initialize the project; model provided by the ModelHub XC community
Model: tiiuae/falcon-rw-1b Source: Original Platform
35  .gitattributes  vendored  Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
196  README.md  Normal file
@@ -0,0 +1,196 @@
---
datasets:
- tiiuae/falcon-refinedweb
language:
- en
inference: false
license: apache-2.0
---

# Falcon-RW-1B

**Falcon-RW-1B is a 1B-parameter causal decoder-only model built by [TII](https://www.tii.ae) and trained on 350B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb). It is made available under the Apache 2.0 license.**

See the 📓 [paper on arXiv](https://arxiv.org/abs/2306.01116) for more details.

RefinedWeb is a high-quality web dataset built by leveraging stringent filtering and large-scale deduplication. Falcon-RW-1B, trained on RefinedWeb only, matches or outperforms comparable models trained on curated data.

⚠️ Falcon is now available as a core model in the `transformers` library! To use the in-library version, please install the latest version of `transformers` with `pip install git+https://github.com/huggingface/transformers.git`, then simply remove the `trust_remote_code=True` argument from `from_pretrained()`.
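For instance, with a sufficiently recent `transformers` release the checkpoint can be loaded without any remote code (a minimal sketch; the commented-out line shows the older form that executes the `modeling_falcon.py` shipped in this repository):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Recent transformers releases ship Falcon natively, so no remote code is needed.
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-rw-1b")
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-rw-1b")

# Older releases instead execute the code files bundled with this repository:
# model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-rw-1b", trust_remote_code=True)
```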
⚠️ This model is intended for use as a **research artifact**, to study the influence of training on web data alone. **If you are interested in state-of-the-art models, we recommend using Falcon-[7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b), both trained on >1,000 billion tokens.**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-rw-1b"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
💥 **Falcon LLMs require PyTorch 2.0 for use with `transformers`!**


# Model Card for Falcon-RW-1B

## Model Details

### Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae);
- **Model type:** Causal decoder-only;
- **Language(s) (NLP):** English;
- **License:** Apache 2.0.

### Model Source

- **Paper:** [https://arxiv.org/abs/2306.01116](https://arxiv.org/abs/2306.01116).

## Uses

### Direct Use

Research on large language models, specifically the influence of adequately filtered and deduplicated web data on the properties of large language models (fairness, safety, limitations, capabilities, etc.).

### Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
Broadly speaking, we would recommend Falcon-[7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b) for any use not directly related to research on web data pipelines.

## Bias, Risks, and Limitations

Falcon-RW-1B is trained on English data only, and will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
### Recommendations

We recommend that users of Falcon-RW-1B consider finetuning it for their specific set of tasks of interest, and that guardrails and appropriate precautions be taken for any production use.

## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-rw-1b"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
## Training Details

### Training Data

Falcon-RW-1B was trained on 350B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a high-quality filtered and deduplicated web dataset. The data was tokenized with the GPT-2 tokenizer.
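As a small illustrative sketch, the tokenizer files committed here (`vocab.json`, `merges.txt`, and `tokenizer_config.json` with `"tokenizer_class": "GPT2Tokenizer"`) load as a standard GPT-2 byte-level BPE tokenizer:

```python
from transformers import AutoTokenizer

# Loads the GPT-2 BPE vocabulary and merges shipped with this checkpoint.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-rw-1b")

print(type(tokenizer).__name__)               # a GPT-2 tokenizer class, per tokenizer_config.json
print(tokenizer("Hello, Falcon!").input_ids)  # GPT-2 byte-level BPE token ids
```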
### Training Procedure

Falcon-RW-1B was trained on 32 A100 40GB GPUs, using only data parallelism with ZeRO.

#### Training Hyperparameters

Hyperparameters were adapted from the GPT-3 paper ([Brown et al., 2020](https://arxiv.org/abs/2005.14165)).

| **Hyperparameter** | **Value**  | **Comment**                               |
|--------------------|------------|-------------------------------------------|
| Precision          | `bfloat16` |                                           |
| Optimizer          | AdamW      |                                           |
| Learning rate      | 2e-4       | 500M tokens warm-up, cosine decay to 2e-5 |
| Weight decay       | 1e-1       |                                           |
| Batch size         | 512        | 4B tokens ramp-up                         |
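To make the learning-rate row concrete, the short sketch below reproduces the implied schedule: linear warm-up over 500M tokens, then cosine decay from 2e-4 down to 2e-5. The step counts are rough estimates that assume the 2048-token context and batch size 512 listed on this card and ignore the batch-size ramp-up; they are not figures published by the authors.

```python
import math

TOKENS_PER_STEP = 512 * 2048                      # batch size x sequence length (assumed constant)
WARMUP_STEPS = 500_000_000 // TOKENS_PER_STEP     # the 500M warm-up tokens expressed in steps
TOTAL_STEPS = 350_000_000_000 // TOKENS_PER_STEP  # the full 350B-token run expressed in steps
PEAK_LR, FINAL_LR = 2e-4, 2e-5

def lr_at(step: int) -> float:
    """Linear warm-up to the peak LR, then cosine decay down to the final LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(WARMUP_STEPS), lr_at(TOTAL_STEPS))  # 0.0, 2e-4, 2e-5
```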
#### Speeds, Sizes, Times

Training happened in early December 2022 and took about six days.


## Evaluation

See the 📓 [paper on arXiv](https://arxiv.org/abs/2306.01116) for in-depth evaluation.


## Technical Specifications

### Model Architecture and Objective

Falcon-RW-1B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).

The architecture is adapted from the GPT-3 paper ([Brown et al., 2020](https://arxiv.org/abs/2005.14165)), but uses ALiBi ([Press et al., 2021](https://arxiv.org/abs/2108.12409)) and FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)).

| **Hyperparameter** | **Value** | **Comment**                            |
|--------------------|-----------|----------------------------------------|
| Layers             | 24        |                                        |
| `d_model`          | 2048      |                                        |
| `head_dim`         | 64        | Reduced to optimise for FlashAttention |
| Vocabulary         | 50304     |                                        |
| Sequence length    | 2048      |                                        |
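These values match the `config.json` committed alongside this card (`hidden_size` 2048, 24 layers, 32 heads, ALiBi enabled, no multi-query or parallel attention, biased linear layers). As an illustration only, the same configuration can be rebuilt with the `FalconConfig` class from this repository's `configuration_falcon.py`; loading the checkpoint itself should still go through `from_pretrained` as shown above.

```python
# Assumes this repository's files are importable (configuration_falcon.py is part of this commit).
from configuration_falcon import FalconConfig

# Mirrors config.json: ALiBi positional biases, classic (non-multi-query, non-parallel) decoder blocks.
config = FalconConfig(
    vocab_size=50304,
    hidden_size=2048,            # d_model
    num_hidden_layers=24,        # layers
    num_attention_heads=32,      # so head_dim = 2048 / 32 = 64
    alibi=True,
    new_decoder_architecture=False,
    multi_query=False,
    parallel_attn=False,
    bias=True,
    bos_token_id=1,
    eos_token_id=2,
)

print(config.head_dim)  # 64
print(config.rotary)    # False: rotary embeddings are disabled because ALiBi is used
```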
### Compute Infrastructure

#### Hardware

Falcon-RW-1B was trained on AWS SageMaker, on 32 A100 40GB GPUs in P4d instances.

#### Software

Falcon-RW-1B was trained on a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.).


## Citation

```
@article{refinedweb,
  title={The {R}efined{W}eb dataset for {F}alcon {LLM}: outperforming curated corpora with web data, and web data only},
  author={Guilherme Penedo and Quentin Malartic and Daniel Hesslow and Ruxandra Cojocaru and Alessandro Cappelli and Hamza Alobeidli and Baptiste Pannier and Ebtesam Almazrouei and Julien Launay},
  journal={arXiv preprint arXiv:2306.01116},
  eprint={2306.01116},
  eprinttype={arXiv},
  url={https://arxiv.org/abs/2306.01116},
  year={2023}
}
```

## Contact
falconllm@tii.ae
33  config.json  Normal file
@@ -0,0 +1,33 @@
{
  "alibi": true,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "FalconForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_falcon.FalconConfig",
    "AutoModel": "modeling_falcon.FalconModel",
    "AutoModelForSequenceClassification": "modeling_falcon.FalconForSequenceClassification",
    "AutoModelForTokenClassification": "modeling_falcon.FalconForTokenClassification",
    "AutoModelForQuestionAnswering": "modeling_falcon.FalconForQuestionAnswering",
    "AutoModelForCausalLM": "modeling_falcon.FalconForCausalLM"
  },
  "bias": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_dropout": 0.0,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "falcon",
  "multi_query": false,
  "new_decoder_architecture": false,
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "parallel_attn": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.27.4",
  "use_cache": true,
  "vocab_size": 50304
}
1  configuration.json  Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
147  configuration_falcon.py  Normal file
@@ -0,0 +1,147 @@
# coding=utf-8
# Copyright 2023 the Falcon authors and HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Falcon configuration"""
from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging


logger = logging.get_logger(__name__)

FALCON_PRETRAINED_CONFIG_ARCHIVE_MAP = {
    "tiiuae/falcon-40b": "https://huggingface.co/tiiuae/falcon-40b/resolve/main/config.json",
    "tiiuae/falcon-7b": "https://huggingface.co/tiiuae/falcon-7b/resolve/main/config.json",
}


class FalconConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`FalconModel`]. It is used to instantiate a Falcon
    model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
    defaults will yield a similar configuration to that of the
    [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 65024):
            Vocabulary size of the Falcon model. Defines the number of different tokens that can be represented by the
            `inputs_ids` passed when calling [`FalconModel`]
        hidden_size (`int`, *optional*, defaults to 4544):
            Dimension of the hidden representations.
        num_hidden_layers (`int`, *optional*, defaults to 32):
            Number of hidden layers in the Transformer decoder.
        num_attention_heads (`int`, *optional*, defaults to 71):
            Number of attention heads for each attention layer in the Transformer encoder.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether the model should return the last key/values attentions (not used by all models). Only relevant if
            `config.is_decoder=True`.
        layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
            The epsilon used by the layer normalization layers.
        hidden_dropout (`float`, *optional*, defaults to 0.0):
            The dropout probability for MLP layers.
        attention_dropout (`float`, *optional*, defaults to 0.0):
            The dropout probability for attention layers.
        num_kv_heads (`int`, *optional*):
            Number of key-value heads to use per attention layer. If unset, defaults to the same value as
            `num_attention_heads`.
        alibi (`bool`, *optional*, defaults to `False`):
            Whether to use ALiBi positional biases during self-attention.
        new_decoder_architecture (`bool`, *optional*, defaults to `False`):
            Whether to use the new (Falcon-40B) decoder architecture. If `True`, the `multi_query` and `parallel_attn`
            arguments are ignored, as the new decoder always uses parallel attention.
        multi_query (`bool`, *optional*, defaults to `True`):
            Whether to use multi-query attention in the decoder. Ignored when `new_decoder_architecture` is `True`.
        parallel_attn (`bool`, *optional*, defaults to `True`):
            Whether to compute attention in parallel with the feedforward layer. If False, they are consecutive
            instead, as in the original Transformer architecture. Ignored when `new_decoder_architecture` is `True`.
        bias (`bool`, *optional*, defaults to `False`):
            Whether to use bias on Linear layers.
        bos_token_id (`int`, *optional*, defaults to 11):
            The id of the "beginning-of-sequence" token.
        eos_token_id (`int`, *optional*, defaults to 11):
            The id of the "end-of-sequence" token.

    Example:

    ```python
    >>> from transformers import FalconModel, FalconConfig

    >>> # Initializing a small (2-layer) Falcon configuration
    >>> configuration = FalconConfig(num_hidden_layers=2)

    >>> # Initializing a model from the small configuration
    >>> model = FalconModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""
    model_type = "falcon"
    keys_to_ignore_at_inference = ["past_key_values"]

    def __init__(
        self,
        vocab_size=65024,
        hidden_size=4544,
        num_hidden_layers=32,
        num_attention_heads=71,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        use_cache=True,
        hidden_dropout=0.0,
        attention_dropout=0.0,
        num_kv_heads=None,
        alibi=False,
        new_decoder_architecture=False,
        multi_query=True,
        parallel_attn=True,
        bias=False,
        bos_token_id=11,
        eos_token_id=11,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        # Backward compatibility with n_embed kwarg
        n_embed = kwargs.pop("n_embed", None)
        self.hidden_size = hidden_size if n_embed is None else n_embed
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.use_cache = use_cache
        self.hidden_dropout = hidden_dropout
        self.attention_dropout = attention_dropout

        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id
        self.num_kv_heads = num_attention_heads if num_kv_heads is None else num_kv_heads
        self.alibi = alibi
        self.new_decoder_architecture = new_decoder_architecture
        self.multi_query = multi_query  # Ignored when new_decoder_architecture is True
        self.parallel_attn = parallel_attn
        self.bias = bias

        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

    @property
    def head_dim(self):
        return self.hidden_size // self.num_attention_heads

    @property
    def rotary(self):
        return not self.alibi
6  generation_config.json  Normal file
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.31.0.dev0"
}
50001  merges.txt  Normal file
File diff suppressed because it is too large
1262  modeling_falcon.py  Normal file
File diff suppressed because it is too large
3  pytorch_model.bin  Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3a0d68f0309c8f7ec913f51edf8bf2eca849477c9f53db727f49ffa2f6019251
size 2623348889
5  special_tokens_map.json  Normal file
@@ -0,0 +1,5 @@
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
100305  tokenizer.json  Normal file
File diff suppressed because it is too large
9  tokenizer_config.json  Normal file
@@ -0,0 +1,9 @@
{
  "add_prefix_space": false,
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 1024,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
1  vocab.json  Normal file
File diff suppressed because one or more lines are too long