Initialize project; model provided by the ModelHub XC community

Model: AI-ModelScope/falcon-7b-instruct Source: Original Platform
2026-05-15 01:32:43 +08:00
commit 128907e5ff
16 changed files with 131666 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,34 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,109 @@
 ---
 tasks:
 - text-generation
 language:
 - en
 license: Apache License 2.0
 ---
 # ✨ Falcon-7B-Instruct
 **Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by [TII](https://www.tii.ae) on top of [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) and fine-tuned on a mixture of chat/instruct datasets. It is made available under the Apache 2.0 license.**
 *Paper coming soon 😊.*
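
As a rough cross-check of the "7B" figure, the sketch below tallies parameters from the hyperparameters in this repository's `config.json`. The per-layer weight shapes are an assumption inferred from the tensor names in `pytorch_model.bin.index.json` (bias-free linear layers, one LayerNorm per block, and a fused query/key/value projection with a single shared key/value head); this README does not state them.

```python
# Hyperparameters copied from config.json in this repository.
hidden_size = 4544
n_head = 71
n_layer = 32
vocab_size = 65024

head_dim = hidden_size // n_head  # 64

# Assumed weight shapes (not stated in the README):
qkv = hidden_size * (hidden_size + 2 * head_dim)  # all query heads plus one shared K and one shared V head
attn_out = hidden_size * hidden_size              # self_attention.dense
mlp = 2 * hidden_size * 4 * hidden_size           # dense_h_to_4h and dense_4h_to_h
layer_norm = 2 * hidden_size                      # weight + bias

per_layer = qkv + attn_out + mlp + layer_norm
embeddings = vocab_size * hidden_size
lm_head = vocab_size * hidden_size  # listed as its own tensor in the checkpoint index
final_norm = 2 * hidden_size

total_params = n_layer * per_layer + embeddings + lm_head + final_norm
total_bytes_bf16 = total_params * 2  # bfloat16 stores 2 bytes per value

print(f"{total_params:,} parameters")
print(f"{total_bytes_bf16:,} bytes in bfloat16")
```

Under these assumptions the count comes to 7,217,189,760 parameters, and the bfloat16 size equals the `total_size` of 14,434,379,520 bytes recorded in `pytorch_model.bin.index.json`.
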
 ## Why use Falcon-7B-Instruct?
 * **You are looking for a ready-to-use chat/instruct model based on [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).**
 * **Falcon-7B is a strong base model, outperforming comparable open-source models** (e.g., [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1) etc.), thanks to being trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 * **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).
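
To give a feel for why multiquery attention helps at inference time, here is a minimal back-of-the-envelope sketch of key/value cache size per generated token. The layer count, head count, and hidden size are taken from this repository's `config.json`; the helper `kv_cache_bytes_per_token` is a hypothetical name introduced for illustration, and the comparison against a 71-head multi-head baseline is an assumption, not a measurement.

```python
def kv_cache_bytes_per_token(n_layer: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of key + value cache added per token (bfloat16 by default)."""
    keys_and_values = 2  # one key vector and one value vector per head
    return keys_and_values * n_layer * n_kv_heads * head_dim * bytes_per_value


# Values from config.json: n_layer=32, n_head=71, hidden_size=4544.
n_layer, n_head, hidden_size = 32, 71, 4544
head_dim = hidden_size // n_head  # 64

multi_head = kv_cache_bytes_per_token(n_layer, n_head, head_dim)  # one K/V pair per query head
multi_query = kv_cache_bytes_per_token(n_layer, 1, head_dim)      # a single shared K/V head

print(f"multi-head : {multi_head:,} bytes per token")
print(f"multi-query: {multi_query:,} bytes per token")
print(f"reduction  : {multi_head // multi_query}x")
```

With a single shared key/value head, the cache shrinks by a factor equal to the number of query heads, which is what makes longer contexts and larger batches cheaper to serve.
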
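 💬 **This is an instruct model, which may not be ideal for further fine-tuning.** If you are interested in building your own instruct/chat model, we recommend starting from [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).
 🔥 **Looking for an even more powerful model?** [Falcon-40B-Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) is Falcon-7B-Instruct's big brother!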
 ```python
 from modelscope.utils.constant import Tasks
 from modelscope.pipelines import pipeline
 pipe = pipeline(task=Tasks.text_generation, model='AI-ModelScope/falcon-7b-instruct', model_revision='v1.0.1', device='cuda')
 query="Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:"
 result = pipe(query)
 print(result)
 ```
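
The printed result echoes the prompt along with the completion. The sketch below pulls out just the reply. It assumes the pipeline returns a dict of the form `{'text': [{'generated_text': ...}]}`, which is what `ms_wrapper.py` in this commit appears to produce; `extract_replies` and the sample data are hypothetical and only illustrate the shape.

```python
from typing import Any, Dict, List


def extract_replies(result: Dict[str, Any], prompt: str) -> List[str]:
    """Return each generated candidate with the echoed prompt removed."""
    replies = []
    for candidate in result.get("text", []):
        text = candidate["generated_text"]
        # The generated text is assumed to start with the prompt; drop it if so.
        if text.startswith(prompt):
            text = text[len(prompt):]
        replies.append(text.strip())
    return replies


# Hypothetical stand-in for a real pipeline result, so this runs without the model.
sample_prompt = "Daniel: Hello, Girafatron!\nGirafatron:"
sample_result = {"text": [{"generated_text": sample_prompt + " Hello, Daniel!"}]}

print(extract_replies(sample_result, sample_prompt))
```

With the real pipeline, the call would be `extract_replies(result, query)`, assuming the output shape matches.
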
 💥 **Falcon LLMs require PyTorch 2.0 for use with `transformers`!**
 # Model Card for Falcon-7B-Instruct
 ## Model Details
 ### Model Description
 - **Developed by:** [https://www.tii.ae](https://www.tii.ae);
 - **Model type:** Causal decoder-only;
 - **Language(s) (NLP):** English and French;
 - **License:** Apache 2.0;
 - **Finetuned from model:** [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).
 ### Model Source
 - **Paper:** *coming soon*.
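 ## Uses
 ### Direct Use
 Falcon-7B-Instruct has been fine-tuned on a mixture of instruct and chat datasets.
 ### Out-of-Scope Use
 Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
 ## Bias, Risks, and Limitations
 Falcon-7B-Instruct is mostly trained on English data and will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
 ### Recommendations
 We recommend that users of Falcon-7B-Instruct develop guardrails and take appropriate precautions for any production use.
 ## Training Details
 ### Training Data
 Falcon-7B-Instruct was fine-tuned on a 250M-token mixture of instruct/chat datasets.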
 | **Data source** | **Fraction** | **Tokens** | **Description** |
 |-----------------|--------------|------------|-----------------|
 | [Baize](https://github.com/project-baize/baize-chatbot) | 65% | 164M | chat |
 | [GPT4All](https://github.com/nomic-ai/gpt4all) | 25% | 62M | instruct |
 | [GPTeacher](https://github.com/teknium1/GPTeacher) | 5% | 11M | instruct |
 | [RefinedWeb-English](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) | 5% | 13M | massive web crawl |
 The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b) tokenizer.
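
As a quick sanity check on the mixture table, the sketch below recomputes each source's share from the listed token counts. The counts are copied from the table; the percentages printed there appear to be rounded, so small differences are expected.

```python
# Token counts in millions, copied from the table above.
mixture_tokens_m = {
    "Baize": 164,
    "GPT4All": 62,
    "GPTeacher": 11,
    "RefinedWeb-English": 13,
}

total_m = sum(mixture_tokens_m.values())

# Share of the fine-tuning mixture per source, in percent.
shares = {name: 100 * tokens / total_m for name, tokens in mixture_tokens_m.items()}

print(f"total: {total_m}M tokens")
for name, share in shares.items():
    print(f"{name:<20} {share:.1f}%")
```

The counts sum to the 250M tokens stated above, and the recomputed shares stay within about a percentage point of the fractions in the table.
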
 ## Evaluation
 *Paper coming soon.*
 See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) for early results.
 Note that this model variant is not optimized for NLP benchmarks.
 ## Technical Specifications
 For more information about pretraining, see [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b).
 ## License
 Falcon-7B-Instruct is made available under the Apache 2.0 license.
 ## Contact
 falconllm@tii.ae
--- a/config.json
+++ b/config.json
@@ -0,0 +1,28 @@
 {
  "alibi": false,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "RWForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_RW.RWConfig",
    "AutoModelForCausalLM": "modelling_RW.RWForCausalLM"
  },
  "bias": false,
  "bos_token_id": 11,
  "eos_token_id": 11,
  "hidden_dropout": 0.0,
  "hidden_size": 4544,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "RefinedWebModel",
  "multi_query": true,
  "n_head": 71,
  "n_layer": 32,
  "parallel_attn": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.27.4",
  "use_cache": true,
  "vocab_size": 65024
 }
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1,11 @@
 {
    "framework": "pytorch",
    "task": "text-generation",
    "model": {
        "type": "falcon-7b-instruct"
    },
    "pipeline": {
        "type": "falcon-7b-instruct-text-generation-pipe"
    },
    "allow_remote": true
 }
--- a/configuration_RW.py
+++ b/configuration_RW.py
@@ -0,0 +1,79 @@
 # coding=utf-8
 # Copyright 2022 the Big Science Workshop and HuggingFace Inc. team.  All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """ RefinedWebModel (RW) configuration"""
 from transformers.configuration_utils import PretrainedConfig
 from transformers.utils import logging
 logger = logging.get_logger(__name__)
 class RWConfig(PretrainedConfig):
    model_type = "RefinedWebModel"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {
        "num_hidden_layers": "n_layer",
        "num_attention_heads": "n_head",
    }
    def __init__(
        self,
        vocab_size=250880,
        hidden_size=64,
        n_layer=2,
        n_head=8,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        use_cache=True,
        bos_token_id=1,
        eos_token_id=2,
        apply_residual_connection_post_layernorm=False,
        hidden_dropout=0.0,
        attention_dropout=0.0,
        multi_query=False,
        alibi=False,
        bias=False,
        parallel_attn=False,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        # Backward compatibility with n_embed kwarg
        n_embed = kwargs.pop("n_embed", None)
        self.hidden_size = hidden_size if n_embed is None else n_embed
        self.n_layer = n_layer
        self.n_head = n_head
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.use_cache = use_cache
        self.apply_residual_connection_post_layernorm = apply_residual_connection_post_layernorm
        self.hidden_dropout = hidden_dropout
        self.attention_dropout = attention_dropout
        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id
        self.multi_query = multi_query
        self.alibi = alibi
        self.bias = bias
        self.parallel_attn = parallel_attn
        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
    @property
    def head_dim(self):
        return self.hidden_size // self.n_head
    @property
    def rotary(self):
        return not self.alibi
--- a/coreml/text-generation/falcon-7b-64-float32.mlpackage/Data/com.apple.CoreML/model.mlmodel
+++ b/coreml/text-generation/falcon-7b-64-float32.mlpackage/Data/com.apple.CoreML/model.mlmodel
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:b12b1d5cab8d237975a831477e3cf5997eef5e932636a0654ef1695b04eb9412
 size 396524
--- a/coreml/text-generation/falcon-7b-64-float32.mlpackage/Manifest.json
+++ b/coreml/text-generation/falcon-7b-64-float32.mlpackage/Manifest.json
@@ -0,0 +1,18 @@
 {
    "fileFormatVersion": "1.0.0",
    "itemInfoEntries": {
        "A51073A0-8381-4006-98E6-894FAB63FB3C": {
            "author": "com.apple.CoreML",
            "description": "CoreML Model Weights",
            "name": "weights",
            "path": "com.apple.CoreML/weights"
        },
        "F0BB8952-F8A2-4E8B-ABF3-9C72B4FC8816": {
            "author": "com.apple.CoreML",
            "description": "CoreML Model Specification",
            "name": "model.mlmodel",
            "path": "com.apple.CoreML/model.mlmodel"
        }
    },
    "rootModelIdentifier": "F0BB8952-F8A2-4E8B-ABF3-9C72B4FC8816"
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
 {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.27.4"
 }
--- a/modelling_RW.py
+++ b/modelling_RW.py
--- a/ms_wrapper.py
+++ b/ms_wrapper.py
@@ -0,0 +1,75 @@
 import os
 from typing import Any, Dict, Union
 import torch
 import transformers
 from modelscope.models.base import Model, TorchModel
 from modelscope.models.builder import MODELS
 from modelscope.pipelines.base import Pipeline
 from modelscope.pipelines.builder import PIPELINES
 from modelscope.utils.constant import Tasks
 from modelscope.utils.logger import get_logger
 from transformers import AutoModelForCausalLM, AutoTokenizer
 if 'CUDA_VISIBLE_DEVICES' not in os.environ:
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
@PIPELINES.register_module(
    Tasks.text_generation,
    module_name='falcon-7b-instruct-text-generation-pipe')
 class falcon7binstructTextGenerationPipeline(Pipeline):
    def __init__(self, model: Union[Model, str], *args, **kwargs):
        model = falcon7binstructTextGeneration(model) if isinstance(
            model, str) else model
        super().__init__(model=model, **kwargs)
    def preprocess(self, inputs, **preprocess_params) -> Dict[str, Any]:
        return inputs
    # define the forward pass
    def forward(self, inputs: Dict, **forward_params) -> Dict[str, Any]:
        return self.model(inputs)
    # format the outputs from pipeline
    def postprocess(self, input, **kwargs) -> Dict[str, Any]:
        return input
@MODELS.register_module(Tasks.text_generation,
                        module_name='falcon-7b-instruct')
 class falcon7binstructTextGeneration(TorchModel):
    def __init__(self, model_dir=None, *args, **kwargs):
        super().__init__(model_dir, *args, **kwargs)
        self.logger = get_logger()
        # loading tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.pipeline = transformers.pipeline(
            "text-generation",
            model=model_dir,
            tokenizer=self.tokenizer,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
            device_map="auto",
        )
    def forward(self, input: Dict) -> Dict[str, Any]:
        output = {}
        res = self.infer(input)
        output['text'] = res
        return output
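    def quantize(self, bits: int):
        # NOTE: `self.model` is never assigned in `__init__` above (only
        # `self.pipeline` is), so this method looks like unused template code.
        self.model = self.model.quantize(bits)
        return self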
    def infer(self, input):
        sequences = self.pipeline(
                input,
                max_length=200,
                do_sample=True,
                top_k=10,
                num_return_sequences=1,
                eos_token_id=self.tokenizer.eos_token_id,
            )
        return sequences
--- a/pytorch_model-00001-of-00002.bin
+++ b/pytorch_model-00001-of-00002.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:66acf4bebb68593952a51575cb02dbf258a606e236c6b82b6b60c3b1e9089e66
 size 9951028193
--- a/pytorch_model-00002-of-00002.bin
+++ b/pytorch_model-00002-of-00002.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:1de823c84b1c8b9889ac2a6c670ec6002a71776abd42cdf51bb3acd4c9938b29
 size 4483421659
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@@ -0,0 +1,203 @@
 {
  "metadata": {
    "total_size": 14434379520
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.0.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.0.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.0.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.0.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.0.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.0.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.1.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.1.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.1.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.1.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.1.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.1.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.10.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.10.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.10.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.10.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.10.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.10.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.11.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.11.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.11.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.11.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.11.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.11.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.12.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.12.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.12.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.12.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.12.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.12.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.13.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.13.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.13.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.13.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.13.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.13.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.14.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.14.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.14.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.14.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.14.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.14.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.15.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.15.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.15.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.15.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.15.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.15.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.16.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.16.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.16.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.16.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.16.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.16.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.17.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.17.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.17.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.17.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.17.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.17.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.18.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.18.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.18.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.18.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.18.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.18.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.19.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.19.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.19.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.19.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.19.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.19.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.2.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.2.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.2.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.2.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.2.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.2.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.20.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.20.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.20.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.20.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.20.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.20.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.21.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.21.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.21.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.21.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.21.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.21.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.22.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.22.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.22.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.22.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.22.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.22.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.23.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.h.23.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.23.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.23.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.23.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.23.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.24.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.h.24.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.24.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.24.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.24.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.24.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.25.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.h.25.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.25.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.25.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.25.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.25.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.26.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.h.26.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.26.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.26.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.26.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.26.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.27.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.h.27.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.27.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.27.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.27.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.27.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.28.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.h.28.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.28.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.28.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.28.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.28.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.29.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.h.29.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.29.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.29.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.29.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.29.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.3.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.3.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.3.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.3.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.3.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.3.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.30.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.h.30.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.30.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.30.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.30.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.30.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.31.input_layernorm.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.h.31.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.31.mlp.dense_4h_to_h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.31.mlp.dense_h_to_4h.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.31.self_attention.dense.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.31.self_attention.query_key_value.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.h.4.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.4.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.4.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.4.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.4.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.4.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.5.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.5.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.5.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.5.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.5.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.5.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.6.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.6.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.6.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.6.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.6.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.6.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.7.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.7.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.7.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.7.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.7.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.7.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.8.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.8.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.8.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.8.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.8.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.8.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.9.input_layernorm.bias": "pytorch_model-00001-of-00002.bin",
    "transformer.h.9.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.9.mlp.dense_4h_to_h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.9.mlp.dense_h_to_4h.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.9.self_attention.dense.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.h.9.self_attention.query_key_value.weight": "pytorch_model-00001-of-00002.bin",
    "transformer.ln_f.bias": "pytorch_model-00002-of-00002.bin",
    "transformer.ln_f.weight": "pytorch_model-00002-of-00002.bin",
    "transformer.word_embeddings.weight": "pytorch_model-00001-of-00002.bin"
  }
 }
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,16 @@
 {
  "additional_special_tokens": [
    ">>TITLE<<",
    ">>ABSTRACT<<",
    ">>INTRODUCTION<<",
    ">>SUMMARY<<",
    ">>COMMENT<<",
    ">>ANSWER<<",
    ">>QUESTION<<",
    ">>DOMAIN<<",
    ">>PREFIX<<",
    ">>SUFFIX<<",
    ">>MIDDLE<<"
  ],
  "eos_token": "<|endoftext|>"
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,8 @@
 {
  "add_prefix_space": false,
  "eos_token": "<|endoftext|>",
  "model_max_length": 2048,
  "name_or_path": "tiiuae/falcon_tokenizer",
  "special_tokens_map_file": null,
  "tokenizer_class": "PreTrainedTokenizerFast"
 }