Initialize project; model provided by the ModelHub XC community
Model: yokebtc/advertise Source: Original Platform
34 .gitattributes vendored Normal file
@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
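Each rule above attaches the Git LFS filter to a file pattern; lines of exactly this shape are what `git lfs track` appends. A minimal sketch of adding one more pattern — the `*.gguf` pattern here is purely illustrative, not part of this repository:

```bash
git lfs track "*.gguf"   # appends: *.gguf filter=lfs diff=lfs merge=lfs -text
git add .gitattributes
```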
96 README.md Normal file
@@ -0,0 +1,96 @@
---
frameworks:
- Tensorflow
license: Apache License 2.0
tasks:
- text-generation
---

###### This model currently uses the default introduction template and is in the "pre-release" stage; the page is visible only to its owner.

###### Please complete the model card promptly, following the [model contribution guide](https://www.modelscope.cn/docs/%E5%A6%82%E4%BD%95%E6%92%B0%E5%86%99%E5%A5%BD%E7%94%A8%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%8D%A1%E7%89%87). The ModelScope platform will display the model once the card is complete. Thank you for your understanding.
#### Clone with HTTP

```bash
git clone https://www.modelscope.cn/yokebtc/advertise.git
```
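The large files in this repository (the `*.bin` shards, `tokenizer.model`, `train.json`) are tracked with Git LFS per the `.gitattributes` rules above, so the LFS hooks must be installed for the clone to fetch real weights rather than small pointer stubs. A minimal sketch of the full sequence:

```bash
# Install the LFS hooks once per machine, then clone; without this the
# *.bin files arrive as LFS pointer files instead of model weights.
git lfs install
git clone https://www.modelscope.cn/yokebtc/advertise.git
```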

![ocs](demo.png)

[demo.png](http://minio.datapandora.cn/chatgpt/screencapture-127-0-0-1-6006-2024-01-31-15_37_23y.png)

### 4.2 Fine-tuning steps

#### 4.2.1 Preparation

> xtuner downloads the MS-Agent dataset from the ModelScope platform, so there is no need to download the dataset files manually ahead of time.

```bash
# Preparation
mkdir ~/ft-msagent && cd ~/ft-msagent
cp -r ~/ft-oasst1/internlm-chat-7b .

# List the matching config files
xtuner list-cfg | grep msagent

# Copy the config file into the current directory
xtuner copy-cfg internlm_7b_qlora_msagent_react_e3_gpu8 .

# Point the model path in the config file at the local copy
vim ./internlm_7b_qlora_msagent_react_e3_gpu8_copy.py
```

```diff
- pretrained_model_name_or_path = 'internlm/internlm-chat-7b'
+ pretrained_model_name_or_path = './internlm-chat-7b'
```

#### 4.2.2 Start fine-tuning

```bash
xtuner train ./internlm_7b_qlora_msagent_react_e3_gpu8_copy.py --deepspeed deepspeed_zero2
```

### 4.3 Direct use

> Training on msagent is very time-consuming, so if you want to get through this tutorial quickly, you can pull the Adapter we have already fine-tuned directly from ModelScope, as demonstrated below.

#### 4.3.1 Download the Adapter

```bash
cd ~/ft-msagent
apt install git git-lfs
git lfs install
git lfs clone https://www.modelscope.cn/xtuner/internlm-7b-qlora-msagent-react.git
```

OK, the directory should now look like this:

- internlm_7b_qlora_msagent_react_e3_gpu8_copy.py
- internlm-7b-qlora-msagent-react
- internlm-chat-7b
- work_dir (optional)

With this Adapter trained on msagent, the model now has agent capability! You can add --lagent to invoke the agent features provided by lagent.

#### 4.3.2 Add the serper environment variable

> **Before starting chat, one more environment variable is needed for serper:**
>
> Register a free account at serper.dev and generate your own API key. lagent uses it to fetch Google search results; in effect, serper.dev visits Google on your behalf instead of your local machine accessing Google directly.

![Untitled](xtuner InternLM-Chat 个人小助手认知微调实践.assets/Untitled 11.png)

Add the serper API key to the environment:

```bash
export SERPER_API_KEY=abcdefg
```
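To sanity-check the key before launching, you can call the serper HTTP API directly. A minimal sketch; the endpoint and header follow serper.dev's documented search API, and the query string is just an example:

```bash
# Expect a JSON payload of search results; an auth error means the key is wrong.
curl -s https://google.serper.dev/search \
  -H "X-API-KEY: $SERPER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"q": "InternLM"}'
```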

#### 4.3.3 xtuner + agent, launch!

```bash
xtuner chat ./internlm-chat-7b --adapter internlm-7b-qlora-msagent-react --lagent
```

#### 4.3.4 Error handling

After adding the --lagent flag, xtuner chat fails with ```TypeError: transformers.models.auto.auto_factory._BaseAutoModelClass.from_pretrained() got multiple values for keyword argument "trust_remote_code"```

Comment out the offending code in the installed package:

```bash
vim /root/xtuner019/xtuner/xtuner/tools/chat.py
```
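The duplicate keyword usually means chat.py passes trust_remote_code=True explicitly while the same key is also present in the kwargs it forwards. The sketch below is illustrative only — the variable names are hypothetical and the exact line varies by xtuner version — so locate the `from_pretrained` call in chat.py and remove or comment out its explicit trust_remote_code argument:

```diff
- model = AutoModelForCausalLM.from_pretrained(model_path,
-                                              trust_remote_code=True,  # hypothetical offending argument
-                                              **model_kwargs)
+ model = AutoModelForCausalLM.from_pretrained(model_path, **model_kwargs)
```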
5 added_tokens.json Normal file
@@ -0,0 +1,5 @@
{
  "</s>": 2,
  "<s>": 1,
  "<unk>": 0
}
29 config.json Normal file
@@ -0,0 +1,29 @@
{
  "_name_or_path": "./internlm-chat-7b",
  "architectures": [
    "InternLMForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_internlm.InternLMConfig",
    "AutoModel": "modeling_internlm.InternLMForCausalLM",
    "AutoModelForCausalLM": "modeling_internlm.InternLMForCausalLM"
  },
  "bias": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "internlm",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "pad_token_id": 2,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.34.0",
  "use_cache": true,
  "vocab_size": 103168
}
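Because auto_map routes the Auto classes to the configuration_internlm.py and modeling_internlm.py files shipped in this repository, loading requires trust_remote_code=True. A minimal loading sketch, assuming the repository has been cloned locally to ./advertise:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./advertise"  # assumed local clone of this repository

# trust_remote_code=True lets transformers execute the bundled
# configuration_internlm.py / modeling_internlm.py instead of a built-in class.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.float16, trust_remote_code=True
)
```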
1 configuration.json Normal file
@@ -0,0 +1 @@
{"framework":"Tensorflow","task":"text-generation"}
120 configuration_internlm.py Normal file
@@ -0,0 +1,120 @@
# coding=utf-8
# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
#
# This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
# and OPT implementations in this library. It has been modified from its
# original forms to accommodate minor architectural differences compared
# to GPT-NeoX and OPT used by the Meta AI team that trained the model.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" InternLM model configuration"""

from transformers.utils import logging
from transformers.configuration_utils import PretrainedConfig


logger = logging.get_logger(__name__)

INTERNLM_PRETRAINED_CONFIG_ARCHIVE_MAP = {}


class InternLMConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`InternLMModel`]. It is used to instantiate an InternLM
    model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
    defaults will yield a similar configuration to that of the InternLM-7B.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.


    Args:
        vocab_size (`int`, *optional*, defaults to 103168):
            Vocabulary size of the InternLM model. Defines the number of different tokens that can be represented by the
            `inputs_ids` passed when calling [`InternLMModel`]
        hidden_size (`int`, *optional*, defaults to 4096):
            Dimension of the hidden representations.
        intermediate_size (`int`, *optional*, defaults to 11008):
            Dimension of the MLP representations.
        num_hidden_layers (`int`, *optional*, defaults to 32):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads for each attention layer in the Transformer encoder.
        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
            The non-linear activation function (function or string) in the decoder.
        max_position_embeddings (`int`, *optional*, defaults to 2048):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        rms_norm_eps (`float`, *optional*, defaults to 1e-6):
            The epsilon used by the rms normalization layers.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models). Only
            relevant if `config.is_decoder=True`.
        tie_word_embeddings(`bool`, *optional*, defaults to `False`):
            Whether to tie weight embeddings
    Example:

    ```python
    >>> from transformers import InternLMModel, InternLMConfig

    >>> # Initializing a InternLM internlm-7b style configuration
    >>> configuration = InternLMConfig()

    >>> # Initializing a model from the internlm-7b style configuration
    >>> model = InternLMModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""
    model_type = "internlm"
    _auto_class = "AutoConfig"

    def __init__(
        self,
        vocab_size=103168,
        hidden_size=4096,
        intermediate_size=11008,
        num_hidden_layers=32,
        num_attention_heads=32,
        hidden_act="silu",
        max_position_embeddings=2048,
        initializer_range=0.02,
        rms_norm_eps=1e-6,
        use_cache=True,
        pad_token_id=0,
        bos_token_id=1,
        eos_token_id=2,
        tie_word_embeddings=False,
        bias=True,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.hidden_act = hidden_act
        self.initializer_range = initializer_range
        self.rms_norm_eps = rms_norm_eps
        self.use_cache = use_cache
        self.bias = bias
        super().__init__(
            pad_token_id=pad_token_id,
            bos_token_id=bos_token_id,
            eos_token_id=eos_token_id,
            tie_word_embeddings=tie_word_embeddings,
            **kwargs,
        )
7 generation_config.json Normal file
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2,
  "transformers_version": "4.34.0"
}
1015 modeling_internlm.py Normal file
File diff suppressed because it is too large
3 pytorch_model-00001-of-00008.bin Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6540a93174b91b91ae92a5297e47bdaf3bc1de127598c4d2788e122b867be2e
size 1969371359
3 pytorch_model-00002-of-00008.bin Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:117da6c6681dc70e592db9b01d72f2bf52144bd60a93aac3fafe4b3a1fa00ab6
size 1933845097
3 pytorch_model-00003-of-00008.bin Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ed8b10e7181070a15deebb461c23282045507c847dd4f1c976bde4fecc2bc0c8
size 1933845161
3 pytorch_model-00004-of-00008.bin Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c64e74c59136727e036c10889546c9cf1e89f94b8efa1d64d64b780ebe771a8b
size 1990459141
3 pytorch_model-00005-of-00008.bin Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51bf2985d85baed35eff9bff4e3dbf80fe923cd681599474a6602138bb4298d0
size 1990459735
3 pytorch_model-00006-of-00008.bin Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e33e801e2c9ff34328dc24b7b83f4ebcd4dc387ca2b1b86f41abd7b2d49fb058
size 1990459735
3 pytorch_model-00007-of-00008.bin Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0be0b23180781a22903bfa2d2cccb2e5a140e248891454afa2c4d355c0c220b2
size 1990468265
3 pytorch_model-00008-of-00008.bin Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8553c3ce27e157933280b9f39e90d5d51be98d95f2d0c7f6999e4312a41325e
size 845153194
3 pytorch_model.bin.index.json Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:08d58e27929891185728a9257559c2e4317dd08f28490397ee0b03e6a023c941
size 37116
6 special_tokens_map.json Normal file
@@ -0,0 +1,6 @@
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "</s>",
  "unk_token": "<unk>"
}
242 tokenization_internlm.py Normal file
@@ -0,0 +1,242 @@
# coding=utf-8
# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
#
# This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
# and OPT implementations in this library. It has been modified from its
# original forms to accommodate minor architectural differences compared
# to GPT-NeoX and OPT used by the Meta AI team that trained the model.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Tokenization classes for InternLM."""
import os
from shutil import copyfile
from typing import Any, Dict, List, Optional, Tuple

import sentencepiece as spm

from transformers.tokenization_utils import PreTrainedTokenizer
from transformers.utils import logging


logger = logging.get_logger(__name__)

VOCAB_FILES_NAMES = {"vocab_file": "./tokenizer.model"}

PRETRAINED_VOCAB_FILES_MAP = {}


class InternLMTokenizer(PreTrainedTokenizer):
    """
    Construct an InternLM tokenizer. Based on byte-level Byte-Pair-Encoding.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
    """

    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    model_input_names = ["input_ids", "attention_mask"]
    _auto_class = "AutoTokenizer"

    def __init__(
        self,
        vocab_file,
        unk_token="<unk>",
        bos_token="<s>",
        eos_token="</s>",
        pad_token="</s>",
        sp_model_kwargs: Optional[Dict[str, Any]] = None,
        add_bos_token=True,
        add_eos_token=False,
        decode_with_prefix_space=False,
        clean_up_tokenization_spaces=False,
        **kwargs,
    ):
        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
        self.vocab_file = vocab_file
        self.add_bos_token = add_bos_token
        self.add_eos_token = add_eos_token
        self.decode_with_prefix_space = decode_with_prefix_space
        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(vocab_file)
        self._no_prefix_space_tokens = None
        super().__init__(
            bos_token=bos_token,
            eos_token=eos_token,
            unk_token=unk_token,
            pad_token=pad_token,
            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
            **kwargs,
        )

    """ Initialization"""

    @property
    def no_prefix_space_tokens(self):
        if self._no_prefix_space_tokens is None:
            vocab = self.convert_ids_to_tokens(list(range(self.vocab_size)))
            self._no_prefix_space_tokens = {i for i, tok in enumerate(vocab) if not tok.startswith("▁")}
        return self._no_prefix_space_tokens

    @property
    def vocab_size(self):
        """Returns vocab size"""
        return self.sp_model.get_piece_size()

    @property
    def bos_token_id(self) -> Optional[int]:
        return self.sp_model.bos_id()

    @property
    def eos_token_id(self) -> Optional[int]:
        return self.sp_model.eos_id()

    def get_vocab(self):
        """Returns vocab as a dict"""
        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)
        return vocab

    def _tokenize(self, text):
        """Returns a tokenized string."""
        return self.sp_model.encode(text, out_type=str)

    def _convert_token_to_id(self, token):
        """Converts a token (str) in an id using the vocab."""
        return self.sp_model.piece_to_id(token)

    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        token = self.sp_model.IdToPiece(index)
        return token

    def _maybe_add_prefix_space(self, tokens, decoded):
        if tokens and tokens[0] not in self.no_prefix_space_tokens:
            return " " + decoded
        else:
            return decoded

    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (string) in a single string."""
        current_sub_tokens = []
        out_string = ""
        prev_is_special = False
        for token in tokens:
            # make sure that special tokens are not decoded using sentencepiece model
            if token in self.all_special_tokens:
                if not prev_is_special:
                    out_string += " "
                out_string += self.sp_model.decode(current_sub_tokens) + token
                prev_is_special = True
                current_sub_tokens = []
            else:
                current_sub_tokens.append(token)
                prev_is_special = False
        out_string += self.sp_model.decode(current_sub_tokens)
        out_string = self.clean_up_tokenization(out_string)
        out_string = self._maybe_add_prefix_space(tokens=tokens, decoded=out_string)
        return out_string[1:]

    def save_vocabulary(self, save_directory, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary and special tokens file to a directory.

        Args:
            save_directory (`str`):
                The directory in which to save the vocabulary.

        Returns:
            `Tuple(str)`: Paths to the files saved.
        """
        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
            return
        out_vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )

        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file) and os.path.isfile(self.vocab_file):
            copyfile(self.vocab_file, out_vocab_file)
        elif not os.path.isfile(self.vocab_file):
            with open(out_vocab_file, "wb") as fi:
                content_spiece_model = self.sp_model.serialized_model_proto()
                fi.write(content_spiece_model)

        return (out_vocab_file,)

    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
        if self.add_bos_token:
            bos_token_ids = [self.bos_token_id]
        else:
            bos_token_ids = []

        output = bos_token_ids + token_ids_0

        if token_ids_1 is not None:
            output = output + token_ids_1

        if self.add_eos_token:
            output = output + [self.eos_token_id]

        return output

    def get_special_tokens_mask(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
    ) -> List[int]:
        """
        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `prepare_for_model` method.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """
        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
            )

        if token_ids_1 is None:
            return [1] + ([0] * len(token_ids_0)) + [1]
        return [1] + ([0] * len(token_ids_0)) + [1, 1] + ([0] * len(token_ids_1)) + [1]

    def create_token_type_ids_from_sequences(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task. InternLM does not
        make use of token type ids, therefore a list of zeros is returned.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of zeros.
        """
        eos = [self.eos_token_id]

        if token_ids_1 is None:
            return len(token_ids_0 + eos) * [0]
        return len(token_ids_0 + eos + token_ids_1 + eos) * [0]
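A quick way to see the BOS/EOS behavior defined above (add_bos_token=True, add_eos_token=False by default): a minimal sketch, assuming the repository has been cloned locally to ./advertise:

```python
from transformers import AutoTokenizer

# trust_remote_code=True makes transformers load tokenization_internlm.py from the repo.
tok = AutoTokenizer.from_pretrained("./advertise", trust_remote_code=True)
ids = tok("hello world")["input_ids"]
# build_inputs_with_special_tokens prepends BOS but appends no EOS,
# so we expect True then False here.
print(ids[0] == tok.bos_token_id, ids[-1] == tok.eos_token_id)
```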
3 tokenizer.model Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aab622d98c98677a1a51f969e25765154487bf3e85c7819db105db2fcacba83f
size 1658691
44 tokenizer_config.json Normal file
@@ -0,0 +1,44 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "auto_map": {
    "AutoTokenizer": [
      "tokenization_internlm.InternLMTokenizer",
      null
    ]
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "encode_special_tokens": true,
  "eos_token": "</s>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "tokenizer_class": "InternLMTokenizer",
  "tokenizer_file": null,
  "unk_token": "<unk>"
}
114599 train.json Normal file
File diff suppressed because it is too large