Initialize project; model provided by the ModelHub XC community

Model: pfnet/nekomata-7b-pfn-qfin Source: Original Platform
2026-05-03 22:31:43 +08:00
commit 5235be6ff6
19 changed files with 154445 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,39 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
--- a/53
+++ b/53
@@ -0,0 +1,53 @@
+Tongyi Qianwen LICENSE AGREEMENT
+
+Tongyi Qianwen Release Date: August 3, 2023
+
+By clicking to agree or by using or distributing any portion or element of the Tongyi Qianwen Materials, you will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
+
+1. Definitions
+    a. This Tongyi Qianwen LICENSE AGREEMENT (this "Agreement") shall mean the terms and conditions for use, reproduction, distribution and modification of the Materials as defined by this Agreement.
+    b. "We"(or "Us") shall mean Alibaba Cloud.
+    c. "You" (or "Your") shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Materials for any purpose and in any field of use.
+    d. "Third Parties" shall mean individuals or legal entities that are not under common control with Us or You.
+    e. "Tongyi Qianwen" shall mean the large language models (including Qwen model and Qwen-Chat model), and software and algorithms, consisting of trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Us.
+    f. "Materials" shall mean, collectively, Alibaba Cloud's proprietary Tongyi Qianwen and Documentation (and any portion thereof) made available under this Agreement.
+    g. "Source" form shall mean the preferred form for making modifications, including but not limited to model source code, documentation source, and configuration files.
+    h. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+2. Grant of Rights
+You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Alibaba Cloud's intellectual property or other rights owned by Us embodied in the Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Materials.
+
+3. Redistribution
+You may reproduce and distribute copies of the Materials or derivative works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+    a. You shall give any other recipients of the Materials or derivative works a copy of this Agreement;
+    b. You shall cause any modified files to carry prominent notices stating that You changed the files;
+    c. You shall retain in all copies of the Materials that You distribute the following attribution notices within a "Notice" text file distributed as a part of such copies: "Tongyi Qianwen is licensed under the Tongyi Qianwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved."; and
+    d. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such derivative works as a whole, provided Your use, reproduction, and distribution of the work otherwise complies with the terms and conditions of this Agreement.
+
+4. Restrictions
+If you are commercially using the Materials, and your product or service has more than 100 million monthly active users, You shall request a license from Us. You cannot exercise your rights under this Agreement without our express authorization.
+
+5. Rules of use
+    a. The Materials may be subject to export controls or restrictions in China, the United States or other countries or regions. You shall comply with applicable laws and regulations in your use of the Materials.
+    b. You can not use the Materials or any output therefrom to improve any other large language model (excluding Tongyi Qianwen or derivative works thereof).
+
+6. Intellectual Property
+    a. We retain ownership of all intellectual property rights in and to the Materials and derivatives made by or for Us. Conditioned upon compliance with the terms and conditions of this Agreement, with respect to any derivative works and modifications of the Materials that are made by you, you are and will be the owner of such derivative works and modifications.
+    b. No trademark license is granted to use the trade names, trademarks, service marks, or product names of Us, except as required to fulfill notice requirements under this Agreement or as required for reasonable and customary use in describing and redistributing the Materials.
+    c. If you commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any entity alleging that the Materials or any output therefrom, or any part of the foregoing, infringe any intellectual property or other right owned or licensable by you, then all licences granted to you under this Agreement shall terminate as of the date such lawsuit or other proceeding is commenced or brought.
+
+7. Disclaimer of Warranty and Limitation of Liability
+
+    a. We are not obligated to support, update, provide training for, or develop any further version of the Tongyi Qianwen Materials or to grant any license thereto.
+    b. THE MATERIALS ARE PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. WE MAKE NO WARRANTY AND ASSUME NO RESPONSIBILITY FOR THE SAFETY OR STABILITY OF THE MATERIALS AND ANY OUTPUT THEREFROM.
+    c. IN NO EVENT SHALL WE BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO ANY DIRECT, OR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE MATERIALS OR ANY OUTPUT OF IT, NO MATTER HOW IT’S CAUSED.
+    d. You will defend, indemnify and hold harmless Us from and against any claim by any third party arising out of or related to your use or distribution of the Materials.
+
+8. Survival and Termination.
+    a. The term of this Agreement shall commence upon your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
+    b. We may terminate this Agreement if you breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, you must delete and cease use of the Materials. Sections 7 and 9 shall survive the termination of this Agreement.
+
+9. Governing Law and Jurisdiction.
+    a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
+    b. The People's Courts in Hangzhou City shall have exclusive jurisdiction over any dispute arising out of this Agreement.
--- a/77
+++ b/77
@@ -0,0 +1,77 @@
+------------- LICENSE FOR NVIDIA Megatron-LM code  --------------
+
+Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+  * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+  * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in the
+    documentation and/or other materials provided with the distribution.
+  * Neither the name of NVIDIA CORPORATION nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+------------- LICENSE FOR OpenAI tiktoken code  --------------
+
+MIT License
+
+Copyright (c) 2022 OpenAI, Shantanu Jain
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+
+------------- LICENSE FOR PanQiWei AutoGPTQ code  --------------
+
+MIT License
+
+Copyright (c) 2023 潘其威(William)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/README.md
+++ b/README.md
@@ -0,0 +1,114 @@
+---
+license: other
+license_name: tongyi-qianwen-license
+license_link: LICENSE
+language:
+- en
+- ja
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# nekomata-7b-pfn-qfin
+
+## Model Description
+nekomata-7b-pfn-qfin is a fine-tuned model based on [rinna/nekomata-7b](https://huggingface.co/rinna/nekomata-7b/tree/main).
+This is a base model that is good at generating continuation text for finance.
+nekomata-7b-pfn-qfin is fine-tuned on 370M tokens from multiple specialized datasets created by Preferred Networks, which are cleared for commercial use.
+Fine-tuning was carried out at a context length of 2048.
+This model is released under [Tongyi Qianwen LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/e8e15962d897714944773cca57fa2e460a3655e8/Tongyi%20Qianwen%20LICENSE%20AGREEMENT).
+
+The research article is available on [arXiv](https://arxiv.org/abs/2404.10555).
+
+## Benchmarking
+The benchmark scores were obtained using the [Japanese Language Model Financial Evaluation Harness](https://github.com/pfnet-research/japanese-lm-fin-harness).
+The benchmark was run 0-shot with the default prompts.
+```
+|      Task      |Metric|   nekomata-7b   |       Ours      |
+|----------------|------|------|---|------|------|---|------|
+|chabsa          |f1    |0.8134|   |      |0.8127|   |      |
+|cma_basics      |acc   |0.3158|±  |0.0764|0.3684|±  |0.0793|
+|cpa_audit       |acc   |0.2085|±  |0.0203|0.1809|±  |0.0193|
+|fp2             |acc   |0.2484|±  |0.0198|0.2674|±  |0.0203|
+|security_sales_1|acc   |0.4912|±  |0.0668|0.5088|±  |0.0668|
+|----------------|------|------|---|------|------|---|------|
+|OVER ALL        |      |0.4155           |0.4276           |
+```
+## Usage
+Install the required libraries as follows:
+```sh
+python -m pip install numpy sentencepiece torch transformers accelerate transformers_stream_generator tiktoken einops
+```
+
+Run the following Python code:
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+tokenizer = AutoTokenizer.from_pretrained("pfnet/nekomata-7b-pfn-qfin", trust_remote_code=True)
+
+# Use GPU with bf16 (recommended for supported devices)
+# model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="auto", trust_remote_code=True, bf16=True)
+
+# Use GPU with fp16
+# model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="auto", trust_remote_code=True, fp16=True)
+
+# Use GPU with fp32
+# model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="auto", trust_remote_code=True, fp32=True)
+
+# Use CPU
+# model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="cpu", trust_remote_code=True)
+
+# Automatically select device and precision
+model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="auto", trust_remote_code=True)
+
+text = "日本銀行は"
+input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
+with torch.no_grad():
+    generated_tokens = model.generate(
+        inputs=input_ids,
+        max_new_tokens=32,
+        do_sample=True,
+        temperature=1.0,
+        repetition_penalty=1.1,
+    )[0]
+generated_text = tokenizer.decode(generated_tokens)
+print(generated_text)
+# 日本銀行は、2016年9月に「長短金利操作付き量的・質的金融緩和」を導入し、長期国
+```
+
+## Model Details
+- Model size: 7B
+- Fine-tuned tokens: 370M tokens (Japanese: 300M tokens, English: 13M tokens, Digits: 14M tokens)
+- Context length: 2048
+- Developed by: Preferred Networks, Inc.
+- Model type: Causal decoder-only
+- Language(s): Japanese and English
+- License: [Tongyi Qianwen LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/e8e15962d897714944773cca57fa2e460a3655e8/Tongyi%20Qianwen%20LICENSE%20AGREEMENT)
+
+## Bias, Risks, and Limitations
+nekomata-7b-pfn-qfin is a new technology that carries risks with use.
+Testing conducted to date has been in English and Japanese, and has not covered, nor could it cover, all scenarios.
+For these reasons, as with all LLMs, nekomata-7b-pfn-qfin’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts.
+This model is not designed for legal, tax, investment, financial, or other advice.
+Therefore, before deploying any applications of nekomata-7b-pfn-qfin, developers should perform safety testing and tuning tailored to their specific applications of the model.
+
+## How to cite
+```
+@misc{hirano2024,
+      title={Construction of Domain-specified Japanese Large Language Model for Finance through Continual Pre-training}, 
+      author={Masanori Hirano and Kentaro Imajo},
+      year={2024},
+      eprint={2404.10555},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
+
+## Contributors
+Preferred Networks, Inc.
+ - Masanori Hirano
+ - Kentaro Imajo
+
+## License
+[Tongyi Qianwen LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/e8e15962d897714944773cca57fa2e460a3655e8/Tongyi%20Qianwen%20LICENSE%20AGREEMENT)
--- a/config.json
+++ b/config.json
@@ -0,0 +1,42 @@
+{
+  "_name_or_path": "pfnet/nekomata-7b-pfn-qfin",
+  "architectures": [
+    "QWenLMHeadModel"
+  ],
+  "attn_dropout_prob": 0.0,
+  "auto_map": {
+    "AutoConfig": "configuration_qwen.QWenConfig",
+    "AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
+  },
+  "bf16": false,
+  "emb_dropout_prob": 0.0,
+  "fp16": false,
+  "fp32": false,
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 22016,
+  "kv_channels": 128,
+  "layer_norm_epsilon": 1e-06,
+  "max_position_embeddings": 32768,
+  "model_type": "qwen",
+  "no_bias": true,
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "onnx_safe": null,
+  "rotary_emb_base": 10000,
+  "rotary_pct": 1.0,
+  "scale_attn_weights": true,
+  "seq_length": 8192,
+  "softmax_in_fp32": false,
+  "tie_word_embeddings": false,
+  "tokenizer_class": "QWenTokenizer",
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.40.2",
+  "use_cache": true,
+  "use_cache_kernel": false,
+  "use_cache_quantization": false,
+  "use_dynamic_ntk": true,
+  "use_flash_attn": true,
+  "use_logn_attn": true,
+  "vocab_size": 151936
+}
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/configuration_qwen.py
+++ b/configuration_qwen.py
@@ -0,0 +1,71 @@
+# Copyright (c) Alibaba Cloud.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+from transformers import PretrainedConfig
+
+
+class QWenConfig(PretrainedConfig):
+    model_type = "qwen"
+    keys_to_ignore_at_inference = ["past_key_values"]
+
+    def __init__(
+        self,
+        vocab_size=151936,
+        hidden_size=4096,
+        num_hidden_layers=32,
+        num_attention_heads=32,
+        emb_dropout_prob=0.0,
+        attn_dropout_prob=0.0,
+        layer_norm_epsilon=1e-6,
+        initializer_range=0.02,
+        max_position_embeddings=8192,
+        scale_attn_weights=True,
+        use_cache=True,
+        bf16=False,
+        fp16=False,
+        fp32=False,
+        kv_channels=128,
+        rotary_pct=1.0,
+        rotary_emb_base=10000,
+        use_dynamic_ntk=True,
+        use_logn_attn=True,
+        use_flash_attn="auto",
+        intermediate_size=22016,
+        no_bias=True,
+        tie_word_embeddings=False,
+        use_cache_quantization=False,
+        use_cache_kernel=False,
+        softmax_in_fp32=False,
+        **kwargs,
+    ):
+        self.vocab_size = vocab_size
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.emb_dropout_prob = emb_dropout_prob
+        self.attn_dropout_prob = attn_dropout_prob
+        self.layer_norm_epsilon = layer_norm_epsilon
+        self.initializer_range = initializer_range
+        self.scale_attn_weights = scale_attn_weights
+        self.use_cache = use_cache
+        self.max_position_embeddings = max_position_embeddings
+        self.bf16 = bf16
+        self.fp16 = fp16
+        self.fp32 = fp32
+        self.kv_channels = kv_channels
+        self.rotary_pct = rotary_pct
+        self.rotary_emb_base = rotary_emb_base
+        self.use_dynamic_ntk = use_dynamic_ntk
+        self.use_logn_attn = use_logn_attn
+        self.use_flash_attn = use_flash_attn
+        self.no_bias = no_bias
+        self.use_cache_quantization = use_cache_quantization
+        self.use_cache_kernel = use_cache_kernel
+        self.softmax_in_fp32 = softmax_in_fp32
+        super().__init__(
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs
+        )
--- a/cpp_kernels.py
+++ b/cpp_kernels.py
@@ -0,0 +1,55 @@
+from torch.utils import cpp_extension
+import pathlib
+import os
+import subprocess
+
+def _get_cuda_bare_metal_version(cuda_dir):
+    raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"],
+                                         universal_newlines=True)
+    output = raw_output.split()
+    release_idx = output.index("release") + 1
+    release = output[release_idx].split(".")
+    bare_metal_major = release[0]
+    bare_metal_minor = release[1][0]
+
+    return raw_output, bare_metal_major, bare_metal_minor
+
+def _create_build_dir(buildpath):
+    try:
+        os.mkdir(buildpath)
+    except OSError:
+        if not os.path.isdir(buildpath):
+            print(f"Creation of the build directory {buildpath} failed")
+
+# Check if cuda 11 is installed for compute capability 8.0
+cc_flag = []
+_, bare_metal_major, bare_metal_minor = _get_cuda_bare_metal_version(cpp_extension.CUDA_HOME)
+if int(bare_metal_major) >= 11:
+    cc_flag.append('-gencode')
+    cc_flag.append('arch=compute_80,code=sm_80')
+    if int(bare_metal_minor) >= 7:
+        cc_flag.append('-gencode')
+        cc_flag.append('arch=compute_90,code=sm_90')
+
+# Build path
+srcpath = pathlib.Path(__file__).parent.absolute()
+buildpath = srcpath / 'build'
+_create_build_dir(buildpath)
+
+def _cpp_extention_load_helper(name, sources, extra_cuda_flags):
+    return cpp_extension.load(
+        name=name,
+        sources=sources,
+        build_directory=buildpath,
+        extra_cflags=['-O3', ],
+        extra_cuda_cflags=['-O3',
+                           '-gencode', 'arch=compute_70,code=sm_70',
+                           '--use_fast_math'] + extra_cuda_flags + cc_flag,
+        verbose=1
+    )
+
+extra_flags = []
+
+cache_autogptq_cuda_256_sources = ["./cache_autogptq_cuda_256.cpp",
+           "./cache_autogptq_cuda_kernel_256.cu"]
+cache_autogptq_cuda_256 = _cpp_extention_load_helper("cache_autogptq_cuda_256", cache_autogptq_cuda_256_sources, extra_flags)
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,4 @@
+{
+  "_from_model_config": true,
+  "transformers_version": "4.40.2"
+}
--- a/model-00001-of-00004.safetensors
+++ b/model-00001-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f806cb077d5fc8780ca2073576d8ca39bcd8bb11c53bb13c175d3164750f2c24
+size 4988485656
--- a/model-00002-of-00004.safetensors
+++ b/model-00002-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:99d4b7ad5238aa4ab2df00b9bb943701d519b9533dea8425adf3a2e97326a8cd
+size 4981246520
--- a/model-00003-of-00004.safetensors
+++ b/model-00003-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a5bf1156ba496200b11fa19c61bd4ee841caa2dc37808a1d75887434cd3dbb5d
+size 4228285288
--- a/model-00004-of-00004.safetensors
+++ b/model-00004-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:853c927d65a1e982117924b6015a68f1d6bb444d09a027ab31619c93404bf2bd
+size 1244659840
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,266 @@
+{
+  "metadata": {
+    "total_size": 15442649088
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00004-of-00004.safetensors",
+    "transformer.h.0.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.0.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.0.attn.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.0.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.0.ln_2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.0.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.0.mlp.w1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.0.mlp.w2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.1.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.1.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.1.attn.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.1.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.1.ln_2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.1.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.1.mlp.w1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.1.mlp.w2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.10.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.10.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.10.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.10.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.10.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.10.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.10.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.10.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.11.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.11.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.11.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.11.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.11.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.11.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.11.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.11.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.12.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.12.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.12.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.12.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.12.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.12.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.12.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.12.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.13.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.13.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.13.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.13.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.13.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.13.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.13.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.13.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.14.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.14.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.14.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.14.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.14.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.14.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.14.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.14.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.15.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.15.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.15.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.15.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.15.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.15.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.15.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.15.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.16.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.16.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.16.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.16.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.16.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.16.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.16.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.16.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.17.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.17.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.17.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.17.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.17.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.17.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.17.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.17.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.18.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.18.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.18.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.18.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.18.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.18.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.18.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.18.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.19.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.19.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.19.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.19.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.19.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.19.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.19.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.19.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.2.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.2.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.2.attn.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.2.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.2.ln_2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.2.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.2.mlp.w1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.2.mlp.w2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.20.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.20.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.20.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.20.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.20.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.20.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.20.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.20.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.21.attn.c_attn.bias": "model-00002-of-00004.safetensors",
+    "transformer.h.21.attn.c_attn.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.21.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.21.ln_1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.21.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.21.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.21.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.21.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.22.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.22.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.22.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.22.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.22.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.22.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.22.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.22.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.23.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.23.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.23.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.23.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.23.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.23.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.23.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.23.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.24.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.24.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.24.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.24.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.24.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.24.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.24.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.24.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.25.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.25.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.25.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.25.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.25.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.25.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.25.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.25.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.26.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.26.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.26.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.26.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.26.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.26.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.26.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.26.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.27.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.27.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.27.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.27.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.27.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.27.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.27.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.27.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.28.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.28.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.28.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.28.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.28.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.28.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.28.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.28.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.29.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.29.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.29.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.29.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.29.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.29.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.29.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.29.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.3.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.3.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.3.attn.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.3.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.3.ln_2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.3.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.3.mlp.w1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.3.mlp.w2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.30.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.30.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.30.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.30.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.30.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.30.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.30.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.30.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.31.attn.c_attn.bias": "model-00003-of-00004.safetensors",
+    "transformer.h.31.attn.c_attn.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.31.attn.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.31.ln_1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.31.ln_2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.31.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.31.mlp.w1.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.31.mlp.w2.weight": "model-00003-of-00004.safetensors",
+    "transformer.h.4.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.4.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.4.attn.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.4.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.4.ln_2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.4.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.4.mlp.w1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.4.mlp.w2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.5.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.5.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.5.attn.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.5.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.5.ln_2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.5.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.5.mlp.w1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.5.mlp.w2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.6.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.6.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.6.attn.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.6.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.6.ln_2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.6.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.6.mlp.w1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.6.mlp.w2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.7.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.7.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.7.attn.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.7.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.7.ln_2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.7.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.7.mlp.w1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.7.mlp.w2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.8.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.8.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.8.attn.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.8.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.8.ln_2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.8.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.8.mlp.w1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.8.mlp.w2.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.9.attn.c_attn.bias": "model-00001-of-00004.safetensors",
+    "transformer.h.9.attn.c_attn.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.9.attn.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.9.ln_1.weight": "model-00001-of-00004.safetensors",
+    "transformer.h.9.ln_2.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.9.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.9.mlp.w1.weight": "model-00002-of-00004.safetensors",
+    "transformer.h.9.mlp.w2.weight": "model-00002-of-00004.safetensors",
+    "transformer.ln_f.weight": "model-00003-of-00004.safetensors",
+    "transformer.wte.weight": "model-00001-of-00004.safetensors"
+  }
+}
--- a/modeling_qwen.py
+++ b/modeling_qwen.py
--- a/qwen.tiktoken
+++ b/qwen.tiktoken
--- a/qwen_generation_utils.py
+++ b/qwen_generation_utils.py
@@ -0,0 +1,416 @@
+# Copyright (c) Alibaba Cloud.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""Generation support."""
+
+from typing import Tuple, List, Union, Iterable
+
+import numpy as np
+import torch
+import torch.nn.functional as F
+from transformers import PreTrainedTokenizer
+from transformers import logging
+from transformers.generation import LogitsProcessor
+
+logger = logging.get_logger(__name__)
+
+# Types.
+HistoryType = List[Tuple[str, str]]
+TokensType = List[int]
+BatchTokensType = List[List[int]]
+
+
+def pad_batch(batch: BatchTokensType, pad_id: int, seq_length: int) -> BatchTokensType:
+    for tokens in batch:
+        context_length = len(tokens)
+        if context_length < seq_length:
+            tokens.extend([pad_id] * (seq_length - context_length))
+    return batch
+
+
+def get_ltor_masks_and_position_ids(
+    data,
+    eod_token,
+    reset_position_ids,
+    reset_attention_mask,
+    eod_mask_loss,
+):
+    """Build masks and position id for left to right model."""
+
+    # Extract batch size and sequence length.
+    micro_batch_size, seq_length = data.size()
+
+    # Attention mask (lower triangular).
+    if reset_attention_mask:
+        att_mask_batch = micro_batch_size
+    else:
+        att_mask_batch = 1
+    attention_mask = torch.tril(
+        torch.ones((att_mask_batch, seq_length, seq_length), device=data.device)
+    ).view(att_mask_batch, 1, seq_length, seq_length)
+
+    # Loss mask.
+    loss_mask = torch.ones(data.size(), dtype=torch.float, device=data.device)
+    if eod_mask_loss:
+        loss_mask[data == eod_token] = 0.0
+
+    # Position ids.
+    position_ids = torch.arange(seq_length, dtype=torch.long, device=data.device)
+    position_ids = position_ids.unsqueeze(0).expand_as(data)
+    # We need to clone as the ids will be modified based on batch index.
+    if reset_position_ids:
+        position_ids = position_ids.clone()
+
+    if reset_position_ids or reset_attention_mask:
+        # Loop through the batches:
+        for b in range(micro_batch_size):
+
+            # Find indices where the EOD token is.
+            eod_index = position_ids[b, data[b] == eod_token]
+            # Detach indices from positions if going to modify positions.
+            if reset_position_ids:
+                eod_index = eod_index.clone()
+
+            # Loop through EOD indices:
+            prev_index = 0
+            for j in range(eod_index.size()[0]):
+                i = eod_index[j]
+                # Block attention from tokens after the EOD to tokens up to and including it.
+                if reset_attention_mask:
+                    attention_mask[b, 0, (i + 1) :, : (i + 1)] = 0
+                # Reset positions.
+                if reset_position_ids:
+                    position_ids[b, (i + 1) :] -= i + 1 - prev_index
+                    prev_index = i + 1
+
+    # Convert attention mask to boolean (True marks positions that are masked out):
+    attention_mask = attention_mask < 0.5
+
+    return attention_mask, loss_mask, position_ids
+
+
+def get_batch(context_tokens: torch.LongTensor, eod_id: int):
+    """Generate batch from context tokens."""
+    # Make the tokens contiguous (they stay on their current device).
+    tokens = context_tokens.contiguous().to(context_tokens.device)
+    # Get the attention mask and position ids.
+    attention_mask, _, position_ids = get_ltor_masks_and_position_ids(
+        tokens,
+        eod_id,
+        reset_position_ids=False,
+        reset_attention_mask=False,
+        eod_mask_loss=False,
+    )
+    return tokens, attention_mask, position_ids
+
+
+def get_stop_words_ids(chat_format, tokenizer):
+    if chat_format == "raw":
+        stop_words_ids = [tokenizer.encode("Human:"), [tokenizer.eod_id]]
+    elif chat_format == "chatml":
+        stop_words_ids = [[tokenizer.im_end_id], [tokenizer.im_start_id]]
+    else:
+        raise NotImplementedError(f"Unknown chat format {chat_format!r}")
+    return stop_words_ids
+
+
+def make_context(
+    tokenizer: PreTrainedTokenizer,
+    query: str,
+    history: List[Tuple[str, str]] = None,
+    system: str = "",
+    max_window_size: int = 6144,
+    chat_format: str = "chatml",
+):
+    if history is None:
+        history = []
+
+    if chat_format == "chatml":
+        im_start, im_end = "<|im_start|>", "<|im_end|>"
+        im_start_tokens = [tokenizer.im_start_id]
+        im_end_tokens = [tokenizer.im_end_id]
+        nl_tokens = tokenizer.encode("\n")
+
+        def _tokenize_str(role, content):
+            return f"{role}\n{content}", tokenizer.encode(
+                role, allowed_special=set()
+            ) + nl_tokens + tokenizer.encode(content, allowed_special=set())
+
+        system_text, system_tokens_part = _tokenize_str("system", system)
+        system_tokens = im_start_tokens + system_tokens_part + im_end_tokens
+
+        raw_text = ""
+        context_tokens = []
+
+        for turn_query, turn_response in reversed(history):
+            query_text, query_tokens_part = _tokenize_str("user", turn_query)
+            query_tokens = im_start_tokens + query_tokens_part + im_end_tokens
+            response_text, response_tokens_part = _tokenize_str(
+                "assistant", turn_response
+            )
+            response_tokens = im_start_tokens + response_tokens_part + im_end_tokens
+
+            next_context_tokens = nl_tokens + query_tokens + nl_tokens + response_tokens
+            prev_chat = (
+                f"\n{im_start}{query_text}{im_end}\n{im_start}{response_text}{im_end}"
+            )
+
+            current_context_size = (
+                len(system_tokens) + len(next_context_tokens) + len(context_tokens)
+            )
+            if current_context_size < max_window_size:
+                context_tokens = next_context_tokens + context_tokens
+                raw_text = prev_chat + raw_text
+            else:
+                break
+
+        context_tokens = system_tokens + context_tokens
+        raw_text = f"{im_start}{system_text}{im_end}" + raw_text
+        context_tokens += (
+            nl_tokens
+            + im_start_tokens
+            + _tokenize_str("user", query)[1]
+            + im_end_tokens
+            + nl_tokens
+            + im_start_tokens
+            + tokenizer.encode("assistant")
+            + nl_tokens
+        )
+        raw_text += f"\n{im_start}user\n{query}{im_end}\n{im_start}assistant\n"
+
+    elif chat_format == "raw":
+        raw_text = query
+        context_tokens = tokenizer.encode(raw_text)
+    else:
+        raise NotImplementedError(f"Unknown chat format {chat_format!r}")
+
+    return raw_text, context_tokens
+
+
+def _decode_default(
+    tokens: List[int],
+    *,
+    stop_words: List[str],
+    eod_words: List[str],
+    tokenizer: PreTrainedTokenizer,
+    raw_text_len: int,
+    verbose: bool = False,
+    return_end_reason: bool = False,
+    errors: str='replace',
+):
+    trim_decode_tokens = tokenizer.decode(tokens, errors=errors)[raw_text_len:]
+    if verbose:
+        print("\nRaw Generate: ", trim_decode_tokens)
+
+    end_reason = f"Gen length {len(tokens)}"
+    for stop_word in stop_words:
+        trim_decode_tokens = trim_decode_tokens.replace(stop_word, "").strip()
+    for eod_word in eod_words:
+        if eod_word in trim_decode_tokens:
+            end_reason = f"Gen {eod_word!r}"
+        trim_decode_tokens = trim_decode_tokens.split(eod_word)[0]
+    trim_decode_tokens = trim_decode_tokens.strip()
+    if verbose:
+        print("\nEnd Reason:", end_reason)
+        print("\nGenerate: ", trim_decode_tokens)
+
+    if return_end_reason:
+        return trim_decode_tokens, end_reason
+    else:
+        return trim_decode_tokens
+
+
+def _decode_chatml(
+    tokens: List[int],
+    *,
+    stop_words: List[str],
+    eod_token_ids: List[int],
+    tokenizer: PreTrainedTokenizer,
+    raw_text_len: int,
+    context_length: int,
+    verbose: bool = False,
+    return_end_reason: bool = False,
+    errors: str='replace'
+):
+    end_reason = f"Gen length {len(tokens)}"
+    eod_token_idx = context_length
+    for eod_token_idx in range(context_length, len(tokens)):
+        if tokens[eod_token_idx] in eod_token_ids:
+            end_reason = f"Gen {tokenizer.decode([tokens[eod_token_idx]])!r}"
+            break
+
+    trim_decode_tokens = tokenizer.decode(tokens[:eod_token_idx], errors=errors)[raw_text_len:]
+    if verbose:
+        print("\nRaw Generate w/o EOD:", tokenizer.decode(tokens, errors=errors)[raw_text_len:])
+        print("\nRaw Generate:", trim_decode_tokens)
+        print("\nEnd Reason:", end_reason)
+    for stop_word in stop_words:
+        trim_decode_tokens = trim_decode_tokens.replace(stop_word, "").strip()
+    trim_decode_tokens = trim_decode_tokens.strip()
+    if verbose:
+        print("\nGenerate:", trim_decode_tokens)
+
+    if return_end_reason:
+        return trim_decode_tokens, end_reason
+    else:
+        return trim_decode_tokens
+
+
+def decode_tokens(
+    tokens: Union[torch.LongTensor, TokensType],
+    tokenizer: PreTrainedTokenizer,
+    raw_text_len: int,
+    context_length: int,
+    chat_format: str,
+    verbose: bool = False,
+    return_end_reason: bool = False,
+    errors: str="replace",
+) -> str:
+    if torch.is_tensor(tokens):
+        tokens = tokens.cpu().numpy().tolist()
+
+    if chat_format == "chatml":
+        return _decode_chatml(
+            tokens,
+            stop_words=[],
+            eod_token_ids=[tokenizer.im_start_id, tokenizer.im_end_id],
+            tokenizer=tokenizer,
+            raw_text_len=raw_text_len,
+            context_length=context_length,
+            verbose=verbose,
+            return_end_reason=return_end_reason,
+            errors=errors,
+        )
+    elif chat_format == "raw":
+        return _decode_default(
+            tokens,
+            stop_words=["<|endoftext|>"],
+            eod_words=["<|endoftext|>"],
+            tokenizer=tokenizer,
+            raw_text_len=raw_text_len,
+            verbose=verbose,
+            return_end_reason=return_end_reason,
+            errors=errors,
+        )
+    else:
+        raise NotImplementedError(f"Unknown chat format {chat_format!r}")
+
+
+class StopWordsLogitsProcessor(LogitsProcessor):
+    """
+    :class:`transformers.LogitsProcessor` that forces generation to stop once any of the specified sequences appears.
+
+    Args:
+        stop_words_ids (:obj:`List[List[int]]`):
+            List of lists of token ids, one inner list per stop sequence. To get the token ids
+            of a stop word, use :obj:`tokenizer(stop_word,
+            add_prefix_space=True).input_ids`.
+        eos_token_id (:obj:`int`):
+            The id of the `end-of-sequence` token.
+    """
+
+    def __init__(self, stop_words_ids: Iterable[Iterable[int]], eos_token_id: int):
+
+        if not isinstance(stop_words_ids, List) or len(stop_words_ids) == 0:
+            raise ValueError(
+                f"`stop_words_ids` has to be a non-empty list, but is {stop_words_ids}."
+            )
+        if any(not isinstance(bad_word_ids, list) for bad_word_ids in stop_words_ids):
+            raise ValueError(
+                f"`stop_words_ids` has to be a list of lists, but is {stop_words_ids}."
+            )
+        if any(
+            any(
+                (not isinstance(token_id, (int, np.integer)) or token_id < 0)
+                for token_id in stop_word_ids
+            )
+            for stop_word_ids in stop_words_ids
+        ):
+            raise ValueError(
+                f"Each list in `stop_words_ids` has to be a list of non-negative integers, but is {stop_words_ids}."
+            )
+
+        self.stop_words_ids = list(
+            filter(
+                lambda bad_token_seq: bad_token_seq != [eos_token_id], stop_words_ids
+            )
+        )
+        self.eos_token_id = eos_token_id
+        for stop_token_seq in self.stop_words_ids:
+            assert (
+                len(stop_token_seq) > 0
+            ), "Stop words token sequences {} cannot have an empty list".format(
+                stop_words_ids
+            )
+
+    def __call__(
+        self, input_ids: torch.LongTensor, scores: torch.FloatTensor
+    ) -> torch.FloatTensor:
+        stopped_samples = self._calc_stopped_samples(input_ids)
+        for i, should_stop in enumerate(stopped_samples):
+            if should_stop:
+                scores[i, self.eos_token_id] = float(2**15)
+        return scores
+
+    def _tokens_match(self, prev_tokens: torch.LongTensor, tokens: List[int]) -> bool:
+        if len(tokens) == 0:
+            # an empty stop sequence trivially matches
+            return True
+        elif len(tokens) > len(prev_tokens):
+            # if the stop sequence is longer than prev input_ids they can't be equal
+            return False
+        elif prev_tokens[-len(tokens) :].tolist() == tokens:
+            # if tokens match
+            return True
+        else:
+            return False
+
+    def _calc_stopped_samples(self, prev_input_ids: Iterable[int]) -> Iterable[int]:
+        stopped_samples = []
+        for prev_input_ids_slice in prev_input_ids:
+            match = False
+            for stop_token_seq in self.stop_words_ids:
+                if self._tokens_match(prev_input_ids_slice, stop_token_seq):
+                    # a stop sequence matched; no need to check the remaining ones
+                    match = True
+                    break
+            stopped_samples.append(match)
+
+        return stopped_samples
+
+
+def top_k_logits(logits, top_k=0, top_p=0.0, filter_value=-float("Inf")):
+    """This function has been mostly taken from huggingface conversational
+    ai code at
+        https://medium.com/huggingface/how-to-build-a-state-of-the-art-
+             conversational-ai-with-transfer-learning-2d818ac26313"""
+
+    if top_k > 0:
+        # Remove all tokens with a probability less than the
+        # last token of the top-k
+        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]
+        logits[indices_to_remove] = filter_value
+
+    if top_p > 0.0:
+        # Sort logits in descending order
+        sorted_logits, sorted_indices = torch.sort(logits, descending=True, dim=-1)
+        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
+
+        # Remove tokens with cumulative probability above the threshold
+        sorted_indices_to_remove = cumulative_probs > top_p
+        # Shift the indices to the right to keep also the first token
+        # above the threshold
+        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
+        sorted_indices_to_remove[..., 0] = 0
+        for i in range(sorted_indices.size(0)):
+            indices_to_remove = sorted_indices[i][sorted_indices_to_remove[i]]
+            logits[i][indices_to_remove] = filter_value
+
+    return logits
+
+
+def switch(val1, val2, boolean):
+    boolean = boolean.type_as(val1)
+    return (1 - boolean) * val1 + boolean * val2
--- a/tokenization_qwen.py
+++ b/tokenization_qwen.py
@@ -0,0 +1,276 @@
+# Copyright (c) Alibaba Cloud.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""Tokenization classes for QWen."""
+
+import base64
+import logging
+import os
+import unicodedata
+from typing import Collection, Dict, List, Set, Tuple, Union
+
+import tiktoken
+from transformers import PreTrainedTokenizer, AddedToken
+
+logger = logging.getLogger(__name__)
+
+
+VOCAB_FILES_NAMES = {"vocab_file": "qwen.tiktoken"}
+
+PAT_STR = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
+ENDOFTEXT = "<|endoftext|>"
+IMSTART = "<|im_start|>"
+IMEND = "<|im_end|>"
+# as the default behavior is changed to allow special tokens in
+# regular texts, the surface forms of special tokens need to be
+# as different as possible to minimize the impact
+EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
+# changed to use actual index to avoid misconfiguration with vocabulary expansion
+SPECIAL_START_ID = 151643
+SPECIAL_TOKENS = tuple(
+    enumerate(
+        (
+            (
+                ENDOFTEXT,
+                IMSTART,
+                IMEND,
+            )
+            + EXTRAS
+        ),
+        start=SPECIAL_START_ID,
+    )
+)
+SPECIAL_TOKENS_SET = set(t for i, t in SPECIAL_TOKENS)
+
+
+def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
+    with open(tiktoken_bpe_file, "rb") as f:
+        contents = f.read()
+    return {
+        base64.b64decode(token): int(rank)
+        for token, rank in (line.split() for line in contents.splitlines() if line)
+    }
+
+
+class QWenTokenizer(PreTrainedTokenizer):
+    """QWen tokenizer."""
+
+    vocab_files_names = VOCAB_FILES_NAMES
+
+    def __init__(
+        self,
+        vocab_file,
+        errors="replace",
+        extra_vocab_file=None,
+        **kwargs,
+    ):
+        super().__init__(**kwargs)
+
+        # how to handle errors in decoding UTF-8 byte sequences
+        # use ignore if you are in streaming inference
+        self.errors = errors
+
+        self.mergeable_ranks = _load_tiktoken_bpe(vocab_file)  # type: Dict[bytes, int]
+        self.special_tokens = {
+            token: index
+            for index, token in SPECIAL_TOKENS
+        }
+
+        # try load extra vocab from file
+        if extra_vocab_file is not None:
+            used_ids = set(self.mergeable_ranks.values()) | set(self.special_tokens.values())
+            extra_mergeable_ranks = _load_tiktoken_bpe(extra_vocab_file)
+            for token, index in extra_mergeable_ranks.items():
+                if token in self.mergeable_ranks:
+                    logger.info(f"extra token {token} exists, skipping")
+                    continue
+                if index in used_ids:
+                    logger.info(f'the index {index} for extra token {token} exists, skipping')
+                    continue
+                self.mergeable_ranks[token] = index
+            # the index may be sparse after this, but don't worry tiktoken.Encoding will handle this
+
+        enc = tiktoken.Encoding(
+            "Qwen",
+            pat_str=PAT_STR,
+            mergeable_ranks=self.mergeable_ranks,
+            special_tokens=self.special_tokens,
+        )
+        assert (
+            len(self.mergeable_ranks) + len(self.special_tokens) == enc.n_vocab
+        ), f"{len(self.mergeable_ranks) + len(self.special_tokens)} != {enc.n_vocab} in encoding"
+
+        self.decoder = {
+            v: k for k, v in self.mergeable_ranks.items()
+        }  # type: dict[int, bytes|str]
+        self.decoder.update({v: k for k, v in self.special_tokens.items()})
+
+        self.tokenizer = enc  # type: tiktoken.Encoding
+
+        self.eod_id = self.tokenizer.eot_token
+        self.im_start_id = self.special_tokens[IMSTART]
+        self.im_end_id = self.special_tokens[IMEND]
+
+    def __getstate__(self):
+        # support pickling: drop the tiktoken encoder, which __setstate__ rebuilds
+        state = self.__dict__.copy()
+        del state["tokenizer"]
+        return state
+
+    def __setstate__(self, state):
+        # the tiktoken encoder is not a native Python object, so it is not pickled; rebuild it here
+        self.__dict__.update(state)
+        enc = tiktoken.Encoding(
+            "Qwen",
+            pat_str=PAT_STR,
+            mergeable_ranks=self.mergeable_ranks,
+            special_tokens=self.special_tokens,
+        )
+        self.tokenizer = enc
+
+    def __len__(self) -> int:
+        return self.tokenizer.n_vocab
+
+    def get_vocab(self) -> Dict[bytes, int]:
+        return self.mergeable_ranks
+
+    def convert_tokens_to_ids(
+        self, tokens: Union[bytes, str, List[Union[bytes, str]]]
+    ) -> Union[int, List[int]]:
+        ids = []
+        if isinstance(tokens, (str, bytes)):
+            if tokens in self.special_tokens:
+                return self.special_tokens[tokens]
+            else:
+                return self.mergeable_ranks.get(tokens)
+        for token in tokens:
+            if token in self.special_tokens:
+                ids.append(self.special_tokens[token])
+            else:
+                ids.append(self.mergeable_ranks.get(token))
+        return ids
+
+    def _add_tokens(
+        self,
+        new_tokens: Union[List[str], List[AddedToken]],
+        special_tokens: bool = False,
+    ) -> int:
+        if not special_tokens and new_tokens:
+            raise ValueError("Adding regular tokens is not supported")
+        for token in new_tokens:
+            surface_form = token.content if isinstance(token, AddedToken) else token
+            if surface_form not in SPECIAL_TOKENS_SET:
+                raise ValueError("Adding unknown special tokens is not supported")
+        return 0
+
+    def save_vocabulary(self, save_directory: str, **kwargs) -> Tuple[str]:
+        """
+        Save only the vocabulary of the tokenizer.
+
+        Returns:
+            `Tuple(str)`: Paths to the files saved.
+        """
+        file_path = os.path.join(save_directory, "qwen.tiktoken")
+        with open(file_path, "w", encoding="utf8") as w:
+            for k, v in self.mergeable_ranks.items():
+                line = base64.b64encode(k).decode("utf8") + " " + str(v) + "\n"
+                w.write(line)
+        return (file_path,)
+
+    def tokenize(
+        self,
+        text: str,
+        allowed_special: Union[Set, str] = "all",
+        disallowed_special: Union[Collection, str] = (),
+        **kwargs,
+    ) -> List[Union[bytes, str]]:
+        """
+        Converts a string into a sequence of tokens.
+
+        Args:
+            text (`str`):
+                The sequence to be encoded.
+            allowed_special (`Literal["all"]` or `set`):
+                The surface forms of the tokens to be encoded as special tokens in regular text.
+                Defaults to "all".
+            disallowed_special (`Literal["all"]` or `Collection`):
+                The surface forms of the tokens that should not appear in regular text and that trigger errors.
+                Defaults to an empty tuple.
+
+            kwargs (additional keyword arguments, *optional*):
+                Will be passed to the underlying model specific encode method.
+
+        Returns:
+            `List[bytes|str]`: The list of tokens.
+        """
+        tokens = []
+        text = unicodedata.normalize("NFC", text)
+
+        # this implementation takes a detour: text -> token id -> token surface forms
+        for t in self.tokenizer.encode(
+            text, allowed_special=allowed_special, disallowed_special=disallowed_special
+        ):
+            tokens.append(self.decoder[t])
+        return tokens
+
+    def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) -> str:
+        """
+        Converts a sequence of tokens into a single string.
+        """
+        text = ""
+        temp = b""
+        for t in tokens:
+            if isinstance(t, str):
+                if temp:
+                    text += temp.decode("utf-8", errors=self.errors)
+                    temp = b""
+                text += t
+            elif isinstance(t, bytes):
+                temp += t
+            else:
+                raise TypeError("token should only be of type bytes or str")
+        if temp:
+            text += temp.decode("utf-8", errors=self.errors)
+        return text
+
+    @property
+    def vocab_size(self):
+        return self.tokenizer.n_vocab
+
+    def _convert_id_to_token(self, index: int) -> Union[bytes, str]:
+        """Converts an id to a token, special tokens included"""
+        if index in self.decoder:
+            return self.decoder[index]
+        raise ValueError("unknown ids")
+
+    def _convert_token_to_id(self, token: Union[bytes, str]) -> int:
+        """Converts a token to an id using the vocab, special tokens included"""
+        if token in self.special_tokens:
+            return self.special_tokens[token]
+        if token in self.mergeable_ranks:
+            return self.mergeable_ranks[token]
+        raise ValueError("unknown token")
+
+    def _tokenize(self, text: str, **kwargs):
+        """
+        Converts a string into a sequence of tokens (strings), using the tokenizer. Splits into words for word-based
+        vocabularies or sub-words for sub-word-based vocabularies (BPE/SentencePieces/WordPieces).
+
+        Does NOT take care of added tokens.
+        """
+        raise NotImplementedError
+
+    def _decode(
+        self,
+        token_ids: Union[int, List[int]],
+        skip_special_tokens: bool = False,
+        errors: str = None,
+        **kwargs,
+    ) -> str:
+        if isinstance(token_ids, int):
+            token_ids = [token_ids]
+        if skip_special_tokens:
+            token_ids = [i for i in token_ids if i < self.eod_id]
+        return self.tokenizer.decode(token_ids, errors=errors or self.errors)
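Note on `convert_tokens_to_string`: the method buffers consecutive `bytes` tokens and decodes them together, because a byte-level BPE vocabulary can split one multi-byte UTF-8 character across two tokens. The following is a minimal standalone sketch of that buffering, not the tokenizer's own code. The function name `join_tokens` and the sample token split are hypothetical and chosen only for illustration.

```python
from typing import List, Union


def join_tokens(tokens: List[Union[bytes, str]], errors: str = "replace") -> str:
    """Join byte and str tokens, decoding runs of bytes together (hypothetical helper)."""
    text = ""
    pending = b""  # bytes collected since the last str token
    for tok in tokens:
        if isinstance(tok, str):
            # A str token (e.g. a special token) ends the current byte run.
            if pending:
                text += pending.decode("utf-8", errors=errors)
                pending = b""
            text += tok
        elif isinstance(tok, bytes):
            pending += tok
        else:
            raise TypeError("token must be bytes or str")
    if pending:
        text += pending.decode("utf-8", errors=errors)
    return text


# "猫" is three UTF-8 bytes; assume (for illustration) it was split across two tokens.
cat = "猫".encode("utf-8")
tokens = [b"neko ", cat[:2], cat[2:], "<|endoftext|>"]

joined = join_tokens(tokens)

# Decoding each token on its own corrupts the split character instead.
naive = "".join(
    t if isinstance(t, str) else t.decode("utf-8", errors="replace") for t in tokens
)
```

Here `joined` recovers the original character, while `naive` contains U+FFFD replacement characters. This is also why the `errors` setting matters when output is decoded piece by piece during streaming.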
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,13 @@
+{
+  "model_max_length": 32768,
+  "tokenizer_class": "QWenTokenizer",
+  "auto_map": {
+    "AutoTokenizer": [
+      "tokenization_qwen.QWenTokenizer",
+      null
+    ]
+  },
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "pad_token": "<|extra_204|>"
+}
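Note on `tokenizer_config.json`: the sketch below parses the same JSON with the standard library and pulls out the fields that matter when loading. It assumes, based on my understanding of how `transformers` reads `auto_map`, that the `AutoTokenizer` entry is a `[slow_class, fast_class]` pair and that `null` means no fast tokenizer is provided. The variable names are illustrative only.

```python
import json

# Copy of the config shown in the diff above.
CONFIG_TEXT = """
{
  "model_max_length": 32768,
  "tokenizer_class": "QWenTokenizer",
  "auto_map": {
    "AutoTokenizer": ["tokenization_qwen.QWenTokenizer", null]
  },
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "<|extra_204|>"
}
"""

config = json.loads(CONFIG_TEXT)

# Assumed layout: [slow tokenizer reference, fast tokenizer reference].
slow_ref, fast_ref = config["auto_map"]["AutoTokenizer"]

# The reference has the form "<module file>.<class name>".
module_name, class_name = slow_ref.rsplit(".", 1)

# BOS and EOS share one token; padding uses a separate reserved token.
shares_bos_eos = config["bos_token"] == config["eos_token"]
pad_is_distinct = config["pad_token"] != config["eos_token"]
```

Because the class lives in `tokenization_qwen.py` inside the repository rather than in the library, loading it through `AutoTokenizer.from_pretrained` normally requires `trust_remote_code=True`.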
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}