Initialize project; model provided by the ModelHub XC community

Model: pfnet/nekomata-7b-pfn-qfin
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-03 22:31:43 +08:00
commit 5235be6ff6
19 changed files with 154445 additions and 0 deletions

39
.gitattributes vendored Normal file

@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
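The patterns above decide which files are stored through Git LFS. As a rough sketch only, the following uses Python's `fnmatchcase` to approximate how the slash-free patterns match a file's basename. The helper name `is_lfs_tracked` is hypothetical, and real gitattributes matching has more rules (for example for `saved_model/**/*`) that this does not model.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few slash-free patterns copied from the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.model", "*tfevents*"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: slash-free patterns are compared with the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


print(is_lfs_tracked("model-00001-of-00004.safetensors"))
print(is_lfs_tracked("config.json"))
```

Because `*.safetensors` already covers the four shard files, the explicit per-shard entries at the end of the list appear to be redundant under this approximation.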

53
LICENSE Normal file

@@ -0,0 +1,53 @@
Tongyi Qianwen LICENSE AGREEMENT
Tongyi Qianwen Release Date: August 3, 2023
By clicking to agree or by using or distributing any portion or element of the Tongyi Qianwen Materials, you will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
1. Definitions
a. This Tongyi Qianwen LICENSE AGREEMENT (this "Agreement") shall mean the terms and conditions for use, reproduction, distribution and modification of the Materials as defined by this Agreement.
b. "We"(or "Us") shall mean Alibaba Cloud.
c. "You" (or "Your") shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Materials for any purpose and in any field of use.
d. "Third Parties" shall mean individuals or legal entities that are not under common control with Us or You.
e. "Tongyi Qianwen" shall mean the large language models (including Qwen model and Qwen-Chat model), and software and algorithms, consisting of trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Us.
f. "Materials" shall mean, collectively, Alibaba Cloud's proprietary Tongyi Qianwen and Documentation (and any portion thereof) made available under this Agreement.
g. "Source" form shall mean the preferred form for making modifications, including but not limited to model source code, documentation source, and configuration files.
h. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation,
and conversions to other media types.
2. Grant of Rights
You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Alibaba Cloud's intellectual property or other rights owned by Us embodied in the Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Materials.
3. Redistribution
You may reproduce and distribute copies of the Materials or derivative works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
a. You shall give any other recipients of the Materials or derivative works a copy of this Agreement;
b. You shall cause any modified files to carry prominent notices stating that You changed the files;
c. You shall retain in all copies of the Materials that You distribute the following attribution notices within a "Notice" text file distributed as a part of such copies: "Tongyi Qianwen is licensed under the Tongyi Qianwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved."; and
d. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such derivative works as a whole, provided Your use, reproduction, and distribution of the work otherwise complies with the terms and conditions of this Agreement.
4. Restrictions
If you are commercially using the Materials, and your product or service has more than 100 million monthly active users, You shall request a license from Us. You cannot exercise your rights under this Agreement without our express authorization.
5. Rules of use
a. The Materials may be subject to export controls or restrictions in China, the United States or other countries or regions. You shall comply with applicable laws and regulations in your use of the Materials.
b. You can not use the Materials or any output therefrom to improve any other large language model (excluding Tongyi Qianwen or derivative works thereof).
6. Intellectual Property
a. We retain ownership of all intellectual property rights in and to the Materials and derivatives made by or for Us. Conditioned upon compliance with the terms and conditions of this Agreement, with respect to any derivative works and modifications of the Materials that are made by you, you are and will be the owner of such derivative works and modifications.
b. No trademark license is granted to use the trade names, trademarks, service marks, or product names of Us, except as required to fulfill notice requirements under this Agreement or as required for reasonable and customary use in describing and redistributing the Materials.
c. If you commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any entity alleging that the Materials or any output therefrom, or any part of the foregoing, infringe any intellectual property or other right owned or licensable by you, then all licences granted to you under this Agreement shall terminate as of the date such lawsuit or other proceeding is commenced or brought.
7. Disclaimer of Warranty and Limitation of Liability
a. We are not obligated to support, update, provide training for, or develop any further version of the Tongyi Qianwen Materials or to grant any license thereto.
b. THE MATERIALS ARE PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. WE MAKE NO WARRANTY AND ASSUME NO RESPONSIBILITY FOR THE SAFETY OR STABILITY OF THE MATERIALS AND ANY OUTPUT THEREFROM.
c. IN NO EVENT SHALL WE BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO ANY DIRECT, OR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE MATERIALS OR ANY OUTPUT OF IT, NO MATTER HOW ITS CAUSED.
d. You will defend, indemnify and hold harmless Us from and against any claim by any third party arising out of or related to your use or distribution of the Materials.
8. Survival and Termination.
a. The term of this Agreement shall commence upon your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
b. We may terminate this Agreement if you breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, you must delete and cease use of the Materials. Sections 7 and 9 shall survive the termination of this Agreement.
9. Governing Law and Jurisdiction.
a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
b. The People's Courts in Hangzhou City shall have exclusive jurisdiction over any dispute arising out of this Agreement.

77
NOTICE Normal file

@@ -0,0 +1,77 @@
------------- LICENSE FOR NVIDIA Megatron-LM code --------------
Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA CORPORATION nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
------------- LICENSE FOR OpenAI tiktoken code --------------
MIT License
Copyright (c) 2022 OpenAI, Shantanu Jain
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
------------- LICENSE FOR PanQiWei AutoGPTQ code --------------
MIT License
Copyright (c) 2023 潘其威(William)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

114
README.md Normal file

@@ -0,0 +1,114 @@
---
license: other
license_name: tongyi-qianwen-license
license_link: LICENSE
language:
- en
- ja
library_name: transformers
pipeline_tag: text-generation
---
# nekomata-7b-pfn-qfin
## Model Description
nekomata-7b-pfn-qfin is a fine-tuned model based on [rinna/nekomata-7b](https://huggingface.co/rinna/nekomata-7b/tree/main).
This is a base model that is good at generating text continuations in the finance domain.
nekomata-7b-pfn-qfin is fine-tuned on 370M tokens from multiple special datasets generated by Preferred Networks, which are cleared for commercial use.
Fine-tuning was carried out at a context length of 2048.
This model is released under [Tongyi Qianwen LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/e8e15962d897714944773cca57fa2e460a3655e8/Tongyi%20Qianwen%20LICENSE%20AGREEMENT).
The research article is available on [arXiv](https://arxiv.org/abs/2404.10555).
## Benchmarking
The benchmark scores were obtained using the [Japanese Language Model Financial Evaluation Harness](https://github.com/pfnet-research/japanese-lm-fin-harness).
The benchmark uses 0-shot evaluation with the default prompts.
```
|      Task      |Metric|   nekomata-7b   |      Ours       |
|----------------|------|------|---|------|------|---|------|
|chabsa          |f1    |0.8134|   |      |0.8127|   |      |
|cma_basics      |acc   |0.3158|±  |0.0764|0.3684|±  |0.0793|
|cpa_audit       |acc   |0.2085|±  |0.0203|0.1809|±  |0.0193|
|fp2             |acc   |0.2484|±  |0.0198|0.2674|±  |0.0203|
|security_sales_1|acc   |0.4912|±  |0.0668|0.5088|±  |0.0668|
|----------------|------|------|---|------|------|---|------|
|Overall         |      |0.4155           |0.4276           |
```
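The overall row appears to be the unweighted mean of the five task scores. The short check below, with the scores copied from the table, reproduces both overall values under that assumption. The dictionary layout is only for illustration.

```python
from statistics import mean

# Per-task scores copied from the table above: (baseline, ours).
scores = {
    "chabsa": (0.8134, 0.8127),
    "cma_basics": (0.3158, 0.3684),
    "cpa_audit": (0.2085, 0.1809),
    "fp2": (0.2484, 0.2674),
    "security_sales_1": (0.4912, 0.5088),
}

# Unweighted (macro) average over tasks, rounded as in the table.
baseline_avg = round(mean(pair[0] for pair in scores.values()), 4)
ours_avg = round(mean(pair[1] for pair in scores.values()), 4)
print(baseline_avg, ours_avg)
```

Note that the gain is not uniform: the fine-tuned model scores lower on `chabsa` and `cpa_audit` and higher on the other three tasks.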
## Usage
Install the required libraries as follows:
```sh
python -m pip install numpy sentencepiece torch transformers accelerate transformers_stream_generator tiktoken einops
```
Execute the following python code:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("pfnet/nekomata-7b-pfn-qfin", trust_remote_code=True)
# Use GPU with bf16 (recommended for supported devices)
# model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="auto", trust_remote_code=True, bf16=True)
# Use GPU with fp16
# model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="auto", trust_remote_code=True, fp16=True)
# Use GPU with fp32
# model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="auto", trust_remote_code=True, fp32=True)
# Use CPU
# model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="cpu", trust_remote_code=True)
# Automatically select device and precision
model = AutoModelForCausalLM.from_pretrained("pfnet/nekomata-7b-pfn-qfin", device_map="auto", trust_remote_code=True)
text = "日本銀行は"
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    generated_tokens = model.generate(
        inputs=input_ids,
        max_new_tokens=32,
        do_sample=True,
        temperature=1.0,
        repetition_penalty=1.1
    )[0]
generated_text = tokenizer.decode(generated_tokens)
print(generated_text)
# 日本銀行は、2016年9月に「長短金利操作付き量的・質的金融緩和」を導入し、長期国
```
## Model Details
- Model size: 7b
- Fine-tuned tokens: 370M tokens (Japanese: 300M tokens, English: 13M tokens, Digits: 14M tokens)
- Context length: 2048
- Developed by: Preferred Networks, Inc.
- Model type: Causal decoder-only
- Language(s): Japanese and English
- License: [Tongyi Qianwen LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/e8e15962d897714944773cca57fa2e460a3655e8/Tongyi%20Qianwen%20LICENSE%20AGREEMENT)
## Bias, Risks, and Limitations
nekomata-7b-pfn-qfin is a new technology that carries risks with use.
Testing conducted to date has been in English and Japanese, and has not covered, nor could it cover, all scenarios.
For these reasons, as with all LLMs, nekomata-7b-pfn-qfin's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts.
This model is not designed for legal, tax, investment, financial, or other advice.
Therefore, before deploying any applications of nekomata-7b-pfn-qfin, developers should perform safety testing and tuning tailored to their specific applications of the model.
## How to cite
```
@misc{hirano2024,
title={Construction of Domain-specified Japanese Large Language Model for Finance through Continual Pre-training},
author={Masanori Hirano and Kentaro Imajo},
year={2024},
eprint={2404.10555},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
## Contributors
Preferred Networks, Inc.
- Masanori Hirano
- Kentaro Imajo
## License
[Tongyi Qianwen LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/e8e15962d897714944773cca57fa2e460a3655e8/Tongyi%20Qianwen%20LICENSE%20AGREEMENT)

42
config.json Normal file

@@ -0,0 +1,42 @@
{
"_name_or_path": "pfnet/nekomata-7b-pfn-qfin",
"architectures": [
"QWenLMHeadModel"
],
"attn_dropout_prob": 0.0,
"auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
},
"bf16": false,
"emb_dropout_prob": 0.0,
"fp16": false,
"fp32": false,
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 22016,
"kv_channels": 128,
"layer_norm_epsilon": 1e-06,
"max_position_embeddings": 32768,
"model_type": "qwen",
"no_bias": true,
"num_attention_heads": 32,
"num_hidden_layers": 32,
"onnx_safe": null,
"rotary_emb_base": 10000,
"rotary_pct": 1.0,
"scale_attn_weights": true,
"seq_length": 8192,
"softmax_in_fp32": false,
"tie_word_embeddings": false,
"tokenizer_class": "QWenTokenizer",
"torch_dtype": "bfloat16",
"transformers_version": "4.40.2",
"use_cache": true,
"use_cache_kernel": false,
"use_cache_quantization": false,
"use_dynamic_ntk": true,
"use_flash_attn": true,
"use_logn_attn": true,
"vocab_size": 151936
}
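The values in this config are enough to estimate the parameter count and the size of the stored weights. The sketch below is a back-of-the-envelope calculation, not the loader's own logic. It assumes that each MLP projection uses `intermediate_size // 2` columns, that only `attn.c_attn` carries a bias, and that there is one input embedding, one untied `lm_head`, and one final layer norm weight.

```python
# Values copied from config.json above.
hidden_size = 4096
intermediate_size = 22016
num_hidden_layers = 32
num_attention_heads = 32
kv_channels = 128
vocab_size = 151936

# Assumption: the three MLP projections each use intermediate_size // 2.
ffn_size = intermediate_size // 2
projection_size = kv_channels * num_attention_heads

per_layer = (
    hidden_size * 3 * projection_size + 3 * projection_size  # attn.c_attn weight + bias
    + projection_size * hidden_size                          # attn.c_proj weight
    + 2 * hidden_size                                        # ln_1 and ln_2 weights
    + 3 * hidden_size * ffn_size                             # mlp.w1, mlp.w2, mlp.c_proj
)
embeddings = 2 * vocab_size * hidden_size  # assumed input embedding + untied lm_head
final_norm = hidden_size                   # assumed final layer norm weight

total_params = num_hidden_layers * per_layer + embeddings + final_norm
total_bytes = total_params * 2  # bfloat16 stores 2 bytes per parameter
print(total_params, total_bytes)
```

Under these assumptions the result is about 7.72 billion parameters, and the byte count agrees with the `total_size` of 15442649088 recorded in the weight index later in this commit.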

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

71
configuration_qwen.py Normal file

@@ -0,0 +1,71 @@
# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
from transformers import PretrainedConfig


class QWenConfig(PretrainedConfig):
    model_type = "qwen"
    keys_to_ignore_at_inference = ["past_key_values"]

    def __init__(
        self,
        vocab_size=151936,
        hidden_size=4096,
        num_hidden_layers=32,
        num_attention_heads=32,
        emb_dropout_prob=0.0,
        attn_dropout_prob=0.0,
        layer_norm_epsilon=1e-6,
        initializer_range=0.02,
        max_position_embeddings=8192,
        scale_attn_weights=True,
        use_cache=True,
        bf16=False,
        fp16=False,
        fp32=False,
        kv_channels=128,
        rotary_pct=1.0,
        rotary_emb_base=10000,
        use_dynamic_ntk=True,
        use_logn_attn=True,
        use_flash_attn="auto",
        intermediate_size=22016,
        no_bias=True,
        tie_word_embeddings=False,
        use_cache_quantization=False,
        use_cache_kernel=False,
        softmax_in_fp32=False,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.emb_dropout_prob = emb_dropout_prob
        self.attn_dropout_prob = attn_dropout_prob
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.scale_attn_weights = scale_attn_weights
        self.use_cache = use_cache
        self.max_position_embeddings = max_position_embeddings
        self.bf16 = bf16
        self.fp16 = fp16
        self.fp32 = fp32
        self.kv_channels = kv_channels
        self.rotary_pct = rotary_pct
        self.rotary_emb_base = rotary_emb_base
        self.use_dynamic_ntk = use_dynamic_ntk
        self.use_logn_attn = use_logn_attn
        self.use_flash_attn = use_flash_attn
        self.no_bias = no_bias
        self.use_cache_quantization = use_cache_quantization
        self.use_cache_kernel = use_cache_kernel
        self.softmax_in_fp32 = softmax_in_fp32
        super().__init__(
            tie_word_embeddings=tie_word_embeddings,
            **kwargs
        )

55
cpp_kernels.py Normal file

@@ -0,0 +1,55 @@
from torch.utils import cpp_extension
import pathlib
import os
import subprocess


def _get_cuda_bare_metal_version(cuda_dir):
    raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"],
                                         universal_newlines=True)
    output = raw_output.split()
    release_idx = output.index("release") + 1
    release = output[release_idx].split(".")
    bare_metal_major = release[0]
    bare_metal_minor = release[1][0]
    return raw_output, bare_metal_major, bare_metal_minor


def _create_build_dir(buildpath):
    try:
        os.mkdir(buildpath)
    except OSError:
        if not os.path.isdir(buildpath):
            print(f"Creation of the build directory {buildpath} failed")


# Check if cuda 11 is installed for compute capability 8.0
cc_flag = []
_, bare_metal_major, bare_metal_minor = _get_cuda_bare_metal_version(cpp_extension.CUDA_HOME)
if int(bare_metal_major) >= 11:
    cc_flag.append('-gencode')
    cc_flag.append('arch=compute_80,code=sm_80')
    if int(bare_metal_minor) >= 7:
        cc_flag.append('-gencode')
        cc_flag.append('arch=compute_90,code=sm_90')

# Build path
srcpath = pathlib.Path(__file__).parent.absolute()
buildpath = srcpath / 'build'
_create_build_dir(buildpath)


def _cpp_extention_load_helper(name, sources, extra_cuda_flags):
    return cpp_extension.load(
        name=name,
        sources=sources,
        build_directory=buildpath,
        extra_cflags=['-O3', ],
        extra_cuda_cflags=['-O3',
                           '-gencode', 'arch=compute_70,code=sm_70',
                           '--use_fast_math'] + extra_cuda_flags + cc_flag,
        verbose=1
    )


extra_flags = []

cache_autogptq_cuda_256_sources = ["./cache_autogptq_cuda_256.cpp",
                                   "./cache_autogptq_cuda_kernel_256.cu"]
cache_autogptq_cuda_256 = _cpp_extention_load_helper("cache_autogptq_cuda_256", cache_autogptq_cuda_256_sources, extra_flags)
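The version parsing and flag selection in this file can be exercised without a CUDA toolchain. The sketch below mirrors that logic on a hypothetical sample of `nvcc -V` output. The function names `parse_nvcc_release` and `gencode_flags` are illustrative and not part of the repository, and the sketch assumes the minor-version check is nested under the major-version check.

```python
def parse_nvcc_release(raw_output):
    """Mirror the token-based parsing used by _get_cuda_bare_metal_version."""
    tokens = raw_output.split()
    release = tokens[tokens.index("release") + 1].split(".")
    # release[1] looks like "8,"; taking [0] drops the comma but keeps one digit only.
    return release[0], release[1][0]


def gencode_flags(major, minor):
    """Select -gencode flags, with the minor check nested under major >= 11."""
    flags = []
    if int(major) >= 11:
        flags += ["-gencode", "arch=compute_80,code=sm_80"]
        if int(minor) >= 7:
            flags += ["-gencode", "arch=compute_90,code=sm_90"]
    return flags


# Hypothetical sample of the release line printed by `nvcc -V`.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
major, minor = parse_nvcc_release(sample)
print(major, minor, gencode_flags(major, minor))
```

Two consequences follow from this reading: a two-digit minor version is truncated to its first digit, and a release such as 12.1 would not receive the `sm_90` flag because only the minor digit is compared with 7.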

4
generation_config.json Normal file

@@ -0,0 +1,4 @@
{
"_from_model_config": true,
"transformers_version": "4.40.2"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f806cb077d5fc8780ca2073576d8ca39bcd8bb11c53bb13c175d3164750f2c24
size 4988485656


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:99d4b7ad5238aa4ab2df00b9bb943701d519b9533dea8425adf3a2e97326a8cd
size 4981246520


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a5bf1156ba496200b11fa19c61bd4ee841caa2dc37808a1d75887434cd3dbb5d
size 4228285288


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:853c927d65a1e982117924b6015a68f1d6bb444d09a027ab31619c93404bf2bd
size 1244659840
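Each of the four entries above is a Git LFS pointer: a small text file of `key value` lines that stands in for the real shard. As a minimal sketch, the following parses one pointer and sums the four recorded sizes. The helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:853c927d65a1e982117924b6015a68f1d6bb444d09a027ab31619c93404bf2bd\n"
    "size 1244659840\n"
)

# Shard sizes in bytes, copied from the four pointers above.
shard_sizes = [4988485656, 4981246520, 4228285288, 1244659840]
total_on_disk = sum(shard_sizes)
print(pointer["oid"], total_on_disk)
```

The sum is 15442677304 bytes, which is 28216 bytes more than the `total_size` of 15442649088 in the weight index that follows. The difference is plausibly per-file header overhead, though that is an inference rather than something the commit states.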


@@ -0,0 +1,266 @@
{
"metadata": {
"total_size": 15442649088
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"transformer.h.0.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.0.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.0.attn.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.0.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.0.ln_2.weight": "model-00001-of-00004.safetensors",
"transformer.h.0.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.0.mlp.w1.weight": "model-00001-of-00004.safetensors",
"transformer.h.0.mlp.w2.weight": "model-00001-of-00004.safetensors",
"transformer.h.1.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.1.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.1.attn.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.1.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.1.ln_2.weight": "model-00001-of-00004.safetensors",
"transformer.h.1.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.1.mlp.w1.weight": "model-00001-of-00004.safetensors",
"transformer.h.1.mlp.w2.weight": "model-00001-of-00004.safetensors",
"transformer.h.10.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.10.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.10.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.10.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.10.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.10.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.10.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.10.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.11.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.11.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.11.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.11.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.11.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.11.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.11.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.11.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.12.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.12.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.12.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.12.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.12.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.12.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.12.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.12.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.13.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.13.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.13.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.13.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.13.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.13.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.13.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.13.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.14.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.14.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.14.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.14.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.14.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.14.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.14.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.14.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.15.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.15.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.15.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.15.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.15.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.15.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.15.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.15.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.16.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.16.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.16.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.16.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.16.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.16.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.16.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.16.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.17.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.17.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.17.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.17.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.17.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.17.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.17.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.17.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.18.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.18.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.18.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.18.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.18.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.18.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.18.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.18.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.19.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.19.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.19.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.19.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.19.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.19.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.19.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.19.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.2.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.2.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.2.attn.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.2.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.2.ln_2.weight": "model-00001-of-00004.safetensors",
"transformer.h.2.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.2.mlp.w1.weight": "model-00001-of-00004.safetensors",
"transformer.h.2.mlp.w2.weight": "model-00001-of-00004.safetensors",
"transformer.h.20.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.20.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.20.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.20.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.20.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.20.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.20.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.20.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.h.21.attn.c_attn.bias": "model-00002-of-00004.safetensors",
"transformer.h.21.attn.c_attn.weight": "model-00002-of-00004.safetensors",
"transformer.h.21.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.21.ln_1.weight": "model-00002-of-00004.safetensors",
"transformer.h.21.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.21.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.21.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.21.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.22.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.22.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.22.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.22.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.22.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.22.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.22.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.22.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.23.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.23.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.23.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.23.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.23.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.23.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.23.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.23.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.24.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.24.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.24.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.24.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.24.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.24.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.24.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.24.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.25.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.25.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.25.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.25.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.25.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.25.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.25.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.25.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.26.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.26.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.26.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.26.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.26.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.26.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.26.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.26.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.27.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.27.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.27.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.27.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.27.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.27.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.27.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.27.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.28.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.28.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.28.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.28.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.28.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.28.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.28.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.28.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.29.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.29.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.29.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.29.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.29.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.29.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.29.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.29.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.3.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.3.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.3.attn.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.3.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.3.ln_2.weight": "model-00001-of-00004.safetensors",
"transformer.h.3.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.3.mlp.w1.weight": "model-00001-of-00004.safetensors",
"transformer.h.3.mlp.w2.weight": "model-00001-of-00004.safetensors",
"transformer.h.30.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.30.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.30.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.30.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.30.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.30.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.30.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.30.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.31.attn.c_attn.bias": "model-00003-of-00004.safetensors",
"transformer.h.31.attn.c_attn.weight": "model-00003-of-00004.safetensors",
"transformer.h.31.attn.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.31.ln_1.weight": "model-00003-of-00004.safetensors",
"transformer.h.31.ln_2.weight": "model-00003-of-00004.safetensors",
"transformer.h.31.mlp.c_proj.weight": "model-00003-of-00004.safetensors",
"transformer.h.31.mlp.w1.weight": "model-00003-of-00004.safetensors",
"transformer.h.31.mlp.w2.weight": "model-00003-of-00004.safetensors",
"transformer.h.4.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.4.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.4.attn.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.4.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.4.ln_2.weight": "model-00001-of-00004.safetensors",
"transformer.h.4.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.4.mlp.w1.weight": "model-00001-of-00004.safetensors",
"transformer.h.4.mlp.w2.weight": "model-00001-of-00004.safetensors",
"transformer.h.5.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.5.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.5.attn.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.5.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.5.ln_2.weight": "model-00001-of-00004.safetensors",
"transformer.h.5.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.5.mlp.w1.weight": "model-00001-of-00004.safetensors",
"transformer.h.5.mlp.w2.weight": "model-00001-of-00004.safetensors",
"transformer.h.6.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.6.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.6.attn.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.6.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.6.ln_2.weight": "model-00001-of-00004.safetensors",
"transformer.h.6.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.6.mlp.w1.weight": "model-00001-of-00004.safetensors",
"transformer.h.6.mlp.w2.weight": "model-00001-of-00004.safetensors",
"transformer.h.7.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.7.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.7.attn.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.7.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.7.ln_2.weight": "model-00001-of-00004.safetensors",
"transformer.h.7.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.7.mlp.w1.weight": "model-00001-of-00004.safetensors",
"transformer.h.7.mlp.w2.weight": "model-00001-of-00004.safetensors",
"transformer.h.8.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.8.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.8.attn.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.8.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.8.ln_2.weight": "model-00001-of-00004.safetensors",
"transformer.h.8.mlp.c_proj.weight": "model-00001-of-00004.safetensors",
"transformer.h.8.mlp.w1.weight": "model-00001-of-00004.safetensors",
"transformer.h.8.mlp.w2.weight": "model-00001-of-00004.safetensors",
"transformer.h.9.attn.c_attn.bias": "model-00001-of-00004.safetensors",
"transformer.h.9.attn.c_attn.weight": "model-00001-of-00004.safetensors",
"transformer.h.9.attn.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.9.ln_1.weight": "model-00001-of-00004.safetensors",
"transformer.h.9.ln_2.weight": "model-00002-of-00004.safetensors",
"transformer.h.9.mlp.c_proj.weight": "model-00002-of-00004.safetensors",
"transformer.h.9.mlp.w1.weight": "model-00002-of-00004.safetensors",
"transformer.h.9.mlp.w2.weight": "model-00002-of-00004.safetensors",
"transformer.ln_f.weight": "model-00003-of-00004.safetensors",
"transformer.wte.weight": "model-00001-of-00004.safetensors"
}
}

1363
modeling_qwen.py Normal file

File diff suppressed because it is too large

151643
qwen.tiktoken Normal file

File diff suppressed because it is too large

416
qwen_generation_utils.py Normal file

@@ -0,0 +1,416 @@
# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

"""Generation support."""

from typing import Tuple, List, Union, Iterable

import numpy as np
import torch
import torch.nn.functional as F
from transformers import PreTrainedTokenizer
from transformers import logging
from transformers.generation import LogitsProcessor

logger = logging.get_logger(__name__)

# Types.
HistoryType = List[Tuple[str, str]]
TokensType = List[int]
BatchTokensType = List[List[int]]


def pad_batch(batch: BatchTokensType, pad_id: int, seq_length: int) -> BatchTokensType:
    for tokens in batch:
        context_length = len(tokens)
        if context_length < seq_length:
            tokens.extend([pad_id] * (seq_length - context_length))
    return batch

def get_ltor_masks_and_position_ids(
    data,
    eod_token,
    reset_position_ids,
    reset_attention_mask,
    eod_mask_loss,
):
    """Build masks and position ids for a left-to-right model."""

    # Extract batch size and sequence length.
    micro_batch_size, seq_length = data.size()

    # Attention mask (lower triangular).
    if reset_attention_mask:
        att_mask_batch = micro_batch_size
    else:
        att_mask_batch = 1
    attention_mask = torch.tril(
        torch.ones((att_mask_batch, seq_length, seq_length), device=data.device)
    ).view(att_mask_batch, 1, seq_length, seq_length)

    # Loss mask.
    loss_mask = torch.ones(data.size(), dtype=torch.float, device=data.device)
    if eod_mask_loss:
        loss_mask[data == eod_token] = 0.0

    # Position ids.
    position_ids = torch.arange(seq_length, dtype=torch.long, device=data.device)
    position_ids = position_ids.unsqueeze(0).expand_as(data)
    # We need to clone as the ids will be modified based on batch index.
    if reset_position_ids:
        position_ids = position_ids.clone()

    if reset_position_ids or reset_attention_mask:
        # Loop through the batches:
        for b in range(micro_batch_size):

            # Find indices where EOD token is.
            eod_index = position_ids[b, data[b] == eod_token]
            # Detach indices from positions if going to modify positions.
            if reset_position_ids:
                eod_index = eod_index.clone()

            # Loop through EOD indices:
            prev_index = 0
            for j in range(eod_index.size()[0]):
                i = eod_index[j]
                # Mask attention across the EOD boundary.
                if reset_attention_mask:
                    attention_mask[b, 0, (i + 1) :, : (i + 1)] = 0
                # Reset positions.
                if reset_position_ids:
                    position_ids[b, (i + 1) :] -= i + 1 - prev_index
                    prev_index = i + 1

    # Convert attention mask to binary (True marks masked positions):
    attention_mask = attention_mask < 0.5

    return attention_mask, loss_mask, position_ids

def get_batch(context_tokens: torch.LongTensor, eod_id: int):
    """Generate batch from context tokens."""
    # Make the tokens contiguous; they stay on their current device.
    tokens = context_tokens.contiguous().to(context_tokens.device)
    # Get the attention mask and position ids.
    attention_mask, _, position_ids = get_ltor_masks_and_position_ids(
        tokens,
        eod_id,
        reset_position_ids=False,
        reset_attention_mask=False,
        eod_mask_loss=False,
    )
    return tokens, attention_mask, position_ids

def get_stop_words_ids(chat_format, tokenizer):
    if chat_format == "raw":
        stop_words_ids = [tokenizer.encode("Human:"), [tokenizer.eod_id]]
    elif chat_format == "chatml":
        stop_words_ids = [[tokenizer.im_end_id], [tokenizer.im_start_id]]
    else:
        raise NotImplementedError(f"Unknown chat format {chat_format!r}")
    return stop_words_ids

def make_context(
    tokenizer: PreTrainedTokenizer,
    query: str,
    history: List[Tuple[str, str]] = None,
    system: str = "",
    max_window_size: int = 6144,
    chat_format: str = "chatml",
):
    if history is None:
        history = []

    if chat_format == "chatml":
        im_start, im_end = "<|im_start|>", "<|im_end|>"
        im_start_tokens = [tokenizer.im_start_id]
        im_end_tokens = [tokenizer.im_end_id]
        nl_tokens = tokenizer.encode("\n")

        def _tokenize_str(role, content):
            return f"{role}\n{content}", tokenizer.encode(
                role, allowed_special=set()
            ) + nl_tokens + tokenizer.encode(content, allowed_special=set())

        system_text, system_tokens_part = _tokenize_str("system", system)
        system_tokens = im_start_tokens + system_tokens_part + im_end_tokens

        raw_text = ""
        context_tokens = []

        for turn_query, turn_response in reversed(history):
            query_text, query_tokens_part = _tokenize_str("user", turn_query)
            query_tokens = im_start_tokens + query_tokens_part + im_end_tokens
            response_text, response_tokens_part = _tokenize_str(
                "assistant", turn_response
            )
            response_tokens = im_start_tokens + response_tokens_part + im_end_tokens

            next_context_tokens = nl_tokens + query_tokens + nl_tokens + response_tokens
            prev_chat = (
                f"\n{im_start}{query_text}{im_end}\n{im_start}{response_text}{im_end}"
            )

            current_context_size = (
                len(system_tokens) + len(next_context_tokens) + len(context_tokens)
            )
            if current_context_size < max_window_size:
                context_tokens = next_context_tokens + context_tokens
                raw_text = prev_chat + raw_text
            else:
                break

        context_tokens = system_tokens + context_tokens
        raw_text = f"{im_start}{system_text}{im_end}" + raw_text
        context_tokens += (
            nl_tokens
            + im_start_tokens
            + _tokenize_str("user", query)[1]
            + im_end_tokens
            + nl_tokens
            + im_start_tokens
            + tokenizer.encode("assistant")
            + nl_tokens
        )
        raw_text += f"\n{im_start}user\n{query}{im_end}\n{im_start}assistant\n"

    elif chat_format == "raw":
        raw_text = query
        context_tokens = tokenizer.encode(raw_text)
    else:
        raise NotImplementedError(f"Unknown chat format {chat_format!r}")

    return raw_text, context_tokens

def _decode_default(
    tokens: List[int],
    *,
    stop_words: List[str],
    eod_words: List[str],
    tokenizer: PreTrainedTokenizer,
    raw_text_len: int,
    verbose: bool = False,
    return_end_reason: bool = False,
    errors: str = "replace",
):
    trim_decode_tokens = tokenizer.decode(tokens, errors=errors)[raw_text_len:]
    if verbose:
        print("\nRaw Generate: ", trim_decode_tokens)

    end_reason = f"Gen length {len(tokens)}"
    for stop_word in stop_words:
        trim_decode_tokens = trim_decode_tokens.replace(stop_word, "").strip()
    for eod_word in eod_words:
        if eod_word in trim_decode_tokens:
            end_reason = f"Gen {eod_word!r}"
        trim_decode_tokens = trim_decode_tokens.split(eod_word)[0]
    trim_decode_tokens = trim_decode_tokens.strip()
    if verbose:
        print("\nEnd Reason:", end_reason)
        print("\nGenerate: ", trim_decode_tokens)

    if return_end_reason:
        return trim_decode_tokens, end_reason
    else:
        return trim_decode_tokens

def _decode_chatml(
    tokens: List[int],
    *,
    stop_words: List[str],
    eod_token_ids: List[int],
    tokenizer: PreTrainedTokenizer,
    raw_text_len: int,
    context_length: int,
    verbose: bool = False,
    return_end_reason: bool = False,
    errors: str = "replace",
):
    end_reason = f"Gen length {len(tokens)}"
    eod_token_idx = context_length
    for eod_token_idx in range(context_length, len(tokens)):
        if tokens[eod_token_idx] in eod_token_ids:
            end_reason = f"Gen {tokenizer.decode([tokens[eod_token_idx]])!r}"
            break

    trim_decode_tokens = tokenizer.decode(tokens[:eod_token_idx], errors=errors)[raw_text_len:]
    if verbose:
        print("\nRaw Generate w/o EOD:", tokenizer.decode(tokens, errors=errors)[raw_text_len:])
        print("\nRaw Generate:", trim_decode_tokens)
        print("\nEnd Reason:", end_reason)
    for stop_word in stop_words:
        trim_decode_tokens = trim_decode_tokens.replace(stop_word, "").strip()
    trim_decode_tokens = trim_decode_tokens.strip()
    if verbose:
        print("\nGenerate:", trim_decode_tokens)

    if return_end_reason:
        return trim_decode_tokens, end_reason
    else:
        return trim_decode_tokens

def decode_tokens(
    tokens: Union[torch.LongTensor, TokensType],
    tokenizer: PreTrainedTokenizer,
    raw_text_len: int,
    context_length: int,
    chat_format: str,
    verbose: bool = False,
    return_end_reason: bool = False,
    errors: str = "replace",
) -> str:
    if torch.is_tensor(tokens):
        tokens = tokens.cpu().numpy().tolist()

    if chat_format == "chatml":
        return _decode_chatml(
            tokens,
            stop_words=[],
            eod_token_ids=[tokenizer.im_start_id, tokenizer.im_end_id],
            tokenizer=tokenizer,
            raw_text_len=raw_text_len,
            context_length=context_length,
            verbose=verbose,
            return_end_reason=return_end_reason,
            errors=errors,
        )
    elif chat_format == "raw":
        return _decode_default(
            tokens,
            stop_words=["<|endoftext|>"],
            eod_words=["<|endoftext|>"],
            tokenizer=tokenizer,
            raw_text_len=raw_text_len,
            verbose=verbose,
            return_end_reason=return_end_reason,
            errors=errors,
        )
    else:
        raise NotImplementedError(f"Unknown chat format {chat_format!r}")

class StopWordsLogitsProcessor(LogitsProcessor):
    """
    :class:`transformers.LogitsProcessor` that stops generation when any of the specified token sequences appears.

    Args:
        stop_words_ids (:obj:`List[List[int]]`):
            List of list of token ids of stop ids. In order to get the tokens of the words
            that should not appear in the generated text, use :obj:`tokenizer(bad_word,
            add_prefix_space=True).input_ids`.
        eos_token_id (:obj:`int`):
            The id of the `end-of-sequence` token.
    """

    def __init__(self, stop_words_ids: Iterable[Iterable[int]], eos_token_id: int):

        if not isinstance(stop_words_ids, List) or len(stop_words_ids) == 0:
            raise ValueError(
                f"`stop_words_ids` has to be a non-empty list, but is {stop_words_ids}."
            )
        if any(not isinstance(bad_word_ids, list) for bad_word_ids in stop_words_ids):
            raise ValueError(
                f"`stop_words_ids` has to be a list of lists, but is {stop_words_ids}."
            )
        if any(
            any(
                (not isinstance(token_id, (int, np.integer)) or token_id < 0)
                for token_id in stop_word_ids
            )
            for stop_word_ids in stop_words_ids
        ):
            raise ValueError(
                f"Each list in `stop_words_ids` has to be a list of positive integers, but is {stop_words_ids}."
            )

        # Drop the bare [eos_token_id] sequence; EOS already ends generation.
        self.stop_words_ids = list(
            filter(
                lambda bad_token_seq: bad_token_seq != [eos_token_id], stop_words_ids
            )
        )
        self.eos_token_id = eos_token_id
        for stop_token_seq in self.stop_words_ids:
            assert (
                len(stop_token_seq) > 0
            ), "Stop words token sequences {} cannot have an empty list".format(
                stop_words_ids
            )

    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor
    ) -> torch.FloatTensor:
        stopped_samples = self._calc_stopped_samples(input_ids)
        for i, should_stop in enumerate(stopped_samples):
            if should_stop:
                # Push the EOS score very high so that EOS is picked next.
                scores[i, self.eos_token_id] = float(2**15)
        return scores

    def _tokens_match(self, prev_tokens: torch.LongTensor, tokens: List[int]) -> bool:
        if len(tokens) == 0:
            # an empty stop sequence trivially matches
            return True
        elif len(tokens) > len(prev_tokens):
            # a stop sequence longer than the previous input_ids cannot match
            return False
        elif prev_tokens[-len(tokens) :].tolist() == tokens:
            # the stop sequence matches the tail of the previous input_ids
            return True
        else:
            return False

    def _calc_stopped_samples(self, prev_input_ids: Iterable[int]) -> Iterable[int]:
        stopped_samples = []
        for prev_input_ids_slice in prev_input_ids:
            match = False
            for stop_token_seq in self.stop_words_ids:
                if self._tokens_match(prev_input_ids_slice, stop_token_seq):
                    # a stop sequence matched; no need to check the rest
                    match = True
                    break
            stopped_samples.append(match)

        return stopped_samples

def top_k_logits(logits, top_k=0, top_p=0.0, filter_value=-float("Inf")):
    """This function has been mostly taken from huggingface conversational
    ai code at
    https://medium.com/huggingface/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313"""

    if top_k > 0:
        # Remove all tokens with a probability less than the
        # last token of the top-k
        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]
        logits[indices_to_remove] = filter_value

    if top_p > 0.0:
        # Sort the logits in descending order
        sorted_logits, sorted_indices = torch.sort(logits, descending=True, dim=-1)
        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)

        # Remove tokens with cumulative probability above the threshold
        sorted_indices_to_remove = cumulative_probs > top_p
        # Shift the indices to the right to keep also the first token
        # above the threshold
        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
        sorted_indices_to_remove[..., 0] = 0
        for i in range(sorted_indices.size(0)):
            indices_to_remove = sorted_indices[i][sorted_indices_to_remove[i]]
            logits[i][indices_to_remove] = filter_value

    return logits

def switch(val1, val2, boolean):
    boolean = boolean.type_as(val1)
    return (1 - boolean) * val1 + boolean * val2
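
The ChatML branch of `make_context` builds its prompt by walking the history from newest to oldest and stopping once the window is full. The sketch below reproduces only the `raw_text` side of that logic in plain Python, as an illustration rather than a replacement. The function name `build_chatml_text` and the `count_tokens` parameter are hypothetical; `count_tokens` defaults to `len` (character count) as a stand-in for the real tokenizer, so the sizes it computes are not real token counts.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def build_chatml_text(query, history=None, system="", max_window_size=6144, count_tokens=len):
    """Assemble a ChatML prompt string, dropping the oldest turns that do not fit."""
    history = history or []
    system_text = f"{IM_START}system\n{system}{IM_END}"
    used = count_tokens(system_text)  # size of the system block
    turns = ""
    # Newest turns are considered first, so truncation removes the oldest ones.
    for turn_query, turn_response in reversed(history):
        turn = (
            f"\n{IM_START}user\n{turn_query}{IM_END}"
            f"\n{IM_START}assistant\n{turn_response}{IM_END}"
        )
        if used + count_tokens(turn) >= max_window_size:
            break
        turns = turn + turns
        used += count_tokens(turn)
    # The prompt ends with an open assistant turn for the model to complete.
    return f"{system_text}{turns}\n{IM_START}user\n{query}{IM_END}\n{IM_START}assistant\n"


print(build_chatml_text("Hi", history=[("a", "b"), ("c", "d")], max_window_size=100))
```

As in the original function, the size check covers only the system block and the history. The current query is always appended, so the final prompt can still exceed `max_window_size`.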

276
tokenization_qwen.py Normal file

@@ -0,0 +1,276 @@
# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

"""Tokenization classes for QWen."""

import base64
import logging
import os
import unicodedata
from typing import Collection, Dict, List, Set, Tuple, Union

import tiktoken
from transformers import PreTrainedTokenizer, AddedToken

logger = logging.getLogger(__name__)


VOCAB_FILES_NAMES = {"vocab_file": "qwen.tiktoken"}

PAT_STR = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
ENDOFTEXT = "<|endoftext|>"
IMSTART = "<|im_start|>"
IMEND = "<|im_end|>"
# as the default behavior is changed to allow special tokens in
# regular texts, the surface forms of special tokens need to be
# as different as possible to minimize the impact
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
# changed to use actual index to avoid misconfiguration with vocabulary expansion
SPECIAL_START_ID = 151643
SPECIAL_TOKENS = tuple(
    enumerate(
        (
            (
                ENDOFTEXT,
                IMSTART,
                IMEND,
            )
            + EXTRAS
        ),
        start=SPECIAL_START_ID,
    )
)
SPECIAL_TOKENS_SET = set(t for i, t in SPECIAL_TOKENS)

def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
    with open(tiktoken_bpe_file, "rb") as f:
        contents = f.read()
    return {
        base64.b64decode(token): int(rank)
        for token, rank in (line.split() for line in contents.splitlines() if line)
    }

class QWenTokenizer(PreTrainedTokenizer):
    """QWen tokenizer."""

    vocab_files_names = VOCAB_FILES_NAMES

    def __init__(
        self,
        vocab_file,
        errors="replace",
        extra_vocab_file=None,
        **kwargs,
    ):
        super().__init__(**kwargs)

        # how to handle errors in decoding UTF-8 byte sequences
        # use ignore if you are in streaming inference
        self.errors = errors

        self.mergeable_ranks = _load_tiktoken_bpe(vocab_file)  # type: Dict[bytes, int]
        self.special_tokens = {
            token: index
            for index, token in SPECIAL_TOKENS
        }

        # try load extra vocab from file
        if extra_vocab_file is not None:
            used_ids = set(self.mergeable_ranks.values()) | set(self.special_tokens.values())
            extra_mergeable_ranks = _load_tiktoken_bpe(extra_vocab_file)
            for token, index in extra_mergeable_ranks.items():
                if token in self.mergeable_ranks:
                    logger.info(f"extra token {token} exists, skipping")
                    continue
                if index in used_ids:
                    logger.info(f'the index {index} for extra token {token} exists, skipping')
                    continue
                self.mergeable_ranks[token] = index
            # the index may be sparse after this, but don't worry tiktoken.Encoding will handle this

        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        assert (
            len(self.mergeable_ranks) + len(self.special_tokens) == enc.n_vocab
        ), f"{len(self.mergeable_ranks) + len(self.special_tokens)} != {enc.n_vocab} in encoding"

        self.decoder = {
            v: k for k, v in self.mergeable_ranks.items()
        }  # type: dict[int, bytes|str]
        self.decoder.update({v: k for k, v in self.special_tokens.items()})

        self.tokenizer = enc  # type: tiktoken.Encoding

        self.eod_id = self.tokenizer.eot_token
        self.im_start_id = self.special_tokens[IMSTART]
        self.im_end_id = self.special_tokens[IMEND]

    def __getstate__(self):
        # for pickle lovers
        state = self.__dict__.copy()
        del state["tokenizer"]
        return state

    def __setstate__(self, state):
        # tokenizer is not python native; don't pass it; rebuild it
        self.__dict__.update(state)
        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        self.tokenizer = enc

    def __len__(self) -> int:
        return self.tokenizer.n_vocab

    def get_vocab(self) -> Dict[bytes, int]:
        return self.mergeable_ranks

    def convert_tokens_to_ids(
        self, tokens: Union[bytes, str, List[Union[bytes, str]]]
    ) -> List[int]:
        ids = []
        if isinstance(tokens, (str, bytes)):
            if tokens in self.special_tokens:
                return self.special_tokens[tokens]
            else:
                return self.mergeable_ranks.get(tokens)
        for token in tokens:
            if token in self.special_tokens:
                ids.append(self.special_tokens[token])
            else:
                ids.append(self.mergeable_ranks.get(token))
        return ids

    def _add_tokens(
        self,
        new_tokens: Union[List[str], List[AddedToken]],
        special_tokens: bool = False,
    ) -> int:
        if not special_tokens and new_tokens:
            raise ValueError("Adding regular tokens is not supported")
        for token in new_tokens:
            surface_form = token.content if isinstance(token, AddedToken) else token
            if surface_form not in SPECIAL_TOKENS_SET:
                raise ValueError("Adding unknown special tokens is not supported")
        return 0

    def save_vocabulary(self, save_directory: str, **kwargs) -> Tuple[str]:
        """
        Save only the vocabulary of the tokenizer (vocabulary).

        Returns:
            `Tuple(str)`: Paths to the files saved.
        """
        file_path = os.path.join(save_directory, "qwen.tiktoken")
        with open(file_path, "w", encoding="utf8") as w:
            for k, v in self.mergeable_ranks.items():
                line = base64.b64encode(k).decode("utf8") + " " + str(v) + "\n"
                w.write(line)
        return (file_path,)

    def tokenize(
        self,
        text: str,
        allowed_special: Union[Set, str] = "all",
        disallowed_special: Union[Collection, str] = (),
        **kwargs,
    ) -> List[Union[bytes, str]]:
        """
        Converts a string into a sequence of tokens.

        Args:
            text (`str`):
                The sequence to be encoded.
            allowed_special (`Literal["all"]` or `set`):
                The surface forms of the tokens to be encoded as special tokens in regular texts.
                Defaults to "all".
            disallowed_special (`Literal["all"]` or `Collection`):
                The surface forms of the tokens that should not be in regular texts and trigger errors.
                Defaults to an empty tuple.

            kwargs (additional keyword arguments, *optional*):
                Will be passed to the underlying model specific encode method.

        Returns:
            `List[bytes|str]`: The list of tokens.
        """
        tokens = []
        text = unicodedata.normalize("NFC", text)

        # this implementation takes a detour: text -> token id -> token surface forms
        for t in self.tokenizer.encode(
            text, allowed_special=allowed_special, disallowed_special=disallowed_special
        ):
            tokens.append(self.decoder[t])
        return tokens

    def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) -> str:
        """
        Converts a sequence of tokens into a single string.
        """
        text = ""
        temp = b""
        for t in tokens:
            if isinstance(t, str):
                # flush buffered bytes before appending a special token
                if temp:
                    text += temp.decode("utf-8", errors=self.errors)
                    temp = b""
                text += t
            elif isinstance(t, bytes):
                temp += t
            else:
                raise TypeError("token should only be of type bytes or str")
        if temp:
            text += temp.decode("utf-8", errors=self.errors)
        return text

    @property
    def vocab_size(self):
        return self.tokenizer.n_vocab

    def _convert_id_to_token(self, index: int) -> Union[bytes, str]:
        """Converts an id to a token, special tokens included"""
        if index in self.decoder:
            return self.decoder[index]
        raise ValueError("unknown ids")

    def _convert_token_to_id(self, token: Union[bytes, str]) -> int:
        """Converts a token to an id using the vocab, special tokens included"""
        if token in self.special_tokens:
            return self.special_tokens[token]
        if token in self.mergeable_ranks:
            return self.mergeable_ranks[token]
        raise ValueError("unknown token")

    def _tokenize(self, text: str, **kwargs):
        """
        Converts a string into a sequence of tokens (string), using the tokenizer. Split in words for word-based
        vocabulary or sub-words for sub-word-based vocabularies (BPE/SentencePieces/WordPieces).

        Does NOT take care of added tokens.
        """
        raise NotImplementedError

    def _decode(
        self,
        token_ids: Union[int, List[int]],
        skip_special_tokens: bool = False,
        errors: str = None,
        **kwargs,
    ) -> str:
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        if skip_special_tokens:
            # special tokens occupy the ids from eod_id upwards
            token_ids = [i for i in token_ids if i < self.eod_id]
        return self.tokenizer.decode(token_ids, errors=errors or self.errors)
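
`_load_tiktoken_bpe` and `save_vocabulary` agree on a simple line format for `qwen.tiktoken`: the base64-encoded token bytes, a space, then the integer rank. The sketch below round-trips a tiny made-up vocabulary through that format in memory, without touching disk or importing `tiktoken`. The helper names `dump_ranks` and `load_ranks` and the three sample tokens are hypothetical.

```python
import base64


def dump_ranks(ranks):
    """Serialize {token_bytes: rank} as one '<base64 token> <rank>' line per entry."""
    return "".join(
        base64.b64encode(token).decode("utf8") + " " + str(rank) + "\n"
        for token, rank in ranks.items()
    )


def load_ranks(contents):
    """Parse the bytes of a vocab file back into {token_bytes: rank}."""
    return {
        base64.b64decode(token): int(rank)
        for token, rank in (line.split() for line in contents.splitlines() if line)
    }


# Made-up sample vocabulary; real ranks come from qwen.tiktoken.
ranks = {b"a": 0, b" the": 1, "猫".encode("utf-8"): 2}
text = dump_ranks(ranks)
restored = load_ranks(text.encode("utf8"))
print(text)
print(restored == ranks)
```

Base64 keeps every token on one line even when its bytes contain spaces, newlines, or partial UTF-8 sequences, which is why the file can be split on whitespace safely.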

13
tokenizer_config.json Normal file

@@ -0,0 +1,13 @@
{
"model_max_length": 32768,
"tokenizer_class": "QWenTokenizer",
"auto_map": {
"AutoTokenizer": [
"tokenization_qwen.QWenTokenizer",
null
]
},
"bos_token": "<|endoftext|>",
"eos_token": "<|endoftext|>",
"pad_token": "<|extra_204|>"
}
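
The config above names its BOS, EOS, and pad tokens by surface form only. Their ids follow from the `SPECIAL_TOKENS` enumeration in `tokenization_qwen.py`, which starts at `SPECIAL_START_ID = 151643` and lists `<|endoftext|>`, `<|im_start|>`, `<|im_end|>`, then 205 `<|extra_i|>` tokens. The sketch below rebuilds that table in plain Python to work out the ids. It is an illustration only and assumes the enumeration order shown in that file; the names `surface_forms` and `special_tokens` are local to the sketch.

```python
SPECIAL_START_ID = 151643

# Same order as SPECIAL_TOKENS in tokenization_qwen.py: three control tokens, then the extras.
surface_forms = ("<|endoftext|>", "<|im_start|>", "<|im_end|>") + tuple(
    f"<|extra_{i}|>" for i in range(205)
)
special_tokens = {
    token: index for index, token in enumerate(surface_forms, start=SPECIAL_START_ID)
}

# bos_token and eos_token share one id; pad_token is the last extra token.
for name in ("<|endoftext|>", "<|im_start|>", "<|im_end|>", "<|extra_204|>"):
    print(name, special_tokens[name])
```

Under that assumption, `<|extra_204|>` sits at offset 3 + 204 from the start id, so padding uses the highest special-token id and does not collide with the EOS id.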