Initialize project; model provided by the ModelHub XC community

Model: Salesforce/xgen-small-9B-base-r
Source: Original Platform
ModelHub XC
2026-06-21 06:08:14 +08:00
commit 7b07f368cd
21 changed files with 100725 additions and 0 deletions

50
.gitattributes vendored Normal file
@@ -0,0 +1,50 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
vocab.json filter=lfs diff=lfs merge=lfs -text
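These rules decide which files Git stores as LFS pointers. That is why the `*.safetensors` shards and `tokenizer.json` in this commit appear as three-line pointers, while `merges.txt` (which matches no pattern) is stored as a normal file. The sketch below approximates that matching for a subset of the patterns. It is an assumption-laden simplification: `fnmatchcase` does not implement Git's full wildmatch rules, so `saved_model/**/*` is left out and a leading `**/` is handled by matching the file name only. `is_lfs_tracked` is a hypothetical helper, not a Git API.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the LFS patterns from the .gitattributes above.
LFS_PATTERNS = [
    "*.safetensors", "*.bin", "*.pt", "*.gguf*", "*tfevents*",
    "**/*ckpt*data*", "tokenizer.json", "vocab.json",
]

def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching (not Git's exact wildmatch rules)."""
    p = PurePosixPath(path)
    for pattern in patterns:
        if pattern.startswith("**/"):
            # A leading "**/" matches in any directory, so test the file name.
            if fnmatchcase(p.name, pattern[3:]):
                return True
        elif "/" not in pattern:
            # Patterns without a slash are matched against the file name only.
            if fnmatchcase(p.name, pattern):
                return True
        elif fnmatchcase(str(p), pattern):
            # Patterns containing a slash are matched against the whole path.
            return True
    return False

print(is_lfs_tracked("model-00001-of-00009.safetensors"))  # True
print(is_lfs_tracked("merges.txt"))                        # False
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git itself resolves.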

93
README.md Normal file
@@ -0,0 +1,93 @@
---
license: cc-by-nc-4.0
language:
- en
library_name: transformers
---
# Welcome to the xGen-small family!
**xGen-small** ([blog](https://www.salesforce.com/blog/xgen-small-enterprise-ready-small-language-models/), [arXiv](https://arxiv.org/abs/2505.06496)) is an enterprise-ready compact LM that combines domain-focused data curation, scalable pre-training, length-extension, and RL fine-tuning to deliver long-context performance at predictable, low cost.
**This model release is for research purposes only.**
<p align="center">
<img width="60%" src="https://huggingface.co/Salesforce/xgen-small/resolve/main/xgen-small.png?download=true">
</p>
## Model Series
[xGen-small](https://www.salesforce.com/blog/xgen-small-enterprise-ready-small-language-models/) comes in two sizes (4B and 9B) with two variants (pre-trained and post-trained):
| Model | # Total Params | Context Length | Variant | Download |
|---------------------------------------|----------------|----------------|--------------|----------------|
| salesforce/xgen-small-4B-base-r | 4B | 128k | Pre-trained | [🤗 Link](https://huggingface.co/Salesforce/xgen-small-4b-base-r) |
| salesforce/xgen-small-4B-instruct-r | 4B | 128k | Post-trained | [🤗 Link](https://huggingface.co/Salesforce/xgen-small-4b-instruct-r) |
| salesforce/xgen-small-9B-base-r | 9B | 128k | Pre-trained | [🤗 Link](https://huggingface.co/Salesforce/xgen-small-9b-base-r) |
| salesforce/xgen-small-9B-instruct-r | 9B | 128k | Post-trained | [🤗 Link](https://huggingface.co/Salesforce/xgen-small-9b-instruct-r) |
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/xgen-small-9B-base-r"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Use a GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# torch_dtype="auto" uses the dtype recorded in config.json (float32 here).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto"
).to(device)

# This is a base (pre-trained) model: it continues plain text.
prompt = "What is Salesforce?"
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding=False,
    truncation=True
).to(device)

# Generate up to 32 new tokens; the returned sequence includes the prompt.
generated = model.generate(**inputs, max_new_tokens=32)
output = tokenizer.decode(
    generated[0],
    skip_special_tokens=True,
)
print(output)
```
## Evaluation
| Category | Task | Llama 3.1-8B | Granite 3.3-8B | Qwen2.5-7B | xGen-small 9B Base|
| :------------------------------- | :------------- | :----------- | :------------- | :--------- | :-----------------|
| General Knowledge & Reasoning | ARC-Challenge | 58.0 | 62.5 | 63.7 | 67.4 |
| General Knowledge & Reasoning | Big-Bench Hard | 46.3 | 46.8 | 53.6 | 58.2 |
| General Knowledge & Reasoning | HellaSwag | 81.8 | 83.0 | 80.0 | 83.7 |
| General Knowledge & Reasoning | MMLU | 65.1 | 62.7 | 74.2 | 71.1 |
| General Knowledge & Reasoning | MMLU-Pro | 32.7 | 31.3 | 43.7 | 39.8 |
| General Knowledge & Reasoning | TruthfulQA | 45.2 | 52.2 | 56.4 | 48.6 |
| General Knowledge & Reasoning | WinoGrande | 76.9 | 80.3 | 76.1 | 78.6 |
| Math & Science | GPQA | 31.9 | 30.3 | 31.4 | 32.0 |
| Math & Science | GSM8K | 55.6 | 61.4 | 79.1 | 83.2 |
| Math & Science | MATH | 22.0 | 30.9 | 50.2 | 52.5 |
| Coding | HumanEval | 37.3 | 38.9 | 55.2 | 53.9 |
| Coding | HumanEval+ | 31.4 | 34.3 | 47.7 | 47.9 |
| Coding | MBPP | 45.0 | 43.5 | 57.1 | 50.1 |
| Coding | MBPP+ | 51.0 | 48.1 | 64.8 | 57.6 |
## Citation
```bibtex
@misc{xgensmall,
title={xGen-small Technical Report},
author={Erik Nijkamp and Bo Pang and Egor Pakhomov and Akash Gokul and Jin Qu and Silvio Savarese and Yingbo Zhou and Caiming Xiong},
year={2025},
eprint={2505.06496},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.06496},
}
```
## Ethical Considerations
This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our Acceptable Use Policy (AUP) and AI Acceptable Use Policy (AI AUP).
## Model Licenses
The models are being released under CC-BY-NC-4.0, Copyright © Salesforce, Inc. All Rights Reserved.

5
added_tokens.json Normal file
@@ -0,0 +1,5 @@
{
"<|endofprompt|>": 100276,
"<|im_end|>": 100265,
"<|im_start|>": 100264
}
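Every added token id must index a row of the embedding matrix, so it has to be smaller than `vocab_size` (102400 in `config.json`). The following is a minimal sanity-check sketch, assuming the two files are read as plain JSON; `check_added_tokens` is a hypothetical helper, not part of `transformers`.

```python
import json

# Contents of added_tokens.json and vocab_size from config.json.
ADDED_TOKENS = json.loads("""{
  "<|endofprompt|>": 100276,
  "<|im_end|>": 100265,
  "<|im_start|>": 100264
}""")
VOCAB_SIZE = 102400

def check_added_tokens(added: dict, vocab_size: int) -> dict:
    """Return an id -> token map, raising if an id is out of range or duplicated."""
    id_to_token = {}
    for token, token_id in added.items():
        if not 0 <= token_id < vocab_size:
            raise ValueError(f"{token!r} has id {token_id} outside [0, {vocab_size})")
        if token_id in id_to_token:
            raise ValueError(f"duplicate id {token_id}")
        id_to_token[token_id] = token
    return id_to_token

ids = check_added_tokens(ADDED_TOKENS, VOCAB_SIZE)
print(sorted(ids))  # [100264, 100265, 100276]
```

All three ids fall below 102400, so the tokens map onto existing embedding rows without resizing.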

29
config.json Normal file
@@ -0,0 +1,29 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 262144,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 45,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 128000000,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.3",
"use_cache": true,
"vocab_size": 102400
}
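The hyperparameters above are enough to count the model's weights. The sketch below assumes the standard Llama layout (no biases, since `attention_bias` and `mlp_bias` are false; grouped-query attention with 8 key/value heads; untied input and output embeddings; one final RMSNorm). It yields about 10.65B parameters, and multiplying by 4 bytes reproduces exactly the `total_size` of 42615619584 recorded in the shard index. That is consistent with the `"torch_dtype": "float32"` entry, so loading with `torch_dtype="auto"` needs roughly 42.6 GB for the weights alone, about twice what bfloat16 would need. `llama_param_count` is a hypothetical helper.

```python
# Architecture hyperparameters copied from config.json.
config = {
    "vocab_size": 102400,
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 45,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "tie_word_embeddings": False,
}

def llama_param_count(c: dict) -> int:
    """Count weights of a bias-free Llama-style decoder."""
    h, v = c["hidden_size"], c["vocab_size"]
    q_dim = c["num_attention_heads"] * c["head_dim"]
    kv_dim = c["num_key_value_heads"] * c["head_dim"]
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * c["intermediate_size"]          # gate, up, down projections
    norms = 2 * h                                  # input + post-attention RMSNorm
    per_layer = attn + mlp + norms
    # embed_tokens plus a separate lm_head when embeddings are untied
    embeddings = v * h * (1 if c["tie_word_embeddings"] else 2)
    return c["num_hidden_layers"] * per_layer + embeddings + h  # + final norm

n = llama_param_count(config)
print(n)      # 10653904896
print(n * 4)  # 42615619584 bytes in float32
print(n * 2)  # 21307809792 bytes in bfloat16
```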

1
configuration.json Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

6
generation_config.json Normal file
@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.51.3"
}
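This generation config sets only the BOS/EOS ids and no sampling parameters, so `model.generate(**inputs, max_new_tokens=32)` in the README falls back to the `transformers` defaults, which decode greedily. The toy loop below illustrates the two stopping conditions that call relies on: emitting `eos_token_id` (2) or reaching `max_new_tokens`. It is a simplified sketch, not the `transformers` implementation; `next_token` is a hypothetical stand-in for a forward pass followed by an argmax.

```python
from typing import Callable, List

BOS_TOKEN_ID = 1  # from generation_config.json
EOS_TOKEN_ID = 2  # from generation_config.json

def greedy_generate(next_token: Callable[[List[int]], int],
                    prompt_ids: List[int],
                    max_new_tokens: int,
                    eos_token_id: int = EOS_TOKEN_ID) -> List[int]:
    """Append tokens until eos_token_id is emitted or max_new_tokens is reached."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        if token == eos_token_id:
            break  # stop once EOS has been emitted (it is kept in the output)
    return ids

# A scripted stub "model" that emits 5, 6, then EOS.
script = iter([5, 6, EOS_TOKEN_ID, 7])
out = greedy_generate(lambda ids: next(script), [BOS_TOKEN_ID, 42], max_new_tokens=32)
print(out)  # [1, 42, 5, 6, 2]
```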

100001
merges.txt Normal file

File diff suppressed because it is too large.

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9adec15d39eacf3b14af095e547a8cbc1bcc7f340f4058755737711e7b507dc9
size 4932603736

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3dd0bc487e12ef295136c084a53a41db124efa3ab53c7f93008d12e4526a001b
size 4999813072

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:707777bd4dd9f616369781ae6fd17646192638f078dcdd0b866e8fc7f8d8ddb5
size 4999813112

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:458ea8f8151dfa4a062ea2edbcf3ed78514e836a0820ea7121cb83ecaf41691a
size 4832007496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0837695d0e746a95452822b94a358d52eba9cc5b85c743f0e453eb1a742e3909
size 4999813120

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:41f56f12decd8631c23c55930a7307d2394ffec9a9daf8f6bb4bd2d58decc3b5
size 4999813128

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19e70a5771505c37ce51d41f4cfa83002a7d02d61605d21d23f265330c01c4da
size 4832007496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c28b2a5f9dfb7046e41630db55ab3a426db9bf19f4a12135404eb7107151f82a
size 4999813120

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0b750272b65be17986f6017a171ed14e6dc40555914781fb977cf86e43493e92
size 3019982520
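Each pointer above stands in for one weight shard. Its `oid` is the SHA-256 of the real file and its `size` is the file's byte length. The sketch below parses a pointer and sums the nine sizes in the order they appear. The total, 42615666800 bytes, exceeds the index's `total_size` (42615619584) by 47216 bytes. The likely reason is that the index counts tensor bytes only, while each safetensors file also begins with an 8-byte length prefix and a JSON header. `parse_lfs_pointer` is a hypothetical helper.

```python
# Sizes from the nine Git LFS pointers above, in the order they appear.
SHARD_SIZES = [
    4932603736, 4999813072, 4999813112, 4832007496, 4999813120,
    4999813128, 4832007496, 4999813120, 3019982520,
]
INDEX_TOTAL_SIZE = 42615619584  # "total_size" from the shard index JSON

def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields

pointer = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:0b750272b65be17986f6017a171ed14e6dc40555914781fb977cf86e43493e92
size 3019982520
""")
print(pointer["size"])                      # 3019982520
print(sum(SHARD_SIZES))                     # 42615666800
print(sum(SHARD_SIZES) - INDEX_TOTAL_SIZE)  # 47216
```

After downloading a shard, you can compare `hashlib.sha256` of its bytes against the `oid` value to verify the file.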

@@ -0,0 +1,415 @@
{
"metadata": {
"total_size": 42615619584
},
"weight_map": {
"lm_head.weight": "model-00009-of-00009.safetensors",
"model.embed_tokens.weight": "model-00001-of-00009.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00009.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00009.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00009.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00009.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00009.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00009.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00009.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00009.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.25.input_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00009.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00009.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00009.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.31.input_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00006-of-00009.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.32.input_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00006-of-00009.safetensors",
"model.layers.33.input_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.34.input_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.35.input_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.36.input_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00007-of-00009.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.37.input_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00007-of-00009.safetensors",
"model.layers.38.input_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.39.input_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.40.input_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.41.input_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.42.input_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.42.mlp.down_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.42.mlp.gate_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.42.mlp.up_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.42.post_attention_layernorm.weight": "model-00008-of-00009.safetensors",
"model.layers.42.self_attn.k_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.42.self_attn.o_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.42.self_attn.q_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.42.self_attn.v_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.43.input_layernorm.weight": "model-00009-of-00009.safetensors",
"model.layers.43.mlp.down_proj.weight": "model-00009-of-00009.safetensors",
"model.layers.43.mlp.gate_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.43.mlp.up_proj.weight": "model-00009-of-00009.safetensors",
"model.layers.43.post_attention_layernorm.weight": "model-00009-of-00009.safetensors",
"model.layers.43.self_attn.k_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.43.self_attn.o_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.43.self_attn.q_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.43.self_attn.v_proj.weight": "model-00008-of-00009.safetensors",
"model.layers.44.input_layernorm.weight": "model-00009-of-00009.safetensors",
"model.layers.44.mlp.down_proj.weight": "model-00009-of-00009.safetensors",
"model.layers.44.mlp.gate_proj.weight": "model-00009-of-00009.safetensors",
"model.layers.44.mlp.up_proj.weight": "model-00009-of-00009.safetensors",
"model.layers.44.post_attention_layernorm.weight": "model-00009-of-00009.safetensors",
"model.layers.44.self_attn.k_proj.weight": "model-00009-of-00009.safetensors",
"model.layers.44.self_attn.o_proj.weight": "model-00009-of-00009.safetensors",
"model.layers.44.self_attn.q_proj.weight": "model-00009-of-00009.safetensors",
"model.layers.44.self_attn.v_proj.weight": "model-00009-of-00009.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00009.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00009.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00009.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00009.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00009.safetensors",
"model.norm.weight": "model-00009-of-00009.safetensors"
}
}
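The `weight_map` above records which shard file holds each tensor. A single layer is not guaranteed to live in one shard. In this index, layer 37 is split: `input_layernorm`, `mlp.down_proj` and `post_attention_layernorm` are in `model-00008-of-00009.safetensors`, while its attention projections and `gate_proj`/`up_proj` are in `model-00007-of-00009.safetensors`.

The sketch below groups tensors by shard for a given name prefix. It assumes the standard `{"weight_map": {...}}` layout of the index. `INDEX_JSON` is a hypothetical, trimmed excerpt copied from the entries above, not the full file, and `shards_for_prefix` is an illustrative helper rather than a library API.

```python
import json
from collections import defaultdict

# Hypothetical excerpt of model.safetensors.index.json (entries copied from the weight_map above).
INDEX_JSON = """
{
  "weight_map": {
    "model.layers.37.input_layernorm.weight": "model-00008-of-00009.safetensors",
    "model.layers.37.mlp.down_proj.weight": "model-00008-of-00009.safetensors",
    "model.layers.37.mlp.gate_proj.weight": "model-00007-of-00009.safetensors",
    "model.layers.37.self_attn.q_proj.weight": "model-00007-of-00009.safetensors",
    "model.norm.weight": "model-00009-of-00009.safetensors"
  }
}
"""


def shards_for_prefix(weight_map, prefix):
    """Return {shard_file: [tensor names]} for every tensor whose name starts with prefix."""
    by_shard = defaultdict(list)
    for name, shard in weight_map.items():
        if name.startswith(prefix):
            by_shard[shard].append(name)
    return dict(by_shard)


if __name__ == "__main__":
    weight_map = json.loads(INDEX_JSON)["weight_map"]
    # The trailing dot keeps "model.layers.3." from also matching layers 30-39.
    for shard, names in sorted(shards_for_prefix(weight_map, "model.layers.37.").items()):
        print(shard, names)
```

Keep the trailing dot on the prefix. Without it, `model.layers.3` would also select layers 30 through 39.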

23
special_tokens_map.json Normal file

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2bdbab692e14b23f1f9daee01196cd0a29c7b31dc22e47f5d5a77dda1b73632b
size 7133808
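`tokenizer.json` (and `vocab.json` below) is committed as a Git LFS pointer, not the file itself. The pointer has three `key value` lines: `version`, `oid` (hash algorithm and digest) and `size` in bytes.

The sketch below parses such a pointer and checks downloaded bytes against it. It assumes only the `sha256` oid form shown here. The helper names are illustrative, not part of any Git LFS tooling.

```python
import hashlib

TOKENIZER_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:2bdbab692e14b23f1f9daee01196cd0a29c7b31dc22e47f5d5a77dda1b73632b
size 7133808
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a field dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, pointer) -> bool:
    """True if data has the pointer's byte size and SHA-256 digest (sha256 oids only)."""
    if pointer["algo"] != "sha256":
        raise ValueError("only sha256 oids are handled in this sketch")
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["digest"]


if __name__ == "__main__":
    ptr = parse_lfs_pointer(TOKENIZER_POINTER)
    print(ptr["algo"], ptr["size"])
```

In practice you would call `matches_pointer(open("tokenizer.json", "rb").read(), ptr)` after fetching the real object. A result of `False` usually means only the pointer was checked out.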

69
tokenizer_config.json Normal file

@@ -0,0 +1,69 @@
{
"add_prefix_space": false,
"added_tokens_decoder": {
"100257": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"100258": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"100259": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"100260": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"100264": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"100265": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"100276": {
"content": "<|endofprompt|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"bos_token": "<|endoftext|>",
"chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '\n<|im_end|>' + '\n'}}{% endfor %}{{ '<|im_start|>assistant\n' }}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|endoftext|>",
"extra_special_tokens": {},
"model_max_length": 16384,
"tokenizer_class": "GPT2Tokenizer",
"unk_token": "<|endoftext|>"
}
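The `chat_template` above wraps each message in `<|im_start|>` (id 100264) and `<|im_end|>` (id 100265). It adds a newline after the role and another after the content, before `<|im_end|>`. It always ends by opening an assistant turn.

The sketch below is a hand translation of that Jinja string into plain Python so the exact prompt layout is visible. It is not the `transformers` `apply_chat_template` implementation. `render_chat` is a hypothetical helper name.

```python
def render_chat(messages):
    """Plain-Python equivalent of the chat_template string in tokenizer_config.json."""
    parts = []
    for message in messages:
        # '<|im_start|>' + role + '\n' + content + '\n<|im_end|>' + '\n'
        parts.append("<|im_start|>" + message["role"] + "\n" + message["content"] + "\n<|im_end|>" + "\n")
    # The template unconditionally appends an opening assistant turn.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(repr(render_chat([{"role": "user", "content": "Hello"}])))
```

Because the assistant header is appended unconditionally, an empty message list still yields `"<|im_start|>assistant\n"`.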

3
vocab.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:240b7e18e79826df8c0dbeff7133a3ab4708d7958fede6e27e9409e09159375f
size 1610691