Initialize project; model provided by the ModelHub XC community

Model: cjvt/GaMS-1B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-12 23:10:07 +08:00
commit 9ae86cbd0b
10 changed files with 228495 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
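
The patterns above decide which files Git stores as LFS pointers rather than as regular blobs. The following is a minimal sketch of that decision, assuming only slash-free patterns are checked and that matching the file's base name with Python's `fnmatchcase` is a close enough approximation of Git's own pattern rules; the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns listed in .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.onnx", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: compare the base name against each pattern.

    Git's real matching has more rules (e.g. patterns containing '/'),
    so this is only an illustration.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["model-00001-of-00002.safetensors", "config.json", "README.md"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under these assumptions, the weight shards would go through LFS while `config.json` and `README.md` stay ordinary text files.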

148
README.md Normal file

@@ -0,0 +1,148 @@
---
library_name: transformers
license: apache-2.0
language:
- en
- sl
- hr
- sr
- bs
pipeline_tag: text-generation
---
# Model Card for GaMS-1B
We proudly present the family of GaMS (Generative Model for Slovene) models. The 1B version is based on [Facebook's OPT model](https://huggingface.co/facebook/opt-1.3b) and is adapted for Slovene. GaMS-1B uses a BPE tokenizer with a vocabulary size of 80,000. The tokenizer was trained on Slovene, English, and Croatian data.
## Acknowledgment
The model was developed within the [PoVeJMo](https://www.cjvt.si/povejmo/en/project/) research program (Adaptive Natural Language Processing with Large Language Models), particularly within the research project titled SloLLaMai -- Open-access computationally efficient models for Slovenian. The program is funded within the Recovery and Resilience Plan by the Slovenian Research and Innovation Agency (ARIS) and NextGenerationEU. The authors also acknowledge the financial support from the Slovenian Research and Innovation Agency (research core funding No. P6-0411 -- Language Resources and Technologies for Slovene).
We thank everyone who worked on data collection and preparation, enabling us to train our model. Special thanks go to Nikola Ljubešić, Tjaša Arčon, Jaka Čibej, Simon Krek, Tomaž Erjavec and Iztok Kosem.
## Basic information
- **Developed by:** a team of researchers at the University of Ljubljana, Faculty of Computer and Information Science, and XLAB d.o.o. Team members: Domen Vreš, Martin Božič, Aljaž Potočnik, Tomaž Martinčič, Iztok Lebar Bajec, Timotej Petrič and Marko Robnik-Šikonja.
- **Languages:** Slovene (primary), English, Croatian, Bosnian and Serbian (secondary)
- **License:** Apache 2.0
- **Repository:** https://github.com/SloLama/NeMo
- **Paper:** https://www.sdjt.si/wp/wp-content/uploads/2024/09/JT-DH-2024_Vres_Bozic_Potocnik_Martincic_Robnik.pdf
## Intended usage
This version of the model is quite small and lacks instruction and safety tuning. Hence, using it as a general-purpose model is **STRONGLY DISCOURAGED!** The model might also contain certain biases. We do not recommend using this model for any language other than Slovene.
The model can be efficiently fine-tuned for specific use cases, as suggested by the promising results of fine-tuned models on the SuperGLUE and SI-NLI benchmarks.
## How to get started with the model
The inference can be done using the following snippet of code:
```python
from transformers import pipeline

model_id = "cjvt/GaMS-1B"

# Build a text-generation pipeline for the model.
pline = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto"
)

# Example prompts in English, Slovene, and Croatian.
prompts = [
    "The examples of antonyms are:\nhigh => low\nwide => narrow\nbig =>",
    "Pristanek je bil prvi nadzorovani spust ameriškega vesoljskega plovila na površje Lune po Apollu 17 leta 1972, ko je na Luni pristala zadnja Nasina misija s posadko.\nDoslej so na Luni pristala vesoljska plovila le iz štirih drugih držav ",
    "U četvrtak je bila prva polufinalna večer Dore, a komentari na društvenim mrežama ne prestaju. U nedjeljno finale prošli su:"
]

# Greedy decoding (do_sample=False), one returned sequence per prompt.
sequences = pline(
    prompts,
    max_length=1000,
    do_sample=False,
    num_return_sequences=1
)

# Each entry of `sequences` is a list holding the results for one prompt.
for seq in sequences:
    print("--------------------------")
    print(f"Result: {seq[0]['generated_text']}")
    print("--------------------------\n")
```
## Training details
### Training data
The model was additionally pretrained on the following Slovene, English, and Croatian-Bosnian-Serbian (CBS) corpora:
| Corpus | Language | # Tokens | Percentage |
| :----- | :------- | :------: | :--------: |
| MetaFida | Slovene | 3.35 B | 11.9 % |
| KAS | Slovene | 1.66 B | 5.89 % |
| Trendi | Slovene | 0.68 B | 2.4 % |
| mC4 | Slovene | 2.88 B | 10.25 % |
| MaCoCu | Slovene | 2.34 B | 8.3 % |
| CC100 | Slovene | 0.29 B | 1.02 % |
| Riznica | Croatian | 0.11 B | 0.39 % |
| Hr News | Croatian | 2.14 B | 7.59 % |
| MaCoCu HBS | CBS | 8.63 B | 30.69 % |
| Wikipedia | English | 5.61 B | 19.93 % |
| CC-News | English | 0.46 B | 1.64 % |
The total size of additional training data is **28.13 B** tokens.
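
As a rough cross-check of the table, the per-language share of the training data can be recomputed from the rounded token counts. This is a minimal sketch: the token counts and the 28.13 B total are copied from the table, and because they are rounded, the recomputed shares only approximately match the listed percentages.

```python
from collections import defaultdict

# (corpus, language, tokens in billions), copied from the table above.
CORPORA = [
    ("MetaFida", "Slovene", 3.35),
    ("KAS", "Slovene", 1.66),
    ("Trendi", "Slovene", 0.68),
    ("mC4", "Slovene", 2.88),
    ("MaCoCu", "Slovene", 2.34),
    ("CC100", "Slovene", 0.29),
    ("Riznica", "Croatian", 0.11),
    ("Hr News", "Croatian", 2.14),
    ("MaCoCu HBS", "CBS", 8.63),
    ("Wikipedia", "English", 5.61),
    ("CC-News", "English", 0.46),
]
REPORTED_TOTAL_B = 28.13  # total stated in the model card

# Sum the token counts per language.
tokens_by_language = defaultdict(float)
for _corpus, language, tokens_b in CORPORA:
    tokens_by_language[language] += tokens_b

# Express each language as a share of the reported total.
for language, tokens_b in tokens_by_language.items():
    share = 100 * tokens_b / REPORTED_TOTAL_B
    print(f"{language}: {tokens_b:.2f} B tokens ({share:.1f} %)")
```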
### Training Procedure
The model was trained using the NeMo framework on the Slovene HPC Vega, utilizing 64 A100 GPUs simultaneously. The model was trained for 4 epochs. The WECHSEL initialization method was used to initialize the embedding matrix of the new vocabulary. All layers apart from the embedding and output layers were frozen during the first epoch to avoid forgetting. Training took approximately 60 hours. The model was trained with a batch size of 1024 (2 million tokens) using the Adam optimizer and a cosine learning rate scheduler with 10,000 warmup and 5,000 constant steps.
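
The batch arithmetic and the shape of the learning rate schedule can be sketched as follows. Several things here are assumptions rather than facts from the card: the sequence length of 2048 is taken from `max_position_embeddings` in `config.json`, the step count is a back-of-the-envelope estimate, the peak and minimum learning rates are hypothetical placeholders, and the constant steps are assumed to be held at the minimum learning rate at the end of training.

```python
import math

BATCH_SIZE = 1024          # sequences per step (from the card)
SEQ_LEN = 2048             # assumption: equals max_position_embeddings
TOKENS_PER_STEP = BATCH_SIZE * SEQ_LEN  # 2,097,152, i.e. about 2 million

# Rough estimate of optimizer steps for 4 epochs over 28.13 B tokens.
EST_TOTAL_STEPS = round(4 * 28.13e9 / TOKENS_PER_STEP)


def lr_at(step, max_steps, peak_lr, min_lr,
          warmup_steps=10_000, constant_steps=5_000):
    """Linear warmup, cosine decay, then a constant tail at min_lr.

    The placement of the constant phase at the end is an assumption.
    """
    decay_steps = max_steps - warmup_steps - constant_steps
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < warmup_steps + decay_steps:
        progress = (step - warmup_steps) / decay_steps
        cosine = 0.5 * (1 + math.cos(math.pi * progress))
        return min_lr + (peak_lr - min_lr) * cosine
    return min_lr


# Hypothetical learning rates, chosen only to make the sketch runnable.
PEAK_LR, MIN_LR = 1e-4, 1e-5
for step in (0, 10_000, 30_000, EST_TOTAL_STEPS):
    print(step, lr_at(step, EST_TOTAL_STEPS, PEAK_LR, MIN_LR))
```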
## Evaluation
The models were evaluated using [Slovene SuperGLUE](https://slobench.cjvt.si/leaderboard/view/3) and [SI-NLI](https://slobench.cjvt.si/leaderboard/view/9) tasks on [SloBench](https://slobench.cjvt.si). Additionally, the models were evaluated on an improved version of the Slovenian-LLM-eval introduced by Aleksa Gordić. All decoder-type models were evaluated using few-shot prompts and were not fine-tuned on the benchmark (except for the versions with "finetuned" in the name).
### SuperGLUE results
| Model | SuperGLUE Average | BoolQ Accuracy | CB Accuracy | CB F1 Score | CB Average | COPA Accuracy | MultiRC EM | MultiRC F1a Score | MultiRC Average | RTE Accuracy | WSC Accuracy |
| :---- | :---------------: | :------------: | :---------: | :---------: | :--------: | :-----------: | :--------: | :---------------: | :-------------: | :----------: | :----------: |
| OPT_GaMS-1B | 0.4408 | 0.5667 | 0.5040 | 0.3885 | 0.4463 | 0.5020 | 0.0961 | 0.2543 | 0.1752 | 0.4138 | 0.5411 |
| GaMS-1B | 0.4604 | 0.5000 | 0.6200 | 0.4565 | 0.5382 | 0.4920 | 0.1351 | 0.2675 | 0.2013 | 0.4828 | 0.5479 |
| OPT_GaMS-1B-Chat | 0.4165 | 0.7000 | 0.3720 | 0.2961 | 0.3341 | 0.4600 | 0.1111 | 0.3448 | 0.2280 | 0.4138 | 0.3630 |
| GaMS-1B-Chat | 0.4570 | **0.8000** | 0.4880 | 0.3023 | 0.3951 | 0.4840 | 0.1081 | 0.2428 | 0.1755 | 0.5172 | 0.3699 |
| OPT_GaMS-1B-Chat finetuned | 0.5645 | 0.7000 | 0.8040 | 0.5884 | 0.6962 | 0.5860 | 0.1021 | 0.4808 | 0.2914 | 0.5862 | 0.5274 |
| GaMS-1B-Chat finetuned | 0.5806 | 0.7333 | **0.8120** | 0.5592 | 0.6856 | 0.5080 | 0.1381 | 0.4882 | 0.3132 | 0.5862 | **0.6575** |
| SlovenianGPT-Chat* | 0.5078 | 0.7333 | 0.3920 | 0.3829 | 0.3874 | **0.6840** | **0.2432** | 0.4944 | **0.3688** | 0.5172 | 0.3562 |
| CroSloEngual BERT | **0.6078** | 0.7333 | 0.7920 | **0.7437** | **0.7679** | 0.5720 | 0.0931 | **0.5241** | 0.3086 | **0.6552** | 0.6096 |
*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.
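
The aggregate columns in the table appear to be simple means of the per-task columns. The sketch below reproduces the OPT_GaMS-1B row under that assumption; the aggregation rule is inferred from the reported numbers, not stated in the card, and the function name is hypothetical.

```python
from statistics import mean

# Per-task scores for OPT_GaMS-1B, copied from the table above.
OPT_GAMS_1B = {
    "BoolQ": 0.5667,
    "CB Accuracy": 0.5040,
    "CB F1": 0.3885,
    "COPA": 0.5020,
    "MultiRC EM": 0.0961,
    "MultiRC F1a": 0.2543,
    "RTE": 0.4138,
    "WSC": 0.5411,
}


def superglue_average(scores):
    """Average CB and MultiRC first, then take the mean over six tasks."""
    cb_average = mean([scores["CB Accuracy"], scores["CB F1"]])
    multirc_average = mean([scores["MultiRC EM"], scores["MultiRC F1a"]])
    return mean([
        scores["BoolQ"],
        cb_average,
        scores["COPA"],
        multirc_average,
        scores["RTE"],
        scores["WSC"],
    ])


print(f"{superglue_average(OPT_GAMS_1B):.4f}")  # reported value: 0.4408
```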
### SI-NLI results
| Model | Accuracy | P(entailment) | R(entailment) | F1(entailment) | P(neutral) | R(neutral) | F1(neutral) | P(contradiction) | R(contradiction) | F1(contradiction) |
| :---- | :------: | :-----------: | :-----------: | :------------: | :--------: | :---------: | :---------: | :---------------: | :---------------: | :----------------: |
| OPT_GaMS-1B | 0.3277 | 0.3407 | 0.6754 | 0.4529 | 0.3538 | 0.1402 | 0.2009 | 0.2632 | 0.1524 | 0.1931 |
| GaMS-1B | 0.3317 | 0.3418 | 0.4327 | 0.3819 | 0.3353 | 0.5122 | 0.4053 | 0.2344 | 0.0457 | 0.0765 |
| OPT_GaMS-1B-Chat | 0.3447 | 0.3515 | 0.6784 | 0.4631 | 0.3386 | 0.3293 | 0.3338 | 0.2105 | 0.0122 | 0.0231 |
| GaMS-1B-Chat | 0.3417 | 0.3405 | **0.9737** | 0.5045 | 0.2857 | 0.0061 | 0.0119 | 0.4615 | 0.0183 | 0.0352 |
| OPT_GaMS-1B-Chat finetuned | 0.7244 | 0.7065 | 0.8304 | 0.7634 | 0.7269 | 0.6006 | 0.6578 | 0.7446 | 0.7378 | 0.7412 |
| GaMS-1B-Chat finetuned | 0.7144 | 0.8037 | 0.6345 | 0.7092 | 0.7247 | 0.6341 | 0.6764 | 0.6531 | **0.8780** | 0.7490 |
| SlovenianGPT-Chat* | 0.4729 | 0.4399 | 0.7281 | 0.5485 | 0.3719 | 0.1372 | 0.2004 | 0.5723 | 0.5427 | 0.5571 |
| GPT-3.5-Turbo finetuned | **0.8567** | **0.8464** | 0.8538 | **0.8501** | **0.8041** | **0.8384** | **0.8209** | **0.9260** | **0.8780** | **0.9014** |
| SloBERTa | 0.7375 | 0.8127 | 0.7105 | 0.7582 | 0.6844 | 0.7470 | 0.7143 | 0.7273 | 0.7561 | 0.7414 |
| CroSloEngual BERT | 0.6623 | 0.7147 | 0.6667 | 0.6899 | 0.6072 | 0.6646 | 0.6346 | 0.6719 | 0.6555 | 0.6636 |
*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.
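
The per-class F1 columns follow from the precision (P) and recall (R) columns as their harmonic mean. A minimal sketch, using the entailment values of two rows from the table; small deviations are expected because the inputs are rounded to four decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, P(entailment), R(entailment), reported F1(entailment))
ROWS = [
    ("OPT_GaMS-1B", 0.3407, 0.6754, 0.4529),
    ("GaMS-1B", 0.3418, 0.4327, 0.3819),
]

for model, precision, recall, reported in ROWS:
    print(f"{model}: computed {f1_score(precision, recall):.4f}, reported {reported}")
```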
### Slovenian-LLM-eval results
| Model | ARC-Challenge Accuracy | ARC-Easy Accuracy | BoolQ Accuracy | HellaSwag Accuracy | NQ-Open EM | OpenBookQA Accuracy | PIQA Accuracy | WinoGrande Accuracy |
| :---- | :--------------------: | :---------------: | :------------: | :----------------: | :--------------: | :-----------------: | :-----------: | :-----------------: |
| OPT_GaMS-1B | 0.2227 ± 0.0122 | 0.436 ± 0.0102 | 0.378 ± 0.0085 | 0.3394 ± 0.0047 | 0.0003 ± 0.0003 | 0.214 ± 0.0184 | 0.6083 ± 0.0114 | 0.5533 ± 0.014 |
| GaMS-1B | 0.2329 ± 0.0124 | 0.4743 ± 0.0102 | 0.3813 ± 0.0085 | 0.3555 ± 0.0048 | 0.0036 ± 0.001 | 0.22 ± 0.0185 | 0.624 ± 0.0113 | 0.532 ± 0.014 |
| OPT_GaMS-1B-Chat | 0.2355 ± 0.0124 | 0.3960 ± 0.0100 | 0.4398 ± 0.0087 | 0.3459 ± 0.0047 | 0.0011 ± 0.0006 | 0.20 ± 0.0179 | 0.5778 ± 0.0115 | 0.5359 ± 0.014 |
| GaMS-1B-Chat | 0.2517 ± 0.0127 | 0.4394 ± 0.0102 | 0.4502 ± 0.0087 | 0.3634 ± 0.0048 | 0 ± 0 | 0.196 ± 0.0178 | 0.6115 ± 0.0114 | 0.5572 ± 0.014 |
| YugoGPT | 0.2961 ± 0.0133 | 0.4781 ± 0.0102 | 0.3783 ± 0.0085 | 0.3890 ± 0.0047 | 0.0385 ± 0.0032 | 0.226 ± 0.0187 | 0.5816 ± 0.0115 | 0.5588 ± 0.014 |
| SlovenianGPT | **0.3805 ± 0.0142** | **0.6498 ± 0.0098** | 0.4523 ± 0.0087 | **0.4935 ± 0.0050** | **0.0432 ± 0.0034** | **0.27 ± 0.0199** | **0.6937 ± 0.0108** | **0.644 ± 0.0135** |
| SlovenianGPT-Chat* | 0.3567 ± 0.014 | 0.5901 ± 0.0101 | **0.4706 ± 0.0087** | 0.4719 ± 0.0050 | 0.0003 ± 0.0003 | **0.27 ± 0.0199** | 0.6861 ± 0.0108 | 0.6425 ± 0.0135 |
*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/652d40a78fa1fbb0aae165bb/_2h977RjIu0nI_IJG_9bL.png)
```
@inproceedings{GaMS,
author = {Vre{\v s}, Domen and Bo{\v z}i{\v c}, Martin and Poto{\v c}nik, Alja{\v z} and Martin{\v c}i{\v c}, Toma{\v z} and Robnik-{\v S}ikonja, Marko},
booktitle = {Language Technologies and Digital Humanities Conference},
title = {{Generative Model for Less-Resourced Language with 1 billion parameters}},
url = {https://www.sdjt.si/wp/wp-content/uploads/2024/09/JT-DH-2024_Vres_Bozic_Potocnik_Martincic_Robnik.pdf},
year = {2024}
}
```

32
config.json Normal file

@@ -0,0 +1,32 @@
{
"_name_or_path": "cjvt/GaMS-1B",
"_remove_final_layer_norm": false,
"activation_dropout": 0.0,
"activation_function": "relu",
"architectures": [
"OPTForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"do_layer_norm_before": true,
"dropout": 0.1,
"enable_bias": true,
"eos_token_id": 2,
"ffn_dim": 8192,
"hidden_size": 2048,
"init_std": 0.02,
"layer_norm_elementwise_affine": true,
"layerdrop": 0.0,
"max_position_embeddings": 2048,
"model_type": "opt",
"num_attention_heads": 32,
"num_hidden_layers": 24,
"pad_token_id": 3,
"sep_token_id": 5,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.38.2",
"use_cache": true,
"vocab_size": 80000,
"word_embed_proj_dim": 2048
}
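
The parameter count implied by this configuration can be estimated by hand. The sketch below assumes the standard OPT layout (biased linear layers, two layer norms per block, a final layer norm, an untied bias-free `lm_head`, and a learned position embedding with an offset of 2 extra rows); the function name is hypothetical. At 4 bytes per `float32` parameter, the result agrees with the `total_size` of 6161924096 recorded in the weight index of this commit.

```python
# Relevant fields copied from config.json above.
CONFIG = {
    "vocab_size": 80000,
    "hidden_size": 2048,
    "word_embed_proj_dim": 2048,
    "ffn_dim": 8192,
    "num_hidden_layers": 24,
    "max_position_embeddings": 2048,
    "tie_word_embeddings": False,
}


def count_opt_parameters(cfg):
    """Estimate the parameter count of an OPT-style decoder."""
    h, f, v = cfg["hidden_size"], cfg["ffn_dim"], cfg["vocab_size"]
    embed_tokens = v * cfg["word_embed_proj_dim"]
    # Assumption: the position table has an offset of 2 extra rows.
    embed_positions = (cfg["max_position_embeddings"] + 2) * h
    attention = 4 * (h * h + h)           # q, k, v, out projections with bias
    mlp = (h * f + f) + (f * h + h)       # fc1 and fc2 with bias
    layer_norms = 2 * (2 * h)             # weight and bias, two norms per block
    per_layer = attention + mlp + layer_norms
    final_norm = 2 * h
    lm_head = 0 if cfg["tie_word_embeddings"] else v * h
    return (embed_tokens + embed_positions
            + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head)


total_params = count_opt_parameters(CONFIG)
total_bytes = total_params * 4  # float32 weights
print(total_params, total_bytes)
```

Roughly 0.33 B of the 1.54 B parameters sit in the token embedding and the untied output layer, a consequence of the 80,000-entry vocabulary.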

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 3,
"transformers_version": "4.38.2"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fe8f0f2df9fc9f4c200ed06c2499ddceb8fedf5a9f003a3a7e246eae078999f8
size 4969464112


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:430f1f25fc8c6a2729aac2dd428b9a2a4658014be1034f46d51a29d717eff1fc
size 1192505784
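
The two entries above are Git LFS pointer files: small text stubs that record the hash and byte size of the real object. A minimal sketch of reading them follows, assuming they belong to the two weight shards (the page does not show their file names); `parse_lfs_pointer` is a hypothetical helper, not part of any Git tooling.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


# Pointer contents copied from the two entries above.
POINTERS = [
    """version https://git-lfs.github.com/spec/v1
oid sha256:fe8f0f2df9fc9f4c200ed06c2499ddceb8fedf5a9f003a3a7e246eae078999f8
size 4969464112""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:430f1f25fc8c6a2729aac2dd428b9a2a4658014be1034f46d51a29d717eff1fc
size 1192505784""",
]

parsed = [parse_lfs_pointer(p) for p in POINTERS]
total_lfs_bytes = sum(p["size"] for p in parsed)

INDEX_TOTAL_SIZE = 6161924096  # "total_size" from the weight index in this commit
print(total_lfs_bytes, total_lfs_bytes - INDEX_TOTAL_SIZE)
```

The combined pointer size exceeds the index `total_size` by 45,800 bytes, which would be consistent with per-file header overhead on top of the raw tensor data, though the commit itself does not confirm this.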


@@ -0,0 +1,396 @@
{
"metadata": {
"total_size": 6161924096
},
"weight_map": {
"lm_head.weight": "model-00002-of-00002.safetensors",
"model.decoder.embed_positions.weight": "model-00001-of-00002.safetensors",
"model.decoder.embed_tokens.weight": "model-00001-of-00002.safetensors",
"model.decoder.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.0.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.1.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.10.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.11.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.12.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.13.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.14.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.15.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.16.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.17.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.18.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.19.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.2.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.20.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.fc1.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.21.fc1.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.21.fc2.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.21.fc2.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.21.final_layer_norm.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.21.final_layer_norm.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.21.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.21.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.22.fc1.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.fc1.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.fc2.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.fc2.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.final_layer_norm.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.final_layer_norm.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn.out_proj.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn.out_proj.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn_layer_norm.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.22.self_attn_layer_norm.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.fc1.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.fc1.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.fc2.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.fc2.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.final_layer_norm.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.final_layer_norm.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn.out_proj.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn.out_proj.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn_layer_norm.bias": "model-00002-of-00002.safetensors",
"model.decoder.layers.23.self_attn_layer_norm.weight": "model-00002-of-00002.safetensors",
"model.decoder.layers.3.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.3.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.4.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.5.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.6.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.7.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.8.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.fc1.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.fc1.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.fc2.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.fc2.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.final_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.final_layer_norm.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn.out_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn.out_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn_layer_norm.bias": "model-00001-of-00002.safetensors",
"model.decoder.layers.9.self_attn_layer_norm.weight": "model-00001-of-00002.safetensors"
}
}
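
The weight map above assigns each tensor name to one of two shard files, and the entries for layer 21 are divided between both shards. A minimal sketch of how such an index could be inspected follows, assuming the file has the usual `{"weight_map": {...}}` layout; the helper names `shards_per_layer` and `split_layers` are hypothetical, and the embedded excerpt holds only four entries copied from the listing.

```python
import json
import re
from collections import defaultdict

# Four entries copied from the weight map above (layers 20 to 22 only).
INDEX_EXCERPT = json.loads("""
{
  "weight_map": {
    "model.decoder.layers.20.fc1.weight": "model-00001-of-00002.safetensors",
    "model.decoder.layers.21.fc1.weight": "model-00002-of-00002.safetensors",
    "model.decoder.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.decoder.layers.22.fc1.weight": "model-00002-of-00002.safetensors"
  }
}
""")

# Matches the numeric layer index inside a tensor name.
LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def shards_per_layer(weight_map):
    """Return {layer index: set of shard files holding that layer's tensors}."""
    layers = defaultdict(set)
    for tensor_name, shard_file in weight_map.items():
        match = LAYER_RE.search(tensor_name)
        if match:  # skip tensors that do not belong to a decoder layer
            layers[int(match.group(1))].add(shard_file)
    return dict(layers)


def split_layers(weight_map):
    """Return the sorted layer indices whose tensors live in more than one shard."""
    per_layer = shards_per_layer(weight_map)
    return sorted(layer for layer, shards in per_layer.items() if len(shards) > 1)


print(split_layers(INDEX_EXCERPT["weight_map"]))  # prints [21]
```

A layer spanning two shards is not an error in itself: a loader that follows the index opens whichever file each tensor name points to.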

51
special_tokens_map.json Normal file

@@ -0,0 +1,51 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"cls_token": {
"content": "<cls>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"mask_token": {
"content": "<mask>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<pad>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"sep_token": {
"content": "<sep>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

227744
tokenizer.json Normal file

File diff suppressed because it is too large

76
tokenizer_config.json Normal file

@@ -0,0 +1,76 @@
{
"add_bos_token": false,
"add_eos_token": false,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"3": {
"content": "<pad>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"4": {
"content": "<cls>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"5": {
"content": "<sep>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"6": {
"content": "<mask>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
}
},
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"cls_token": "<cls>",
"eos_token": "</s>",
"legacy": null,
"mask_token": "<mask>",
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<pad>",
"sep_token": "<sep>",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": true
}
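
The tokenizer config above names its special tokens by string and lists their ids under `added_tokens_decoder`. The sketch below shows one way to map each role to its id using only the standard library; the helper names `token_ids` and `has_unbounded_max_length` are hypothetical, and the embedded excerpt keeps only the fields the sketch reads. It also treats the very large `model_max_length` as a placeholder equal to `int(1e30)`, which is an assumption about how that value was produced and not something the config states.

```python
import json

# Reduced copy of the tokenizer config above: only the fields used below.
CONFIG_EXCERPT = json.loads("""
{
  "added_tokens_decoder": {
    "0": {"content": "<unk>", "special": true},
    "1": {"content": "<s>", "special": true},
    "2": {"content": "</s>", "special": true},
    "3": {"content": "<pad>", "special": true},
    "4": {"content": "<cls>", "special": true},
    "5": {"content": "<sep>", "special": true},
    "6": {"content": "<mask>", "special": true}
  },
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "<pad>",
  "unk_token": "<unk>",
  "model_max_length": 1000000000000000019884624838656
}
""")

ROLES = ("bos_token", "eos_token", "pad_token", "unk_token")


def token_ids(config):
    """Map each special-token role to the id listed in added_tokens_decoder."""
    by_content = {
        entry["content"]: int(token_id)
        for token_id, entry in config["added_tokens_decoder"].items()
    }
    return {role: by_content[config[role]] for role in ROLES}


def has_unbounded_max_length(config):
    """True when model_max_length is at least int(1e30), read here as 'no limit set'."""
    return config["model_max_length"] >= int(1e30)


print(token_ids(CONFIG_EXCERPT))
print(has_unbounded_max_length(CONFIG_EXCERPT))
```

If the placeholder reading holds, the practical context limit has to come from the model configuration, not from this file.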