Initialize project; model provided by the ModelHub XC community

Model: beomi/Solar-Ko-Recovery-11B
Source: Original Platform
ModelHub XC
2026-05-22 09:36:17 +08:00
commit a0a68b4068
16 changed files with 157359 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

163
README.md Normal file

@@ -0,0 +1,163 @@
---
language:
- ko
- en
pipeline_tag: text-generation
inference: false
tags:
- solar
- mistral
- pytorch
- solar-ko
library_name: transformers
license: apache-2.0
base_model: upstage/SOLAR-10.7B-v1.0
---
<img src="https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/WuiaS45EAWDurGTOtjR_d.png" style="max-width:250px;margin:0 auto;" />
**Update Log**
- 2024.07.01: Released Solar-Ko-Recovery & Uploaded Benchmark scores
- 2024.05.16: Preview Released Solar-Ko-Recovery
# **Solar-Ko-Recovery-11B** 🌟❤️‍🩹
Solar-Ko-Recovery-11B aims to recover Solar's Korean capability by rearranging the embeddings and LM head. It features an expanded vocabulary and is trained on a Korean+English corpus for enhanced representation.
## Model Details
**Model Developers:** Junbum Lee (Beomi)
**Variations:** Solar-Ko-Recovery is available in one parameter size: 11B (10.99B🤣).
**Input:** The model accepts only text input.
**Output:** The model produces text output exclusively.
**Model Architecture:**
Solar-Ko-Recovery is an auto-regressive language model that leverages an optimized transformer architecture derived from Llama-2.
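As a rough cross-check of the 10.99B figure, the sketch below counts the weights of a Llama-style decoder from the values in this repository's `config.json` (hidden size 4096, 48 layers, 32 attention heads, 8 key/value heads, intermediate size 14336, vocabulary 64000, untied embeddings). The function name `llama_param_count` is hypothetical, and the formula assumes no bias terms, which is consistent with `"attention_bias": false` but is not stated by the author.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, num_heads, num_kv_heads,
                      tie_word_embeddings=False):
    """Count weights in a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden_size // num_heads
    kv_dim = head_dim * num_kv_heads            # grouped-query attention shrinks K/V
    attn = 2 * hidden_size * hidden_size        # q_proj and o_proj
    attn += 2 * hidden_size * kv_dim            # k_proj and v_proj
    mlp = 3 * hidden_size * intermediate_size   # gate_proj, up_proj, down_proj
    norms = 2 * hidden_size                     # input and post-attention RMSNorm
    per_layer = attn + mlp + norms
    embed = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return num_layers * per_layer + embed + lm_head + final_norm


total = llama_param_count(
    vocab_size=64000, hidden_size=4096, intermediate_size=14336,
    num_layers=48, num_heads=32, num_kv_heads=8,
)
print(total)      # 10993668096
print(total * 2)  # 21987336192 bytes at 2 bytes per bfloat16 weight
```

The byte count agrees with the `total_size` value recorded in the safetensors index of this commit.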
| |Training Data|Parameters|Context Length|GQA|Tokens|Learning Rate|
|---|---|---|---|---|---|---|
|Solar-Ko-Recovery|*A curated mix of Korean+English Corpora*|11B(10.99B)|4k|O|>100B*|5e<sup>-5</sup>|
> NOTE: Training proceeded in two steps:
>
> 1) Only the embedding layer and the LM head are trained
> 2) All parameters are trained
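
The two steps in the note can be sketched as a per-parameter filter. This is only an illustration, not the author's training code: the helper `is_trainable` is hypothetical, and the parameter names are taken from the safetensors weight map in this repository.

```python
# Parameters assumed to be updated in step 1 (embedding layer and LM head).
STAGE1_TRAINABLE = {"model.embed_tokens.weight", "lm_head.weight"}


def is_trainable(param_name: str, stage: int) -> bool:
    """Return True if the parameter should receive gradients in the given step."""
    if stage == 1:
        return param_name in STAGE1_TRAINABLE
    if stage == 2:
        return True  # full-parameter training
    raise ValueError(f"unknown stage: {stage}")


example_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "lm_head.weight",
]
stage1 = [n for n in example_names if is_trainable(n, 1)]
stage2 = [n for n in example_names if is_trainable(n, 2)]
print(stage1)       # ['model.embed_tokens.weight', 'lm_head.weight']
print(len(stage2))  # 4
```

In a PyTorch training loop, one way to apply such a filter would be to set `param.requires_grad = is_trainable(name, stage)` while iterating over `model.named_parameters()`.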
**Vocab Expansion**
Vocab expansion is based on an edited version of [upstage/solar-1-mini-tokenizer](https://huggingface.co/upstage/solar-1-mini-tokenizer), which is a superset of the Solar tokenizer.
| Model Name | Vocabulary Size | Description |
| --- | --- | --- |
| Original Solar | 32000 | Sentencepiece BPE |
| **solar-1-mini-tokenizer** | 64000 | Sentencepiece BPE. Added Ko/JP vocabs |
**Tokenizing "안녕하세요, 오늘은 날씨가 좋네요."**
- SOLAR-10.7B: 26 tokens
- Solar-Ko-Recovery: 7 tokens
| Model | Tokens |
| --- | --- |
| SOLAR-10.7B | `['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '날', '<0xEC>', '<0x94>', '<0xA8>', '가', '▁', '좋', '네', '요', '.']` |
| Solar-Ko-Recovery | `['▁안녕하세요', ',', '▁오늘은', '▁날씨가', '▁좋', '네요', '.']` |
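The `<0x..>` entries in the SOLAR-10.7B row are consistent with byte fallback, where a character missing from the vocabulary is emitted as its individual UTF-8 bytes. The sketch below reproduces those byte tokens for the three affected syllables. It does not call either tokenizer, and the helper name `byte_fallback` is hypothetical.

```python
def byte_fallback(ch: str) -> list[str]:
    """Render a character as one <0xNN> token per UTF-8 byte."""
    return [f"<0x{b:02X}>" for b in ch.encode("utf-8")]


for syllable in ["녕", "늘", "씨"]:
    print(syllable, byte_fallback(syllable))
# 녕 ['<0xEB>', '<0x85>', '<0x95>']
# 늘 ['<0xEB>', '<0x8A>', '<0x98>']
# 씨 ['<0xEC>', '<0x94>', '<0xA8>']
```

Each Hangul syllable costs three tokens under this fallback, which accounts for much of the gap between 26 and 7 tokens in the example above.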
**Tokenizing "Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!"**
- SOLAR-10.7B: 22 tokens
- Solar-Ko-Recovery: 22 tokens
| Model | Tokens |
| --- | --- |
| SOLAR-10.7B | `['▁Meet', '▁', '1', '0', '.', '7', 'B', '▁Solar', ':', '▁E', 'lev', 'ating', '▁Performance', '▁with', '▁Up', 'stage', '▁Dep', 'th', '▁UP', '▁Scal', 'ing', '!']` |
| Solar-Ko-Recovery | `['▁Meet', '▁', '1', '0', '.', '7', 'B', '▁Solar', ':', '▁E', 'lev', 'ating', '▁Performance', '▁with', '▁Up', 'stage', '▁Dep', 'th', '▁UP', '▁Scal', 'ing', '!']` |
# LICENSE
Apache 2.0
# **Model Benchmark**
## LM Eval Harness - Korean
- Used EleutherAI's [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
- 5-shot scores
| Tasks | Metric | Value | | Stderr |
|----------------------------------------------------------|-----------|--------:|---|--------:|
|haerae |acc_norm | 0.7874 |± | 0.0118 |
| - haerae_general_knowledge |acc | 0.5000 |± | 0.0378 |
| - haerae_history |acc | 0.8723 |± | 0.0244 |
| - haerae_loan_word |acc | 0.8402 |± | 0.0283 |
| - haerae_rare_word |acc | 0.8346 |± | 0.0185 |
| - haerae_standard_nomenclature |acc | 0.8301 |± | 0.0305 |
|kmmlu_direct |exact_match| 0.4205 |± | 0.0026 |
| - kmmlu_direct_accounting |exact_match| 0.3700 |± | 0.0485 |
| - kmmlu_direct_agricultural_sciences |exact_match| 0.3140 |± | 0.0147 |
| - kmmlu_direct_aviation_engineering_and_maintenance |exact_match| 0.3870 |± | 0.0154 |
| - kmmlu_direct_biology |exact_match| 0.3510 |± | 0.0151 |
| - kmmlu_direct_chemical_engineering |exact_match| 0.3910 |± | 0.0154 |
| - kmmlu_direct_chemistry |exact_match| 0.4000 |± | 0.0200 |
| - kmmlu_direct_civil_engineering |exact_match| 0.4010 |± | 0.0155 |
| - kmmlu_direct_computer_science |exact_match| 0.6520 |± | 0.0151 |
| - kmmlu_direct_construction |exact_match| 0.3080 |± | 0.0146 |
| - kmmlu_direct_criminal_law |exact_match| 0.3100 |± | 0.0328 |
| - kmmlu_direct_ecology |exact_match| 0.4660 |± | 0.0158 |
| - kmmlu_direct_economics |exact_match| 0.5385 |± | 0.0439 |
| - kmmlu_direct_education |exact_match| 0.6200 |± | 0.0488 |
| - kmmlu_direct_electrical_engineering |exact_match| 0.3000 |± | 0.0145 |
| - kmmlu_direct_electronics_engineering |exact_match| 0.4740 |± | 0.0158 |
| - kmmlu_direct_energy_management |exact_match| 0.3560 |± | 0.0151 |
| - kmmlu_direct_environmental_science |exact_match| 0.2980 |± | 0.0145 |
| - kmmlu_direct_fashion |exact_match| 0.4470 |± | 0.0157 |
| - kmmlu_direct_food_processing |exact_match| 0.3690 |± | 0.0153 |
| - kmmlu_direct_gas_technology_and_engineering |exact_match| 0.3000 |± | 0.0145 |
| - kmmlu_direct_geomatics |exact_match| 0.3820 |± | 0.0154 |
| - kmmlu_direct_health |exact_match| 0.5700 |± | 0.0498 |
| - kmmlu_direct_industrial_engineer |exact_match| 0.3830 |± | 0.0154 |
| - kmmlu_direct_information_technology |exact_match| 0.6090 |± | 0.0154 |
| - kmmlu_direct_interior_architecture_and_design |exact_match| 0.5440 |± | 0.0158 |
| - kmmlu_direct_korean_history |exact_match| 0.3800 |± | 0.0488 |
| - kmmlu_direct_law |exact_match| 0.4670 |± | 0.0158 |
| - kmmlu_direct_machine_design_and_manufacturing |exact_match| 0.3960 |± | 0.0155 |
| - kmmlu_direct_management |exact_match| 0.5030 |± | 0.0158 |
| - kmmlu_direct_maritime_engineering |exact_match| 0.4283 |± | 0.0202 |
| - kmmlu_direct_marketing |exact_match| 0.7460 |± | 0.0138 |
| - kmmlu_direct_materials_engineering |exact_match| 0.4020 |± | 0.0155 |
| - kmmlu_direct_math |exact_match| 0.2867 |± | 0.0262 |
| - kmmlu_direct_mechanical_engineering |exact_match| 0.3490 |± | 0.0151 |
| - kmmlu_direct_nondestructive_testing |exact_match| 0.3760 |± | 0.0153 |
| - kmmlu_direct_patent |exact_match| 0.3700 |± | 0.0485 |
| - kmmlu_direct_political_science_and_sociology |exact_match| 0.5300 |± | 0.0289 |
| - kmmlu_direct_psychology |exact_match| 0.4470 |± | 0.0157 |
| - kmmlu_direct_public_safety |exact_match| 0.3520 |± | 0.0151 |
| - kmmlu_direct_railway_and_automotive_engineering |exact_match| 0.3220 |± | 0.0148 |
| - kmmlu_direct_real_estate |exact_match| 0.4350 |± | 0.0351 |
| - kmmlu_direct_refrigerating_machinery |exact_match| 0.3240 |± | 0.0148 |
| - kmmlu_direct_social_welfare |exact_match| 0.4970 |± | 0.0158 |
| - kmmlu_direct_taxation |exact_match| 0.3800 |± | 0.0344 |
| - kmmlu_direct_telecommunications_and_wireless_technology|exact_match| 0.5480 |± | 0.0157 |
|kobest_boolq |acc | 0.9202 |± | 0.0072 |
| |f1 | 0.9202 |± |N/A |
|kobest_copa |acc | 0.8680 |± | 0.0107 |
| |f1 | 0.8678 |± |N/A |
|kobest_hellaswag |acc | 0.5560 |± | 0.0222 |
| |f1 | 0.5520 |± |N/A |
| |acc_norm | 0.6540 |± | 0.0213 |
|kobest_sentineg |acc | 0.9824 |± | 0.0066 |
| |f1 | 0.9824 |± |N/A |
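To help read the Stderr column, the sketch below assumes each accuracy-style score is the mean of per-example 0/1 outcomes and that the reported value is the sample standard error of that mean. Under that assumption the number of evaluated examples can be estimated from a row. Both function names are hypothetical, and the estimate is not a figure reported by the author.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores (sample variance, n - 1)."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def implied_sample_size(acc: float, stderr: float) -> float:
    """Invert binomial_stderr to estimate n from a reported score and stderr."""
    return acc * (1.0 - acc) / stderr ** 2 + 1


# kmmlu_direct_accounting row: 0.3700 ± 0.0485
print(round(implied_sample_size(0.3700, 0.0485)))  # 100
print(round(binomial_stderr(0.3700, 100), 4))      # 0.0485
```

Rows with a larger Stderr therefore likely correspond to smaller test subsets, so their scores should be compared with more caution.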
## Citation
TBD
## Acknowledgements
- Training support was provided by the [TPU Research Cloud](https://sites.research.google/trc/) program.

28
config.json Normal file

@@ -0,0 +1,28 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 48,
"num_key_value_heads": 8,
"pad_token_id": 2,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.33.1",
"use_cache": true,
"vocab_size": 64000
}

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 2,
"transformers_version": "4.33.1"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb6affa2927d827bc01ca8590f9462ae1859bdb5fb9ba5f621ef8b2a3089a64a
size 2906740504


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5c65c0b80c8fa0c5b1bc32eaa18bb186041aba581e7f0cc68aa65133034a0ccd
size 2936134664


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8ef7f7ad65c4c11ba9d0e454f8117ad1f7ebba254ce3ad9a4ad6bad29a37cfbf
size 2969688800


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4dd17d2d38cbe79e6112a8d68998f4436bbfe8bbec478fddf52b87bccec121e
size 2936118096


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5fd8eb4dcbad5c27834e6645ac23a0389c319f76044d48d2afa6dffbe95306e1
size 2936134712


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1654a934b42d88b3f87b19eed46328aa3a913ae1f1d037f77c6c7fff8e84e47b
size 2936134720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:48b5980e2196f35e2d45f70905c2643196edf927d87a0c269da3c567f5ba1ad9
size 2969688800


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eb92e8ae16686ad8b03a12c67c5f03c573e83594b24049b283a911f1e3c4d532
size 1396746488
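
Each three-line block above is a Git LFS pointer: a spec version, an object ID made of a hash algorithm and digest, and the object size in bytes. A minimal sketch of reading one, assuming the simple `key value` layout shown here (the helper name `parse_lfs_pointer` is hypothetical):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:eb92e8ae16686ad8b03a12c67c5f03c573e83594b24049b283a911f1e3c4d532
size 1396746488
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])  # sha256 1396746488
```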


@@ -0,0 +1,442 @@
{
"metadata": {
"total_size": 21987336192
},
"weight_map": {
"lm_head.weight": "model-00008-of-00008.safetensors",
"model.embed_tokens.weight": "model-00001-of-00008.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.16.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.17.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.18.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.20.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.22.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.23.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.24.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.25.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.26.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.28.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.29.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.30.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.31.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.32.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.33.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.34.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.35.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.36.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.37.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.38.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.39.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.40.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.41.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.42.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.42.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.42.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.42.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.42.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.42.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.42.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.42.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.42.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.43.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.43.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.43.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.43.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.43.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.43.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.43.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.43.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.43.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.44.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.44.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.44.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.44.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.44.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.44.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.44.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.44.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.44.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.45.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.45.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.45.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.45.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.45.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.45.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.45.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.45.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.45.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.46.input_layernorm.weight": "model-00008-of-00008.safetensors",
"model.layers.46.mlp.down_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.46.mlp.gate_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.46.mlp.up_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.46.post_attention_layernorm.weight": "model-00008-of-00008.safetensors",
"model.layers.46.self_attn.k_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.46.self_attn.o_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.46.self_attn.q_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.46.self_attn.v_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.47.input_layernorm.weight": "model-00008-of-00008.safetensors",
"model.layers.47.mlp.down_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.47.mlp.gate_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.47.mlp.up_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.47.post_attention_layernorm.weight": "model-00008-of-00008.safetensors",
"model.layers.47.self_attn.k_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.47.self_attn.o_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.47.self_attn.q_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.47.self_attn.v_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.norm.weight": "model-00008-of-00008.safetensors"
}
}

5
special_tokens_map.json Normal file

@@ -0,0 +1,5 @@
{
"bos_token": "<s>",
"eos_token": "</s>",
"unk_token": "<unk>"
}

156602
tokenizer.json Normal file

File diff suppressed because it is too large

53
tokenizer_config.json Normal file

@@ -0,0 +1,53 @@
{
"add_bos_token": true,
"add_eos_token": false,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"63988": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"63989": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"model_max_length": 1000000000000000019884624838656,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": false
}
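Two details of the tokenizer configuration above are easy to misread: the special tokens are keyed by token ID as strings, and `model_max_length` is a very large number, not a usable context size. The sketch below, written under stated assumptions, reads a trimmed copy of the config. The value `1000000000000000019884624838656` equals Python's `int(1e30)`, which is commonly seen as a placeholder when no length limit has been set. The helper names and the fallback of 4096 are hypothetical and are not taken from the model card.

```python
import json

# Trimmed copy of the tokenizer_config.json shown above (only the fields used here).
TOKENIZER_CONFIG_EXCERPT = json.loads("""
{
  "added_tokens_decoder": {
    "0": {"content": "<unk>", "special": true},
    "1": {"content": "<s>", "special": true},
    "2": {"content": "</s>", "special": true},
    "63988": {"content": "<|im_start|>", "special": true},
    "63989": {"content": "<|im_end|>", "special": true}
  },
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "model_max_length": 1000000000000000019884624838656
}
""")

# Placeholder value meaning "no limit configured" (an assumption about intent;
# the number itself matches int(1e30) exactly).
UNSET_MAX_LENGTH = int(1e30)


def special_token_ids(config):
    """Map each special token's text to its integer ID."""
    return {
        entry["content"]: int(token_id)
        for token_id, entry in config["added_tokens_decoder"].items()
        if entry["special"]
    }


def effective_max_length(config, fallback):
    """Return the declared max length, or `fallback` when it is the placeholder."""
    declared = config["model_max_length"]
    return fallback if declared >= UNSET_MAX_LENGTH else declared


ids = special_token_ids(TOKENIZER_CONFIG_EXCERPT)
eos_id = ids[TOKENIZER_CONFIG_EXCERPT["eos_token"]]
# 4096 is a hypothetical fallback chosen for illustration only.
max_len = effective_max_length(TOKENIZER_CONFIG_EXCERPT, fallback=4096)
```

Note that `<|im_start|>` and `<|im_end|>` are registered as special tokens, while the configured end-of-sequence token remains `</s>`.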