Initialize project; model provided by the ModelHub XC community

Model: orai-nlp/Gemma-Kimu-9b-it
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-17 13:31:17 +08:00
commit 170bd0cb64
14 changed files with 2768 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md Normal file

@@ -0,0 +1,105 @@
---
base_model: google/gemma-2-9b
datasets:
- orai-nlp/ZelaiHandi
- HuggingFaceFW/fineweb
language:
- eu
library_name: transformers
pipeline_tag: text-generation
license: gemma
---
Gemma-Kimu-9B-Instruct v1.0 is an instruction-tuned large language model (LLM) tailored specifically for Basque, built from Google's Gemma-2-9b foundational model and its instructed counterpart, Gemma-2-9b-it. The approach decouples language adaptation from post-training alignment: it first continually pre-trains the foundational LLM on a modest amount of monolingual target-language data while anchoring on English replay, and then injects instruction-following capabilities via delta-based weight merging from the instructed counterpart of the base LLM.
We first continually pre-train the base LLM on monolingual Basque data to improve its linguistic capacity. Then, instead of post-training from scratch, we add the post-training delta (the weight difference between Gemma-2-9b-it and Gemma-2-9b) to the language-adapted model. This simple yet effective method transfers not only instruction-following capabilities but also human preference alignment.
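
The merging step can be pictured as plain task arithmetic over parameter dictionaries, merged = adapted + alpha * (instruct - base). This is a minimal sketch under that assumption, not the authors' DIPLomA implementation: the function name and the `alpha` scaling knob are hypothetical, and the paper may refine the merge further.

```python
from typing import Any, Dict


def merge_post_training_delta(
    adapted: Dict[str, Any],
    base: Dict[str, Any],
    instruct: Dict[str, Any],
    alpha: float = 1.0,
) -> Dict[str, Any]:
    """Return adapted + alpha * (instruct - base) for every parameter.

    adapted:  weights after continual pre-training (language-adapted base model)
    base:     original foundational weights (e.g. Gemma-2-9b)
    instruct: instructed counterpart of the base model (e.g. Gemma-2-9b-it)
    alpha:    hypothetical scaling knob; 1.0 adds the full post-training delta
    Values may be floats, NumPy arrays or torch tensors (anything with +, -, *).
    """
    if not (adapted.keys() == base.keys() == instruct.keys()):
        raise ValueError("all three checkpoints must have the same parameter names")
    return {name: adapted[name] + alpha * (instruct[name] - base[name]) for name in adapted}


# Toy scalar "weights"; real checkpoints would be tensors from model.state_dict().
base = {"w": 1.0, "b": 0.0}
instruct = {"w": 1.5, "b": -0.25}
adapted = {"w": 2.0, "b": 0.5}
merged = merge_post_training_delta(adapted, base, instruct)
print(merged)  # {'w': 2.5, 'b': 0.25}
```

Because only the delta is transferred, no instruction data or preference optimization has to be rerun on the adapted model; the three checkpoints just need identical architectures and parameter names.
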
Evaluations show that Gemma-Kimu-9b-it improves notably over Gemma-2-9b-it in Basque instruction following, safety, and linguistic correctness.
Want to test this model in a real setting? Join the waitlist:
[PLAYGROUND](https://kimu.orai.eus)
# Training Data
For continual pre-training, we leveraged a combination of Basque and English data to enhance linguistic performance in Basque while maintaining general English capabilities. The goal is to improve cross-lingual transfer by retaining the model's proficiency in English.
- [ZelaiHandi dataset](https://huggingface.co/datasets/orai-nlp/ZelaiHandi) (San Vicente et al., 2024): ZelaiHandi is the largest collection of freely licensed, high-quality Basque texts gathered from selected web sources. It comprises approximately 521 million words, which correspond to 1.5 billion tokens (Llama 3.1 tokenizer).
- [FineWeb dataset](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (Penedo et al., 2024): FineWeb consists of more than 15T tokens of cleaned and deduplicated English web data from CommonCrawl. We selected a random subset of around 300 million tokens (Llama 3.1 tokenizer).
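
The approximate figures above imply that English replay makes up about one sixth of the continual pre-training tokens. The arithmetic below assumes each corpus is used exactly once, which the card does not state; the counts also come from the Llama 3.1 tokenizer, so Gemma token counts will differ.

```python
# Approximate corpus sizes from the Training Data section (Llama 3.1 tokenizer).
# Assumption: each corpus is used exactly once (no up- or down-sampling).
basque_words = 521e6    # ZelaiHandi, words
basque_tokens = 1.5e9   # ZelaiHandi, tokens
english_tokens = 300e6  # FineWeb random subset, tokens

english_share = english_tokens / (basque_tokens + english_tokens)
tokens_per_word = basque_tokens / basque_words

print(f"English replay share: {english_share:.1%}")      # 16.7%
print(f"Basque tokens per word: {tokens_per_word:.2f}")  # 2.88
```
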
# Evaluation
To evaluate the instruction-following capabilities of our models in Basque, we use the NoRobotsEU benchmark (Corral et al., 2025), a manually translated subset of the original NoRobots test set. It consists of 100 Basque instructions, each paired with its English counterpart, spanning 9 diverse categories.
<div class="tb tb-eu-l10">

| Model                | Instruction following (EU) | Instruction following (EN) |
|----------------------|----------------------------|----------------------------|
| Gemma-2-2b-it        | 7                          | 71                         |
| **Gemma-Kimu-2b-it** | **48**                     | **60**                     |
| Gemma-2-9b-it        | 57                         | 86                         |
| **Gemma-Kimu-9b-it** | **71**                     | **82**                     |

</div>
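
Reading the table as differences makes the trade-off explicit: adaptation raises the Basque score substantially and costs a few points in English. The snippet only restates the reported numbers; the card does not specify the scoring scale, and the helper name is illustrative.

```python
# Scores copied from the NoRobotsEU table: (EU, EN).
scores = {
    "Gemma-2-2b-it": (7, 71),
    "Gemma-Kimu-2b-it": (48, 60),
    "Gemma-2-9b-it": (57, 86),
    "Gemma-Kimu-9b-it": (71, 82),
}


def score_change(adapted: str, original: str) -> tuple:
    """Score difference (adapted minus original) for Basque and English."""
    (eu_a, en_a), (eu_o, en_o) = scores[adapted], scores[original]
    return eu_a - eu_o, en_a - en_o


for size in ("2b", "9b"):
    eu, en = score_change(f"Gemma-Kimu-{size}-it", f"Gemma-2-{size}-it")
    print(f"{size}: EU {eu:+d}, EN {en:+d}")
# 2b: EU +41, EN -11
# 9b: EU +14, EN -4
```
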
Additional evaluation results on linguistic proficiency and safety are reported in Sarasua et al. (2025).
# Usage with the `pipeline` API
Install the transformers library with:
```sh
pip install -U transformers
```
Then, copy the following snippet, replace the content of the user message with your prompt and run it!
```python
import torch
from transformers import pipeline
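pipe = pipeline(
    "text-generation",
    model="orai-nlp/Gemma-Kimu-9b-it",
    device_map="auto",
    dtype=torch.bfloat16,  # older transformers releases name this argument torch_dtype
)
# The chat template only accepts alternating user/assistant turns (no "system" role)
messages = [
    {"role": "user", "content": "Kaixo! Ba al dakizu euskaraz?"}  # "Hi! Do you know Basque?"
]
output = pipe(messages, max_new_tokens=128)
# generated_text holds the whole conversation; the last message is the model's reply
response = output[0]["generated_text"][-1]["content"].strip()
print(response)
# Bai, euskaraz dakit! Kaixo! 👋
#
# Zer moduz? Zer egin dezaket zugatik? 😊
# *Euskaraz hitz egin nahi baduzu, aurrera! Nik ulertzen dut eta erantzungo dizut.*
# (English: "Yes, I know Basque! Hi! How are you? What can I do for you?
#  If you want to speak Basque, go ahead! I understand and will answer you.")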
```
# License
This model is derived from Gemma 2 and is licensed under the Gemma License. Copyright © Google DeepMind. All Rights Reserved.
# Acknowledgments
This work is part of the BasqueLLM project, titled "bi-SLM: Optimization of Industrial Processes through Bilingual SLMs" (EXP: 2025-CIE4-000048-01), partially funded by the Guipuzcoa Science, Technology and Innovation Network Program of the Provincial Council of Gipuzkoa. Model training and development were conducted using the Hyperion system at the Donostia International Physics Center (DIPC).
# Citation
If you use this model, please cite the following reference:
```bibtex
@inproceedings{sarasua2025,
title={DIPLomA: Efficient Adaptation of Instructed LLMs to Low-Resource Languages via Post-Training Delta Merging},
author={Sarasua, Ixak and Corral, Ander and Saralegi, Xabier},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2025},
year={2025}
}
```
# Contact
- Ixak Sarasua (i.sarasua@orai.eus)
- Ander Corral (a.corral@orai.eus)
- Xabier Saralegi (x.saralegi@orai.eus)

chat_template.jinja Normal file

@@ -0,0 +1,4 @@
{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{ '<start_of_turn>' + role + '
' + message['content'] | trim + '<end_of_turn>
' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model
'}}{% endif %}
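
The template above produces Gemma's turn format: a BOS token, then `<start_of_turn>{role}\n{content}<end_of_turn>\n` per message with `assistant` renamed to `model`, and a trailing `<start_of_turn>model\n` when a generation prompt is requested. Below is a hand-written Python mirror for inspecting prompts without loading the tokenizer. The function name is hypothetical and `<bos>` as the BOS string is an assumption based on the Gemma tokenizer; in practice `tokenizer.apply_chat_template` renders the real template.

```python
from typing import Dict, List


def render_gemma_chat(
    messages: List[Dict[str, str]],
    bos_token: str = "<bos>",  # assumption: Gemma's BOS string
    add_generation_prompt: bool = True,
) -> str:
    """Mirror of chat_template.jinja for a non-empty list of messages."""
    if messages and messages[0]["role"] == "system":
        raise ValueError("System role not supported")
    parts = [bos_token]
    for index, message in enumerate(messages):
        # Even positions must be "user" turns; odd positions must not be.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/user/assistant/...")
        role = "model" if message["role"] == "assistant" else message["role"]
        # Jinja's `trim` filter strips leading/trailing whitespace, like str.strip().
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = render_gemma_chat([{"role": "user", "content": " Kaixo! Ba al dakizu euskaraz? "}])
print(repr(prompt))
```

A system prompt therefore has to be folded into the first user message, since the template rejects a leading `system` turn outright.
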

config.json Normal file

@@ -0,0 +1,78 @@
{
"architectures": [
"Gemma2ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"attn_logit_softcapping": 50.0,
"bos_token_id": 2,
"cache_implementation": "hybrid",
"dtype": "bfloat16",
"eos_token_id": 1,
"final_logit_softcapping": 30.0,
"head_dim": 256,
"hidden_act": "gelu_pytorch_tanh",
"hidden_activation": "gelu_pytorch_tanh",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 14336,
"layer_types": [
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention",
"sliding_attention",
"full_attention"
],
"max_position_embeddings": 8192,
"model_type": "gemma2",
"num_attention_heads": 16,
"num_hidden_layers": 42,
"num_key_value_heads": 8,
"pad_token_id": 0,
"pretraining_tp": 1,
"query_pre_attn_scalar": 256,
"rms_norm_eps": 1e-06,
"rope_theta": 10000.0,
"sliding_window": 4096,
"sliding_window_size": 4096,
"transformers_version": "4.56.1",
"use_cache": false,
"vocab_size": 256000
}
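
As a consistency check on the config above, the parameter count can be recomputed from its dimensions. The sketch assumes the Gemma 2 decoder layout visible in the weight map (q/k/v/o projections without bias, gate/up/down MLP projections, four RMSNorm weights per layer), plus one final norm, and an output head tied to `embed_tokens`. Under those assumptions it reproduces `total_parameters` (9,241,705,984) from the safetensors index, and two bytes per bfloat16 value give its `total_size`.

```python
# Dimensions copied from config.json above.
config = {
    "vocab_size": 256000,
    "hidden_size": 3584,
    "intermediate_size": 14336,
    "num_hidden_layers": 42,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 256,
}


def gemma2_parameter_count(c: dict) -> int:
    h = c["hidden_size"]
    q_dim = c["num_attention_heads"] * c["head_dim"]   # 16 * 256 = 4096
    kv_dim = c["num_key_value_heads"] * c["head_dim"]  # 8 * 256 = 2048 (grouped-query attention)
    attention = 2 * h * q_dim + 2 * h * kv_dim         # q_proj + o_proj, k_proj + v_proj (no bias)
    mlp = 3 * h * c["intermediate_size"]               # gate_proj, up_proj, down_proj
    norms = 4 * h                                      # input, post_attention, pre/post_feedforward norms
    per_layer = attention + mlp + norms
    embeddings = c["vocab_size"] * h                   # assumed tied with the LM head, counted once
    final_norm = h
    return embeddings + c["num_hidden_layers"] * per_layer + final_norm


total = gemma2_parameter_count(config)
print(total)      # 9241705984
print(total * 2)  # 18483411968 bytes in bfloat16
```
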

generation_config.json Normal file

@@ -0,0 +1,8 @@
{
"_from_model_config": true,
"bos_token_id": 2,
"cache_implementation": "hybrid",
"eos_token_id": 1,
"pad_token_id": 0,
"transformers_version": "4.56.1"
}
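
Both config files set `cache_implementation` to `"hybrid"`, which pairs with the alternating `layer_types`: the 21 sliding-attention layers only need keys and values for the last `sliding_window` (4096) tokens, while the 21 full-attention layers keep the whole context. The rough per-sequence KV-cache estimate below assumes exactly that and a bfloat16 cache; the function name is hypothetical, and actual allocations in transformers can differ (for example through preallocation to a maximum cache length), so treat it as a back-of-the-envelope figure.

```python
def kv_cache_bytes(
    seq_len: int,
    layer_types: list,
    num_key_value_heads: int = 8,  # from config.json
    head_dim: int = 256,           # from config.json
    sliding_window: int = 4096,    # from config.json
    bytes_per_value: int = 2,      # assumption: bfloat16 cache
) -> int:
    """Approximate KV-cache size in bytes for one sequence of seq_len tokens."""
    per_token_per_layer = 2 * num_key_value_heads * head_dim * bytes_per_value  # keys + values
    total = 0
    for kind in layer_types:
        # Assumption: sliding-attention layers retain at most `sliding_window` tokens.
        cached = min(seq_len, sliding_window) if kind == "sliding_attention" else seq_len
        total += cached * per_token_per_layer
    return total


layer_types = ["sliding_attention", "full_attention"] * 21  # 42 layers, alternating as in config.json
for n in (4096, 8192):
    print(n, kv_cache_bytes(n, layer_types) / 2**30, "GiB")
# 4096 1.3125 GiB
# 8192 1.96875 GiB
```

Without the sliding window, all 42 layers would cache the full 8192-token context (2.625 GiB under the same assumptions), so the hybrid layout saves about a quarter of the cache at maximum length.
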

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:511dc2b7799144118163c550f9b6db2b05d92b1e5e805cd9555597f162a10432
size 4903351912

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d48d91a96d9462071f9949b77df046dc3373400796477f38c7b7af5c426ebce
size 4947570872

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8c6ce7d44de339da31b2a45c031eaa72c4840bfa364422c5c06c1fee0199f1e
size 4962221464

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ce1d2007f466afb52b1c9fe97db3cfb65f31e527ed4eaf1f5381a41fd853681b
size 3670322200
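
Each of these four entries is a Git LFS pointer rather than the weights themselves: a `version` line naming the LFS spec, an `oid` (a SHA-256 digest) and a `size` in bytes. Below is a minimal sketch for parsing a pointer and checking a downloaded file against it; the function names are hypothetical. The four sizes add up to 18,483,466,448 bytes, slightly more than the index's `total_size` of 18,483,411,968, presumably because `total_size` counts tensor data only while each shard file also carries a safetensors header.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def file_matches_pointer(path: str, pointer: dict) -> bool:
    """Stream a local file; compare its size and SHA-256 digest with the pointer."""
    sha = hashlib.sha256()  # assumes the pointer's algorithm is sha256, as above
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            sha.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and sha.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:511dc2b7799144118163c550f9b6db2b05d92b1e5e805cd9555597f162a10432
size 4903351912"""
)
print(pointer["size"])  # 4903351912

shard_sizes = [4903351912, 4947570872, 4962221464, 3670322200]  # the four pointers above
print(sum(shard_sizes))  # 18483466448
```
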

@@ -0,0 +1,472 @@
{
"metadata": {
"total_parameters": 9241705984,
"total_size": 18483411968
},
"weight_map": {
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.pre_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.pre_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.pre_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.pre_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.pre_feedforward_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.32.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.32.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.33.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.33.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.34.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.34.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.34.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.36.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.36.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.36.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.37.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.37.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.37.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.38.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.38.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.38.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.39.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.39.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.39.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.pre_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.40.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.40.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.40.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.41.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.41.post_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.41.pre_feedforward_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.pre_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.pre_feedforward_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.7.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.7.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.8.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.8.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.pre_feedforward_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
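The entries above are the tail of the sharded checkpoint index. The file name is assumed here to be `model.safetensors.index.json`, the usual Hugging Face convention, because its header falls outside this excerpt. Each tensor name maps to the shard that stores it. A single layer can straddle two shards. Layer 7 keeps its attention projections and MLP `gate_proj`/`up_proj` in shard 1, but its norms and `down_proj` are in shard 2. Layer 32 is split the same way across shards 3 and 4. This is expected, because loaders resolve each tensor through the map individually.

The sketch below inverts the map and lists the layers that span more than one shard. It runs on a short hand-copied excerpt rather than the full file, and the helper names `tensors_by_shard` and `layers_split_across_shards` are hypothetical.

```python
import json
import re
from collections import defaultdict

# Hand-copied excerpt of the "weight_map" above; with the real file you would
# json.load() the whole index instead (file name assumed, see lead-in).
INDEX_JSON = """
{"weight_map": {
  "model.layers.32.input_layernorm.weight": "model-00004-of-00004.safetensors",
  "model.layers.32.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
  "model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
  "model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
  "model.layers.33.input_layernorm.weight": "model-00004-of-00004.safetensors",
  "model.norm.weight": "model-00004-of-00004.safetensors"
}}
"""

# Captures the decoder-layer index from names like "model.layers.32.mlp...".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def tensors_by_shard(weight_map):
    """Hypothetical helper: invert tensor-name -> shard into shard -> sorted names."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


def layers_split_across_shards(weight_map):
    """Hypothetical helper: {layer_index: shards} for layers stored in >1 shard."""
    layer_shards = defaultdict(set)
    for name, shard in weight_map.items():
        match = LAYER_RE.match(name)
        if match:  # skips non-layer tensors such as model.norm.weight
            layer_shards[int(match.group(1))].add(shard)
    return {layer: s for layer, s in layer_shards.items() if len(s) > 1}


weight_map = json.loads(INDEX_JSON)["weight_map"]
print(layers_split_across_shards(weight_map))
```

On this excerpt, only layer 32 is reported. Run against the full index, the same check should also flag layer 7.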

34
special_tokens_map.json Normal file

@@ -0,0 +1,34 @@
{
"additional_special_tokens": [
"<start_of_turn>",
"<end_of_turn>"
],
"bos_token": {
"content": "<bos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<eos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
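`special_tokens_map.json` registers `<start_of_turn>` and `<end_of_turn>` as additional special tokens. Gemma-family chat models use these tokens to delimit conversation turns. The authoritative prompt format is the chat template in `tokenizer_config.json`, whose diff is suppressed here. In practice, prefer `tokenizer.apply_chat_template(...)` from `transformers`, which renders the template shipped with the model.

The sketch below only shows the assumed standard Gemma 2 layout:

- a `<bos>` prefix,
- `user`/`model` role names,
- each turn closed by `<end_of_turn>` and a newline.

The function name `format_gemma_prompt` is hypothetical. The sketch also rejects any role other than `user` or `assistant`, including a system role.

```python
# Special tokens declared in special_tokens_map.json above.
BOS = "<bos>"
START_OF_TURN = "<start_of_turn>"
END_OF_TURN = "<end_of_turn>"

# Assumption: the assistant role is written as "model" in the prompt text,
# as in the standard Gemma 2 template; other roles are rejected here.
ROLE_NAMES = {"user": "user", "assistant": "model"}


def format_gemma_prompt(messages, add_generation_prompt=True):
    """Hypothetical helper: render {"role", "content"} dicts in Gemma turn format."""
    parts = [BOS]
    for msg in messages:
        role = ROLE_NAMES.get(msg["role"])
        if role is None:
            raise ValueError(f"unsupported role: {msg['role']!r}")
        parts.append(f"{START_OF_TURN}{role}\n{msg['content'].strip()}{END_OF_TURN}\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the assistant.
        parts.append(f"{START_OF_TURN}model\n")
    return "".join(parts)


prompt = format_gemma_prompt([{"role": "user", "content": "Kaixo!"}])
print(prompt)
```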

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5f7eee611703c5ce5d1eee32d9cdcfe465647b8aff0c1dfb3bed7ad7dbb05060
size 34362873

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:61a7b147390c64585d6c3543dd6fc636906c9af3865a5548f27f31aee1d4c8e2
size 4241003
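`tokenizer.json` and `tokenizer.model` are committed as Git LFS pointer files rather than as the files themselves. Each pointer has three lines:

- the LFS spec version,
- the SHA-256 object ID of the real file's contents,
- its size in bytes.

After fetching the real files (for example with `git lfs pull`), you can check a download by comparing its size and SHA-256 against the pointer. The following is a minimal sketch of that check, and the helper names `parse_lfs_pointer` and `verify_against_pointer` are hypothetical.

```python
import hashlib
import os
import tempfile


def parse_lfs_pointer(text):
    """Hypothetical helper: parse a Git LFS pointer into version, sha256 and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_against_pointer(path, pointer):
    """Hypothetical helper: True if the file matches the pointer's size and SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check before hashing
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


TOKENIZER_MODEL_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:61a7b147390c64585d6c3543dd6fc636906c9af3865a5548f27f31aee1d4c8e2
size 4241003
"""

pointer = parse_lfs_pointer(TOKENIZER_MODEL_POINTER)
print(pointer["size"])  # 4241003

# Demo on a throwaway file: build a matching pointer for it, then verify.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_pointer = {"version": "https://git-lfs.github.com/spec/v1",
                "sha256": hashlib.sha256(b"hello").hexdigest(), "size": 5}
demo_ok = verify_against_pointer(tmp.name, demo_pointer)
demo_bad_size = verify_against_pointer(tmp.name, dict(demo_pointer, size=6))
os.remove(tmp.name)
```

Checking the size first avoids hashing a multi-megabyte file when a truncated download can already be detected from its length.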

2013
tokenizer_config.json Normal file

File diff suppressed because it is too large