Initialize project; model provided by the ModelHub XC community

Model: KISTI-KONI/KONI-4B-instruct-20250901
Source: Original Platform
ModelHub XC
2026-06-25 18:07:16 +08:00
commit 24214dcd69
18 changed files with 52199 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
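These rules decide which files are stored through Git LFS. Note that `tokenizer.json` is tracked explicitly, while other JSON files such as `config.json` stay in regular Git. The authoritative check is `git check-attr filter -- <path>`. The Python sketch below only approximates git's matching with `fnmatch`: patterns without a slash are compared against the basename, and patterns with a slash against the whole repo-relative path. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The 36 patterns from the .gitattributes above; every one maps to filter=lfs.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*",
    "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip",
    "*.zst", "*tfevents*", "tokenizer.json",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate git's attribute matching for the patterns above.

    Patterns without '/' are matched against the file's basename (any directory);
    patterns containing '/' are matched against the full repo-relative path.
    fnmatch's '*' also crosses '/', so '**' is only roughly emulated.
    """
    basename = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else basename
        if fnmatchcase(target, pattern):
            return True
    return False


for name in ["model-00001-of-00002.safetensors", "tokenizer.json",
             "config.json", "generation_config.json", "added_tokens.json"]:
    print(f"{name:40} LFS={is_lfs_tracked(name)}")
```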

133
README.md Normal file

@@ -0,0 +1,133 @@
---
license: gemma
language:
- ko
- en
tags:
- text-generation
- pytorch
- causal-lm
- gemma3
- 4b
library_name: transformers
base_model:
- google/gemma-3-4b-pt
---
# KISTI-KONI/KONI-4B-instruct-20250901
## Model Description
**KONI (KISTI Open Neural Intelligence)** is a large language model developed by the Korea Institute of Science and Technology Information (KISTI). Designed specifically for the scientific and technological domains, KONI excels in both Korean and English, making it an ideal tool for tasks requiring specialized knowledge in these areas.
---
## Key Features
- **Bilingual Model**: Supports both Korean and English, with a focus on scientific and technical texts.
- **Post-training**: The model is post-trained with instruction tuning (IT) and direct preference optimization (DPO) on a filtered, high-quality bilingual dataset that includes scientific data and publicly available resources, with the aim of keeping it adaptable to evolving scientific and technological content.
- **Base Model**: Built upon *KISTI-KONI/KONI-4B-base-20250819*, KONI-4B-instruct undergoes post-training for superior performance on both general and scientific benchmarks.
- **Training Environment**: Trained on 24 H200 GPUs on the KISTI supercomputer.
- **Dataset**: Utilizes a high-quality and balanced dataset of 9 billion instruction-following pairs, comprising scientific texts as well as publicly available bilingual data.
- **Data Optimization**: The post-training process involved testing a variety of data distributions (balanced, reasoning-enhanced, knowledge-enhanced, minimal Korean settings, etc.) and selecting the optimal combination for training.
- **Enhanced Performance**: KONI-4B-instruct, developed through instruction tuning of the KONI-4B-base model, delivers superior performance compared to other similarly-sized models.
---
## Model Performance
KONI-4B-instruct has demonstrated strong performance on a variety of benchmarks, including scientific ones, outperforming several models of similar or smaller size. The table below compares KONI-4B-instruct's performance with other models across general, Korean, and scientific and technological benchmarks:
| Rank | Model | KMMLU | KMMLU-Hard | KMMLU-Direct | KoBEST | HAERAE | kormedmcqa | MMLU | ARC_easy | ARC_challenge | Hellaswag | ScholarBench-MC | AidaBench-MC | average |
|------|--------------------------------------------------------------|-------|------------|------------|--------|--------|------------|-------|----------|---------------|-----------|-----------------|--------------|---------|
| 1 | Qwen/Qwen3-8B | 0.5500 | 0.2900 | 0.5558 | 0.7800 | 0.6700 | 0.3750 | 0.7400 | 0.8700 | 0.6400 | 0.5700 | 0.7094 | 0.7314 | 0.623462 |
| 2 | kakaocorp/kanana-1.5-8b-base | 0.4800 | 0.2500 | 0.4872 | 0.6200 | 0.8200 | 0.5910 | 0.6300 | 0.8300 | 0.5600 | 0.6000 | 0.6800 | 0.7548 | 0.608580 |
| 3 | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | 0.4700 | 0.2300 | 0.4532 | 0.5900 | 0.7800 | 0.5310 | 0.6500 | 0.8300 | 0.5900 | 0.6200 | 0.6900 | 0.7057 | 0.594986 |
| 4 | **KISTI-KONI/KONI-4B-instruct-20250901** | **0.4188** | **0.2110** | **0.4194** | **0.7393** | **0.7333** | **0.4719** | **0.5823** | **0.8342** | **0.5452** | **0.5783** | **0.6980** | **0.6274** | **0.571603** |
| 5 | kakaocorp/kanana-1.5-2.1b-instruct-2505 | 0.4200 | 0.2100 | 0.4247 | 0.7700 | 0.7900 | 0.5224 | 0.5500 | 0.8000 | 0.5300 | 0.5100 | 0.6630 | 0.6688 | 0.571577 |
| 6 | **KISTI-KONI/KONI-4B-base-20250819** | **0.4300** | **0.2100** | **0.4349** | **0.7300** | **0.6600** | **0.4800** | **0.5800** | **0.8200** | **0.5200** | **0.5700** | **0.6800** | **0.6147** | **0.560803** |
| 7 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | 0.4300 | 0.2100 | 0.4379 | 0.7400 | 0.6600 | 0.4842 | 0.5900 | 0.7700 | 0.5000 | 0.5400 | 0.6900 | 0.6511 | 0.558603 |
| 8 | KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024 | 0.4000 | 0.2000 | 0.4100 | 0.5600 | 0.6400 | 0.4905 | 0.6300 | 0.8300 | 0.5400 | 0.6100 | 0.6980 | 0.6722 | 0.556725 |
| 9 | meta-llama/Llama-3.1-8B-Instruct | 0.4000 | 0.2000 | 0.4119 | 0.7000 | 0.4400 | 0.4789 | 0.6500 | 0.8400 | 0.5400 | 0.6100 | 0.6960 | 0.6709 | 0.553135 |
| 10 | google/gemma-3-4b-pt | 0.3980 | 0.1998 | 0.3966 | 0.6990 | 0.6672 | 0.4726 | 0.5964 | 0.8300 | 0.5435 | 0.5763 | 0.6670 | 0.5886 | 0.552906 |
| 11 | google/gemma-3-4b-it | 0.3900 | 0.2100 | 0.3904 | 0.7200 | 0.5900 | 0.4400 | 0.5800 | 0.8400 | 0.5600 | 0.5600 | 0.6990 | 0.6013 | 0.548388 |
| 12 | saltlux/Ko-Llama3-Luxia-8B | 0.3800 | 0.2100 | 0.3935 | 0.7100 | 0.6800 | 0.4320 | 0.5500 | 0.8000 | 0.4800 | 0.5600 | 0.6650 | 0.6109 | 0.539283 |
| 13 | MLP-KTLim/llama-3-Korean-Bllossom-8B | 0.3700 | 0.2200 | 0.3738 | 0.5500 | 0.4700 | 0.4163 | 0.6400 | 0.8400 | 0.5700 | 0.5900 | 0.6525 | 0.5862 | 0.523239 |
| 14 | kakaocorp/kanana-1.5-2.1b-base | 0.3900 | 0.2400 | 0.4502 | 0.6200 | 0.5700 | 0.5138 | 0.4700 | 0.7300 | 0.4400 | 0.4500 | 0.6500 | 0.6478 | 0.514315 |
| 15 | naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B | 0.3900 | 0.2400 | 0.3524 | 0.6400 | 0.5700 | 0.3550 | 0.4700 | 0.7300 | 0.4400 | 0.4500 | 0.5950 | 0.5450 | 0.481447 |
| 16 | naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B | 0.3700 | 0.2200 | 0.3798 | 0.6200 | 0.5600 | 0.3383 | 0.4400 | 0.7200 | 0.3900 | 0.4100 | 0.5600 | 0.5173 | 0.460449 |
| 17 | mistralai/Mistral-7B-v0.3 | 0.3700 | 0.2200 | 0.3739 | 0.6300 | 0.3700 | 0.3735 | 0.6200 | 0.8300 | 0.5500 | 0.6200 | 0.5440 | 0.4257 | 0.413117 |
| 18 | google/gemma-3-1b-it | 0.3069 | 0.2400 | 0.2935 | 0.3556 | 0.5987 | 0.2761 | 0.3970 | 0.6620 | 0.3430 | 0.4204 | 0.5720 | 0.3972 | 0.390038 |
| 19 | google/gemma-3-1b-pt | 0.2582 | 0.2456 | 0.2556 | 0.5569 | 0.1952 | 0.1964 | 0.2641 | 0.7146 | 0.3541 | 0.4703 | 0.2192 | 0.1980 | 0.327362 |
| 20 | etri-lirs/eagle-3b-preview | 0.1600 | 0.2100 | 0.1617 | 0.5100 | 0.1900 | 0.1804 | 0.2500 | 0.5700 | 0.2400 | 0.3700 | 0.2678 | 0.2224 | 0.236846 |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/67a07b7b89e01818543f9ec8/YARkt9n6sTa9fZdXUA2lV.png)
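As shown, **KISTI-KONI/KONI-4B-instruct-20250901** ranks fourth overall, behind three 7.8B to 8B models, and is the top-performing model in the 4B-size instruction-tuned category, outperforming *google/gemma-3-4b-it* and *KISTI-KONI/KONI-4B-base-20250819*. Its margin over the fifth-ranked *kakaocorp/kanana-1.5-2.1b-instruct-2505* is small (average 0.571603 vs. 0.571577).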
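The table does not say how the `average` column is computed. Assuming it is the unweighted mean of the 12 benchmark scores, the sketch below recomputes it for three rows, using the rounded per-benchmark values copied from the table. For these rows the result agrees with the reported average to within about 1e-5. The assumption is not guaranteed to hold for every row, so check it per row before relying on it. The names `ROWS` and `recompute` are illustrative only.

```python
from statistics import fmean

# Scores copied from the table above, in column order:
# KMMLU, KMMLU-Hard, KMMLU-Direct, KoBEST, HAERAE, kormedmcqa, MMLU,
# ARC_easy, ARC_challenge, Hellaswag, ScholarBench-MC, AidaBench-MC.
# Each entry is (scores, reported average).
ROWS = {
    "KISTI-KONI/KONI-4B-instruct-20250901": (
        [0.4188, 0.2110, 0.4194, 0.7393, 0.7333, 0.4719,
         0.5823, 0.8342, 0.5452, 0.5783, 0.6980, 0.6274], 0.571603),
    "kakaocorp/kanana-1.5-2.1b-instruct-2505": (
        [0.4200, 0.2100, 0.4247, 0.7700, 0.7900, 0.5224,
         0.5500, 0.8000, 0.5300, 0.5100, 0.6630, 0.6688], 0.571577),
    "google/gemma-3-4b-it": (
        [0.3900, 0.2100, 0.3904, 0.7200, 0.5900, 0.4400,
         0.5800, 0.8400, 0.5600, 0.5600, 0.6990, 0.6013], 0.548388),
}


def recompute(rows):
    """Return {model: (unweighted mean of the 12 scores, reported average)}."""
    return {name: (fmean(scores), reported) for name, (scores, reported) in rows.items()}


for name, (mean, reported) in recompute(ROWS).items():
    print(f"{name:45} recomputed={mean:.6f} reported={reported:.6f}")
```

The recomputed gap between the rank-4 and rank-5 rows is under 0.0001. Their relative order is therefore sensitive to rounding and is likely within ordinary evaluation noise.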
---
## Strengths & Use Cases
- **Domain-Specific Excellence**: KONI-4B-instruct excels at tasks involving scientific literature, technological content, and complex reasoning. It is ideal for research, academic analysis, and specialized problem-solving.
- **Bilingual Advantage**: The model's bilingual nature lets it handle diverse datasets and generate high-quality responses in both English and Korean, which is especially useful in bilingual scientific collaborations.
- **Benchmark Performance**: KONI-4B-instruct shows strong results on knowledge-intensive benchmarks: it outperforms *google/gemma-3-4b-it* on *KMMLU* (0.4188 vs. 0.3900) and *kormedmcqa* (0.4719 vs. 0.4400) and is on par with it on *ScholarBench-MC* (0.6980 vs. 0.6990).
---
## Usage
```sh
$ pip install -U transformers
```
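```python
import torch
from transformers import pipeline

# Load the model in bfloat16 on a single CUDA GPU.
pipe = pipeline(
    "text-generation",
    model="KISTI-KONI/KONI-4B-instruct-20250901",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

# A batch containing one conversation (a list of chat messages).
messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            # "Explain supercomputers to me."
            "content": [{"type": "text", "text": "슈퍼컴퓨터에 대해서 설명해줘."}],
        },
    ],
]

# Stop on either the tokenizer's EOS token or the <end_of_turn> marker.
output = pipe(
    messages,
    max_new_tokens=512,
    eos_token_id=[
        pipe.tokenizer.eos_token_id,
        pipe.tokenizer.convert_tokens_to_ids("<end_of_turn>"),
    ],
)

# One list of generations per conversation; the last message is the model's reply.
print(output[0][0]["generated_text"][-1]["content"])
```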
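The pipeline formats `messages` with the tokenizer's chat template, which is not among the files shown in this commit. The sketch below assumes KONI keeps the Gemma 3 conversation format of its base model: turns wrapped in `<start_of_turn>`/`<end_of_turn>`, the assistant role named `model`, and the system prompt folded into the first user turn. Under that assumption, the hypothetical helper `render_gemma_prompt` produces the prompt string the model sees. It also shows why `<end_of_turn>` is passed as an extra `eos_token_id`: in this format, each completed turn is closed with that marker. For real use, prefer `pipe.tokenizer.apply_chat_template(messages[0], tokenize=False, add_generation_prompt=True)`, which applies the template actually shipped with the model.

```python
def render_gemma_prompt(messages, add_generation_prompt=True, bos="<bos>"):
    """Render chat messages into a Gemma-3-style prompt string (assumed format)."""

    def text_of(content):
        # Content may be a plain string or a list of {"type": "text", "text": ...} parts.
        if isinstance(content, str):
            return content.strip()
        return "".join(part["text"].strip() for part in content if part.get("type") == "text")

    # A leading system message is not its own turn; it prefixes the first user turn.
    prefix = ""
    if messages and messages[0]["role"] == "system":
        prefix = text_of(messages[0]["content"]) + "\n\n"
        messages = messages[1:]

    out = [bos]
    for i, msg in enumerate(messages):
        role = "model" if msg["role"] == "assistant" else msg["role"]
        body = (prefix if i == 0 else "") + text_of(msg["content"])
        out.append(f"<start_of_turn>{role}\n{body}<end_of_turn>\n")
    if add_generation_prompt:
        # Open the model's turn so that generation continues from here.
        out.append("<start_of_turn>model\n")
    return "".join(out)


conversation = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {"role": "user", "content": [{"type": "text", "text": "슈퍼컴퓨터에 대해서 설명해줘."}]},
]
print(render_gemma_prompt(conversation))
```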
## Citation
If you use this model in your work, please cite it as follows:
```bibtex
@misc{KISTI-KONI/KONI-4B-instruct-20250901,
title={KISTI-KONI/KONI-4B-instruct-20250901},
author={KISTI},
year={2025},
url={https://huggingface.co/KISTI-KONI/KONI-4B-instruct-20250901}
}
```
---
## Acknowledgements
- This research was supported by the Korea Institute of Science and Technology Information (KISTI) in 2025 (No. (KISTI) K25L1M1C1), aimed at developing KONI (KISTI Open Neural Intelligence), a large language model specialized in science and technology.
- This work also benefited from the resources and technical support provided by the National Supercomputing Center (KISTI).
---
## References
- https://huggingface.co/KISTI-KONI/KONI-4B-base-20250819

3
added_tokens.json Normal file

@@ -0,0 +1,3 @@
{
"<image_soft_token>": 262144
}

37
config.json Normal file

@@ -0,0 +1,37 @@
{
"architectures": [
"Gemma3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"attn_logit_softcapping": null,
"bos_token_id": 2,
"cache_implementation": "hybrid",
"eos_token_id": 1,
"final_logit_softcapping": null,
"head_dim": 256,
"hidden_activation": "gelu_pytorch_tanh",
"hidden_size": 2560,
"initializer_range": 0.02,
"intermediate_size": 10240,
"max_position_embeddings": 131072,
"model_type": "gemma3_text",
"num_attention_heads": 8,
"num_hidden_layers": 34,
"num_key_value_heads": 4,
"pad_token_id": 0,
"query_pre_attn_scalar": 256,
"rms_norm_eps": 1e-06,
"rope_local_base_freq": 10000.0,
"rope_scaling": {
"factor": 8.0,
"rope_type": "linear"
},
"rope_theta": 1000000.0,
"sliding_window": 1024,
"sliding_window_pattern": 6,
"torch_dtype": "bfloat16",
"transformers_version": "4.51.3",
"use_cache": false,
"vocab_size": 262208
}
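The tensor shapes implied by this config can be cross-checked against the safetensors index in this commit, whose `metadata.total_size` is 7760526336 bytes. The minimal sketch below makes several assumptions: the standard Gemma 3 decoder layout, tied input/output embeddings (no separate `lm_head`), no biases (`attention_bias: false`), four RMSNorm weights per layer plus `q_norm`/`k_norm` over `head_dim`, one final norm, and a `total_size` that counts tensor bytes only. Note that `num_attention_heads * head_dim` is 2048, not `hidden_size` (2560), so `o_proj` maps 2048 back to 2560. The function name `count_parameters` is illustrative.

```python
# Values copied from config.json above.
CONFIG = {
    "vocab_size": 262208,
    "hidden_size": 2560,
    "intermediate_size": 10240,
    "num_hidden_layers": 34,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "head_dim": 256,
}


def count_parameters(cfg):
    """Estimate the parameter count of a Gemma-3-style text decoder (assumed layout)."""
    h, d = cfg["hidden_size"], cfg["head_dim"]
    q_out = cfg["num_attention_heads"] * d    # 8 * 256 = 2048
    kv_out = cfg["num_key_value_heads"] * d   # 4 * 256 = 1024 (grouped-query attention)

    attn = h * q_out + 2 * h * kv_out + q_out * h   # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]          # gate, up, down projections
    norms = 4 * h + 2 * d                           # 4 layer norms + q_norm + k_norm
    per_layer = attn + mlp + norms

    embed = cfg["vocab_size"] * h                   # shared with the output head (tied)
    final_norm = h
    return embed + cfg["num_hidden_layers"] * per_layer + final_norm


params = count_parameters(CONFIG)
print(f"{params:,} parameters, {params * 2:,} bytes in bfloat16")
# → 3,880,263,168 parameters, 7,760,526,336 bytes in bfloat16
```

At 2 bytes per bfloat16 value, this matches `total_size` exactly. The match supports, but does not prove, the tied-embedding assumption.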

8
generation_config.json Normal file

@@ -0,0 +1,8 @@
{
"_from_model_config": true,
"bos_token_id": 2,
"cache_implementation": "hybrid",
"eos_token_id": 1,
"pad_token_id": 0,
"transformers_version": "4.51.3"
}
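`cache_implementation: "hybrid"` goes together with the config's `sliding_window: 1024` and `sliding_window_pattern: 6`. Most layers attend only over a local window, so their key/value cache can be capped at 1024 tokens. Only the global layers keep the full sequence. The rough sketch below assumes the Gemma 3 convention that layer `i` (0-based) is global when `i + 1` is a multiple of the pattern, giving five local layers per global one. It estimates raw key/value storage only and ignores framework overhead and cache preallocation. The function names are illustrative.

```python
# Values copied from config.json above.
NUM_LAYERS = 34
SLIDING_WINDOW = 1024
SLIDING_WINDOW_PATTERN = 6
NUM_KV_HEADS = 4
HEAD_DIM = 256
BYTES_PER_VALUE = 2  # bfloat16


def global_layers(num_layers=NUM_LAYERS, pattern=SLIDING_WINDOW_PATTERN):
    """Indices of full-attention layers, assuming every `pattern`-th layer is global."""
    return [i for i in range(num_layers) if (i + 1) % pattern == 0]


def kv_cache_bytes(seq_len, batch_size=1):
    """Hybrid cache: global layers store every token, local layers at most SLIDING_WINDOW."""
    per_token_per_layer = 2 * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # keys + values
    n_global = len(global_layers())
    n_local = NUM_LAYERS - n_global
    tokens = n_global * seq_len + n_local * min(seq_len, SLIDING_WINDOW)
    return batch_size * tokens * per_token_per_layer


def full_cache_bytes(seq_len, batch_size=1):
    """Same estimate if every layer cached the full sequence (no sliding window)."""
    return batch_size * NUM_LAYERS * seq_len * 2 * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


print(global_layers())  # → [5, 11, 17, 23, 29]
for n in (1024, 8192, 32768):
    print(f"{n:>6} tokens: hybrid {kv_cache_bytes(n) / 2**20:7.1f} MiB, "
          f"full {full_cache_bytes(n) / 2**20:7.1f} MiB")
```

Up to the window size, both estimates coincide. Beyond it, only 5 of the 34 layers keep growing with the sequence length.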


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f40e98c205111e9af22ed214a5b26cf90829c33e6f05d88af62396f38b5d00a9
size 4960531344


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:21e45df3bd141e6ccd474af1d2db5942235e4604e158e1b9deab033fb6f4be0c
size 2800046672
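The two files above are Git LFS pointers (spec version, SHA-256 `oid`, byte `size`), not the weights themselves. This view does not show their file names, so the minimal sketch below works on the pointer text only. It parses pointers of this form and verifies a downloaded file against them. The helper names `parse_pointer` and `verify_file` are hypothetical. The two sizes sum to 7,760,578,016 bytes, which is 51,680 bytes more than the index's `total_size`. A plausible explanation, not confirmed by the source, is that `total_size` counts tensor data only, while each safetensors file also carries a small header.

```python
import hashlib
import re

POINTER_RE = re.compile(
    r"^version https://git-lfs\.github\.com/spec/v1\n"
    r"oid sha256:(?P<oid>[0-9a-f]{64})\n"
    r"size (?P<size>\d+)\n?$"
)


def parse_pointer(text):
    """Parse a Git LFS v1 pointer into (sha256 hex digest, size in bytes)."""
    m = POINTER_RE.match(text)
    if m is None:
        raise ValueError("not a Git LFS v1 pointer")
    return m.group("oid"), int(m.group("size"))


def verify_file(path, oid, size, chunk_size=1 << 20):
    """Stream a file from disk and compare its size and SHA-256 digest with the pointer."""
    digest = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
            total += len(chunk)
    return total == size and digest.hexdigest() == oid


# The two pointers copied verbatim from this commit.
POINTERS = [
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f40e98c205111e9af22ed214a5b26cf90829c33e6f05d88af62396f38b5d00a9\n"
    "size 4960531344\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:21e45df3bd141e6ccd474af1d2db5942235e4604e158e1b9deab033fb6f4be0c\n"
    "size 2800046672\n",
]

sizes = [parse_pointer(p)[1] for p in POINTERS]
print(sum(sizes))               # → 7760578016
print(sum(sizes) - 7760526336)  # → 51680
```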


@@ -0,0 +1,451 @@
{
"metadata": {
"total_size": 7760526336
},
"weight_map": {
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.19.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.19.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.20.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.20.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.norm.weight": "model-00002-of-00002.safetensors"
}
}
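The `weight_map` above assigns every tensor name to one of the two shard files. The keys are sorted as strings, so `model.layers.3.*` follows `model.layers.29.*`, which makes the layer-to-shard split hard to read by eye. The following minimal sketch parses the layer index out of each key and groups layers by shard. For self-containment it runs on a small hand-copied excerpt of the map, and the helper names `layers_per_shard` and `shard_for` are hypothetical. In normal use, `transformers`' `from_pretrained` reads this index and resolves the shards itself.

```python
import json
import re
from collections import defaultdict

# Hand-copied EXCERPT of "weight_map" from model.safetensors.index.json above;
# the real file lists every tensor, this keeps only a few for illustration.
INDEX_EXCERPT = """
{
  "weight_map": {
    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors"
  }
}
"""

# Matches the decoder-layer index in names like "model.layers.24.mlp.up_proj.weight".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def layers_per_shard(weight_map):
    """Return {shard_filename: sorted decoder-layer indices stored in that shard}."""
    shards = defaultdict(set)
    for tensor_name, shard in weight_map.items():
        match = LAYER_RE.match(tensor_name)
        if match:  # non-layer tensors such as model.norm.weight are skipped
            shards[shard].add(int(match.group(1)))
    return {shard: sorted(indices) for shard, indices in shards.items()}


def shard_for(weight_map, tensor_name):
    """Return the shard file holding `tensor_name` (raises KeyError if absent)."""
    return weight_map[tensor_name]


weight_map = json.loads(INDEX_EXCERPT)["weight_map"]
print(layers_per_shard(weight_map))
print(shard_for(weight_map, "model.norm.weight"))
```

Parsing the integer index, rather than relying on key order, gives numeric ordering regardless of how the JSON keys were sorted.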

3
optimizer.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d46c2191c3c221dbb25605b811f7baa5d376a37b1fc1ae872bc7638c008ec289
size 15521436230
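This entry and several that follow (`rng_state_*.pth`, `scheduler.pt`, `tokenizer.json`, `tokenizer.model`, `training_args.bin`) are Git LFS pointer files rather than the binaries themselves. Each pointer is three `key value` lines giving the spec version, the object's SHA-256 digest, and its size in bytes. For example, `optimizer.pt` is 15,521,436,230 bytes, roughly 15.5 GB. The following minimal standard-library sketch parses such a pointer and checks a downloaded file against it. `parse_lfs_pointer` and `verify_download` are hypothetical helper names, not part of Git LFS.

```python
import hashlib

# LFS pointer for scheduler.pt, copied from the listing in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1ef70ab1184b83350d02aafe832ec2b8861804f69af1f8fdac0c80bf8503b5ca
size 1064
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into {'version', 'algo', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")  # e.g. "sha256:<hex>"
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def verify_download(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and digest."""
    h = hashlib.new(pointer["algo"])
    n_bytes = 0
    with open(path, "rb") as f:
        # Stream in chunks so multi-GB files such as optimizer.pt fit in memory.
        while chunk := f.read(chunk_size):
            h.update(chunk)
            n_bytes += len(chunk)
    return n_bytes == pointer["size"] and h.hexdigest() == pointer["oid"]


p = parse_lfs_pointer(POINTER)
print(p["algo"], p["size"])  # sha256 1064
```

Checking the size first is cheap, but the function computes both so that a truncated download and a corrupted download are each rejected.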

3
rng_state_0.pth Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ad792af33c7cfa8b15298ecc9d976ebdcdeb444ca0e704c7b0657f41ee6547eb
size 14512

3
rng_state_1.pth Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:722c924fceffd85f8ab1a5445f1ea1e6c502644b6a42e2ff6b5a9a76ea26e1fe
size 14512

3
scheduler.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1ef70ab1184b83350d02aafe832ec2b8861804f69af1f8fdac0c80bf8503b5ca
size 1064

33
special_tokens_map.json Normal file

@@ -0,0 +1,33 @@
{
"boi_token": "<start_of_image>",
"bos_token": {
"content": "<bos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eoi_token": "<end_of_image>",
"eos_token": {
"content": "<eos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"image_token": "<image_soft_token>",
"pad_token": {
"content": "<pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
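In `special_tokens_map.json`, some entries are plain strings (`boi_token`, `eoi_token`, `image_token`). The others are serialized `AddedToken`-style objects whose literal text sits under `content`, alongside the `lstrip`, `rstrip`, `normalized`, and `single_word` flags. The following small standard-library sketch flattens both shapes into a name-to-string mapping. The helper `token_strings` is hypothetical; tokenizer libraries such as `transformers` normally perform this step when loading the tokenizer.

```python
import json

# special_tokens_map.json from the listing above, each object kept on one line.
SPECIAL_TOKENS_MAP = json.loads("""
{
  "boi_token": "<start_of_image>",
  "bos_token": {"content": "<bos>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "eoi_token": "<end_of_image>",
  "eos_token": {"content": "<eos>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "image_token": "<image_soft_token>",
  "pad_token": {"content": "<pad>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "unk_token": {"content": "<unk>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false}
}
""")


def token_strings(special_map):
    """Map each special-token name to its literal string.

    A value is either a plain string or a dict holding the text under "content".
    """
    return {
        name: value["content"] if isinstance(value, dict) else value
        for name, value in special_map.items()
    }


for name, text in token_strings(SPECIAL_TOKENS_MAP).items():
    print(f"{name:12} {text}")
```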

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074

51347
tokenizer_config.json Normal file

File diff suppressed because it is too large.

124
trainer_state.json Normal file

@@ -0,0 +1,124 @@
{
"best_global_step": null,
"best_metric": null,
"best_model_checkpoint": null,
"epoch": 0.14222222222222222,
"eval_steps": 500,
"global_step": 60,
"is_hyper_param_search": false,
"is_local_process_zero": true,
"is_world_process_zero": true,
"log_history": [
{
"epoch": 0.023703703703703703,
"grad_norm": 108.5,
"learning_rate": 2.0930232558139536e-06,
"logits/chosen": 4.719546318054199,
"logits/rejected": 4.862860202789307,
"logps/chosen": -389.59698486328125,
"logps/rejected": -377.825439453125,
"loss": 0.6845,
"rewards/accuracies": 0.4312500059604645,
"rewards/chosen": 0.6139063835144043,
"rewards/margins": 0.04724857956171036,
"rewards/rejected": 0.5666579008102417,
"step": 10
},
{
"epoch": 0.047407407407407405,
"grad_norm": 109.0,
"learning_rate": 4.418604651162791e-06,
"logits/chosen": 4.730769634246826,
"logits/rejected": 4.875722408294678,
"logps/chosen": -366.2303161621094,
"logps/rejected": -381.54351806640625,
"loss": 0.603,
"rewards/accuracies": 0.637499988079071,
"rewards/chosen": 2.2159204483032227,
"rewards/margins": 0.5498644113540649,
"rewards/rejected": 1.6660559177398682,
"step": 20
},
{
"epoch": 0.07111111111111111,
"grad_norm": 64.5,
"learning_rate": 6.744186046511628e-06,
"logits/chosen": 4.784668922424316,
"logits/rejected": 4.905774116516113,
"logps/chosen": -407.79986572265625,
"logps/rejected": -423.7102966308594,
"loss": 0.5481,
"rewards/accuracies": 0.731249988079071,
"rewards/chosen": -0.20952901244163513,
"rewards/margins": 0.7444091439247131,
"rewards/rejected": -0.9539381265640259,
"step": 30
},
{
"epoch": 0.09481481481481481,
"grad_norm": 75.5,
"learning_rate": 9.069767441860465e-06,
"logits/chosen": 4.709995746612549,
"logits/rejected": 4.814078330993652,
"logps/chosen": -397.01055908203125,
"logps/rejected": -400.6348571777344,
"loss": 0.4682,
"rewards/accuracies": 0.796875,
"rewards/chosen": 2.5699944496154785,
"rewards/margins": 1.2299325466156006,
"rewards/rejected": 1.340061902999878,
"step": 40
},
{
"epoch": 0.11851851851851852,
"grad_norm": 102.0,
"learning_rate": 9.993784606094612e-06,
"logits/chosen": 4.691997528076172,
"logits/rejected": 4.816564083099365,
"logps/chosen": -410.54034423828125,
"logps/rejected": -446.1881408691406,
"loss": 0.4441,
"rewards/accuracies": 0.778124988079071,
"rewards/chosen": -0.03334064409136772,
"rewards/margins": 1.9787142276763916,
"rewards/rejected": -2.012054443359375,
"step": 50
},
{
"epoch": 0.14222222222222222,
"grad_norm": 96.0,
"learning_rate": 9.955857588395065e-06,
"logits/chosen": 4.5857696533203125,
"logits/rejected": 4.653135299682617,
"logps/chosen": -393.78936767578125,
"logps/rejected": -445.19342041015625,
"loss": 0.4635,
"rewards/accuracies": 0.768750011920929,
"rewards/chosen": 2.299717426300049,
"rewards/margins": 2.069148302078247,
"rewards/rejected": 0.23056945204734802,
"step": 60
}
],
"logging_steps": 10,
"max_steps": 421,
"num_input_tokens_seen": 0,
"num_train_epochs": 1,
"save_steps": 10,
"stateful_callbacks": {
"TrainerControl": {
"args": {
"should_epoch_stop": false,
"should_evaluate": false,
"should_log": false,
"should_save": true,
"should_training_stop": false
},
"attributes": {}
}
},
"total_flos": 0.0,
"train_batch_size": 1,
"trial_name": null,
"trial_params": null
}
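This `trainer_state.json` describes an early checkpoint: `global_step` 60 of `max_steps` 421, about 14% of one epoch (`"epoch": 0.1422`). The metric names (`rewards/chosen`, `rewards/margins`, `logps/rejected`, ...) resemble those logged by TRL's `DPOTrainer`, but that attribution is an inference from the key names, not something the file states. The following sketch checks two properties of the logged values. First, the learning rates fit a linear warmup followed by a half-cosine decay to zero at step 421. The peak of `1e-5` and the 43 warmup steps are *assumptions inferred by fitting the log*, since neither is stored in this file. Second, `rewards/margins` equals `rewards/chosen - rewards/rejected` up to float32 averaging noise. The function `cosine_with_warmup` is a hypothetical re-implementation for illustration, not the trainer's own code.

```python
import math

# Values copied from "log_history" in trainer_state.json above:
# (step, learning_rate, rewards/chosen, rewards/rejected, rewards/margins)
LOG = [
    (10, 2.0930232558139536e-06, 0.6139063835144043, 0.5666579008102417, 0.04724857956171036),
    (20, 4.418604651162791e-06, 2.2159204483032227, 1.6660559177398682, 0.5498644113540649),
    (30, 6.744186046511628e-06, -0.20952901244163513, -0.9539381265640259, 0.7444091439247131),
    (40, 9.069767441860465e-06, 2.5699944496154785, 1.340061902999878, 1.2299325466156006),
    (50, 9.993784606094612e-06, -0.03334064409136772, -2.012054443359375, 1.9787142276763916),
    (60, 9.955857588395065e-06, 2.299717426300049, 0.23056945204734802, 2.069148302078247),
]

MAX_STEPS = 421     # "max_steps" in trainer_state.json
PEAK_LR = 1e-5      # ASSUMPTION: inferred from the logged values, not stored in this file
WARMUP_STEPS = 43   # ASSUMPTION: inferred from the logged values, not stored in this file


def cosine_with_warmup(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=MAX_STEPS):
    """Linear warmup from 0 to `peak`, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step, lr, chosen, rejected, margin in LOG:
    # In this log, the LR recorded at global step s matches the schedule at s - 1.
    predicted = cosine_with_warmup(step - 1)
    # Margins are averaged per batch in float32, so allow a small absolute tolerance.
    margin_ok = math.isclose(margin, chosen - rejected, abs_tol=1e-5)
    print(f"step {step}: logged lr {lr:.6e}, predicted {predicted:.6e}, margin consistent: {margin_ok}")
```

If the schedule assumptions hold, the checkpoint at step 60 sits just past the learning-rate peak, at the start of the cosine decay.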

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a80be37fcb3b33881679ae2c84e9e2f22baf6c5c0cb6375d93f92dce67ff8f2a
size 6712