初始化项目,由ModelHub XC社区提供模型
Model: KISTI-KONI/KONI-4B-instruct-20250901 Source: Original Platform
This commit is contained in:
36
.gitattributes
vendored
Normal file
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
|
|||||||
|
*.7z filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.arrow filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.bin filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.ftz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.gz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.h5 filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.joblib filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.model filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.npy filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.npz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.onnx filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.ot filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.parquet filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pb filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pickle filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pkl filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pt filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pth filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.rar filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||||
|
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tar filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tflite filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tgz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.wasm filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.xz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.zip filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.zst filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
||||||
133
README.md
Normal file
133
README.md
Normal file
@@ -0,0 +1,133 @@
|
|||||||
|
---
|
||||||
|
license: gemma
|
||||||
|
language:
|
||||||
|
- ko
|
||||||
|
- en
|
||||||
|
tags:
|
||||||
|
- text-generation
|
||||||
|
- pytorch
|
||||||
|
- causal-lm
|
||||||
|
- gemma3
|
||||||
|
- 4b
|
||||||
|
library_name: transformers
|
||||||
|
base_model:
|
||||||
|
- google/gemma-3-4b-pt
|
||||||
|
---
|
||||||
|
|
||||||
|
# KISTI-KONI/KONI-4B-instruct-20250901
|
||||||
|
|
||||||
|
## Model Description
|
||||||
|
|
||||||
|
**KONI (KISTI Open Neural Intelligence)** is a large language model developed by the Korea Institute of Science and Technology Information (KISTI). Designed specifically for the scientific and technological domains, KONI excels in both Korean and English, making it an ideal tool for tasks requiring specialized knowledge in these areas.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Features
|
||||||
|
|
||||||
|
- **Bilingual Model**: Supports both Korean and English, with a focus on scientific and technical texts.
|
||||||
|
- **Post-training**: The model undergoes post-training via instruction tuning (IT) and direct preference optimization (DPO) using a filtered, high-quality bilingual dataset that includes scientific data and publicly available resources. This ensures adaptability to evolving scientific and technological content.
|
||||||
|
- **Base Model**: Built upon *KISTI-KONI/KONI-4B-base-20250819*, KONI-4B-instruct undergoes post-training for superior performance on both general and scientific benchmarks.
|
||||||
|
- **Training Environment**: Trained on *24* H200 GPUs at the KISTI supercomputer, optimizing both speed and quality during development.
|
||||||
|
- **Dataset**: Utilizes a high-quality and balanced dataset of 9 billion instruction-following pairs, comprising scientific texts as well as publicly available bilingual data.
|
||||||
|
- **Data Optimization**: The post-training process involved testing a variety of data distributions (balanced, reasoning-enhanced, knowledge-enhanced, minimal Korean settings, etc.) and selecting the optimal combination for training.
|
||||||
|
- **Enhanced Performance**: KONI-4B-instruct, developed through instruction tuning of the KONI-4B-base model, delivers superior performance compared to other similarly-sized models.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Model Performance
|
||||||
|
|
||||||
|
KONI-4B-instruct has demonstrated strong performance on a variety of scientific benchmarks, outperforming several other 4B-sized pretrained models. Here is a comparison of KONI-4B-instruct’s performance across various benchmarks including scientific and technological benchmarks:
|
||||||
|
|
||||||
|
| Rank | Model | KMMLU | KMMLU-Hard | KMMLU-Direct | KoBEST | HAERAE | kormedmcqa | MMLU | ARC_easy | ARC_challenge | Hellaswag | ScholarBench-MC | AidaBench-MC | average |
|
||||||
|
|------|--------------------------------------------------------------|-------|------------|------------|--------|--------|------------|-------|----------|---------------|-----------|-----------------|--------------|---------|
|
||||||
|
| 1 | Qwen/Qwen3-8B | 0.5500 | 0.2900 | 0.5558 | 0.7800 | 0.6700 | 0.3750 | 0.7400 | 0.8700 | 0.6400 | 0.5700 | 0.7094 | 0.7314 | 0.623462 |
|
||||||
|
| 2 | kakaocorp/kanana-1.5-8b-base | 0.4800 | 0.2500 | 0.4872 | 0.6200 | 0.8200 | 0.5910 | 0.6300 | 0.8300 | 0.5600 | 0.6000 | 0.6800 | 0.7548 | 0.608580 |
|
||||||
|
| 3 | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | 0.4700 | 0.2300 | 0.4532 | 0.5900 | 0.7800 | 0.5310 | 0.6500 | 0.8300 | 0.5900 | 0.6200 | 0.6900 | 0.7057 | 0.594986 |
|
||||||
|
| 4 | **KISTI-KONI/KONI-4B-instruct-20250901** | **0.4188** | **0.2110** | **0.4194** | **0.7393** | **0.7333** | **0.4719** | **0.5823** | **0.8342** | **0.5452** | **0.5783** | **0.6980** | **0.6274** | **0.571603** |
|
||||||
|
| 5 | kakaocorp/kanana-1.5-2.1b-instruct-2505 | 0.4200 | 0.2100 | 0.4247 | 0.7700 | 0.7900 | 0.5224 | 0.5500 | 0.8000 | 0.5300 | 0.5100 | 0.6630 | 0.6688 | 0.571577 |
|
||||||
|
| 6 | **KISTI-KONI/KONI-4B-base-20250819** | **0.4300** | **0.2100** | **0.4349** | **0.7300** | 0.6600 | **0.4800** | **0.5800** | **0.8200** | **0.5200** | **0.5700** | **0.6800** | **0.6147** | **0.560803** |
|
||||||
|
| 7 | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | 0.4300 | 0.2100 | 0.4379 | 0.7400 | 0.6600 | 0.4842 | 0.5900 | 0.7700 | 0.5000 | 0.5400 | 0.6900 | 0.6511 | 0.558603 |
|
||||||
|
| 8 | KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024 | 0.4000 | 0.2000 | 0.4100 | 0.5600 | 0.6400 | 0.4905 | 0.6300 | 0.8300 | 0.5400 | 0.6100 | 0.6980 | 0.6722 | 0.556725 |
|
||||||
|
| 9 | meta-llama/Llama-3.1-8B-Instruct | 0.4000 | 0.2000 | 0.4119 | 0.7000 | 0.4400 | 0.4789 | 0.6500 | 0.8400 | 0.5400 | 0.6100 | 0.6960 | 0.6709 | 0.553135 |
|
||||||
|
| 10 | google/gemma-3-4b-pt | 0.3980 | 0.1998 | 0.3966 | 0.6990 | 0.6672 | 0.4726 | 0.5964 | 0.8300 | 0.5435 | 0.5763 | 0.6670 | 0.5886 | 0.552906 |
|
||||||
|
| 11 | google/gemma-3-4b-it | 0.3900 | 0.2100 | 0.3904 | 0.7200 | 0.5900 | 0.4400 | 0.5800 | 0.8400 | 0.5600 | 0.5600 | 0.6990 | 0.6013 | 0.548388 |
|
||||||
|
| 12 | saltlux/Ko-Llama3-Luxia-8B | 0.3800 | 0.2100 | 0.3935 | 0.7100 | 0.6800 | 0.4320 | 0.5500 | 0.8000 | 0.4800 | 0.5600 | 0.6650 | 0.6109 | 0.539283 |
|
||||||
|
| 13 | MLP-KTLim/llama-3-Korean-Bllossom-8B | 0.3700 | 0.2200 | 0.3738 | 0.5500 | 0.4700 | 0.4163 | 0.6400 | 0.8400 | 0.5700 | 0.5900 | 0.6525 | 0.5862 | 0.523239 |
|
||||||
|
| 14 | kakaocorp/kanana-1.5-2.1b-base | 0.3900 | 0.2400 | 0.4502 | 0.6200 | 0.5700 | 0.5138 | 0.4700 | 0.7300 | 0.4400 | 0.4500 | 0.6500 | 0.6478 | 0.514315 |
|
||||||
|
| 15 | naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B | 0.3900 | 0.2400 | 0.3524 | 0.6400 | 0.5700 | 0.3550 | 0.4700 | 0.7300 | 0.4400 | 0.4500 | 0.5950 | 0.5450 | 0.481447 |
|
||||||
|
| 16 | naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B | 0.3700 | 0.2200 | 0.3798 | 0.6200 | 0.5600 | 0.3383 | 0.4400 | 0.7200 | 0.3900 | 0.4100 | 0.5600 | 0.5173 | 0.460449 |
|
||||||
|
| 17 | mistralai/Mistral-7B-v0.3 | 0.3700 | 0.2200 | 0.3739 | 0.6300 | 0.3700 | 0.3735 | 0.6200 | 0.8300 | 0.5500 | 0.6200 | 0.5440 | 0.4257 | 0.413117 |
|
||||||
|
| 18 | google/gemma-3-1b-it | 0.3069 | 0.2400 | 0.2935 | 0.3556 | 0.5987 | 0.2761 | 0.3970 | 0.6620 | 0.3430 | 0.4204 | 0.5720 | 0.3972 | 0.390038 |
|
||||||
|
| 19 | google/gemma-3-1b-pt | 0.2582 | 0.2456 | 0.2556 | 0.5569 | 0.1952 | 0.1964 | 0.2641 | 0.7146 | 0.3541 | 0.4703 | 0.2192 | 0.1980 | 0.327362 |
|
||||||
|
| 20 | etri-lirs/eagle-3b-preview | 0.1600 | 0.2100 | 0.1617 | 0.5100 | 0.1900 | 0.1804 | 0.2500 | 0.5700 | 0.2400 | 0.3700 | 0.2678 | 0.2224 | 0.236846 |
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
As shown, **KISTI-KONI/KONI-4B-instruct-20250901** is the top-performing model in the 4B-size instruction-tuned model category, outperforming *google/gemma-3-4b-it* and *KISTI-KONI/KONI-4B-base-20250819*.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Strengths & Use Cases
|
||||||
|
|
||||||
|
- **Domain-Specific Excellence**: KONI-4B-instruct excels at tasks involving scientific literature, technological content, and complex reasoning. It is ideal for research, academic analysis, and specialized problem-solving.
|
||||||
|
- **Bilingual Advantage**: The model’s bilingual nature enables handling diverse datasets and generating high-quality responses in both English and Korean, especially in bilingual scientific collaborations.
|
||||||
|
- **Benchmark Performance**: KONI-4B-instruct has shown superior performance in benchmarks such as *KMMLU*, *kormedmcqa*, and *ScholarBench-MC*, proving its robustness in knowledge-intensive tasks.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
```sh
|
||||||
|
$ pip install -U transformers
|
||||||
|
```
|
||||||
|
|
||||||
|
```python
|
||||||
|
from transformers import pipeline
|
||||||
|
import torch
|
||||||
|
|
||||||
|
pipe = pipeline("text-generation", model="KISTI-KONI/KONI-4B-instruct-20250901", device="cuda", torch_dtype=torch.bfloat16)
|
||||||
|
|
||||||
|
messages = [
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"role": "system",
|
||||||
|
"content": [{"type": "text", "text": "You are a helpful assistant."},]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"role": "user",
|
||||||
|
"content": [{"type": "text", "text": "슈퍼컴퓨터에 대해서 설명해줘."},]
|
||||||
|
},
|
||||||
|
],
|
||||||
|
]
|
||||||
|
|
||||||
|
output = pipe(
|
||||||
|
messages,
|
||||||
|
max_new_tokens=512,
|
||||||
|
eos_token_id=[pipe.tokenizer.eos_token_id, pipe.tokenizer.convert_tokens_to_ids("<end_of_turn>")]
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
## Citation
|
||||||
|
|
||||||
|
If you use this model in your work, please cite it as follows:
|
||||||
|
|
||||||
|
```bibtex
|
||||||
|
@article{KISTI-KONI/KONI-4B-instruct-20250901,
|
||||||
|
title={KISTI-KONI/KONI-4B-instruct-20250901},
|
||||||
|
author={KISTI},
|
||||||
|
year={2025},
|
||||||
|
url={https://huggingface.co/KISTI-KONI/KONI-4B-instruct-20250901}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Acknowledgements
|
||||||
|
- This research was supported by the Korea Institute of Science and Technology Information (KISTI) in 2025 (No. (KISTI) K25L1M1C1), aimed at developing KONI (KISTI Open Neural Intelligence), a large language model specialized in science and technology.
|
||||||
|
- This work also benefited from the resources and technical support provided by the National Supercomputing Center (KISTI).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## References
|
||||||
|
- https://huggingface.co/KISTI-KONI/KONI-4B-base-20250819
|
||||||
3
added_tokens.json
Normal file
3
added_tokens.json
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
{
|
||||||
|
"<image_soft_token>": 262144
|
||||||
|
}
|
||||||
37
config.json
Normal file
37
config.json
Normal file
@@ -0,0 +1,37 @@
|
|||||||
|
{
|
||||||
|
"architectures": [
|
||||||
|
"Gemma3ForCausalLM"
|
||||||
|
],
|
||||||
|
"attention_bias": false,
|
||||||
|
"attention_dropout": 0.0,
|
||||||
|
"attn_logit_softcapping": null,
|
||||||
|
"bos_token_id": 2,
|
||||||
|
"cache_implementation": "hybrid",
|
||||||
|
"eos_token_id": 1,
|
||||||
|
"final_logit_softcapping": null,
|
||||||
|
"head_dim": 256,
|
||||||
|
"hidden_activation": "gelu_pytorch_tanh",
|
||||||
|
"hidden_size": 2560,
|
||||||
|
"initializer_range": 0.02,
|
||||||
|
"intermediate_size": 10240,
|
||||||
|
"max_position_embeddings": 131072,
|
||||||
|
"model_type": "gemma3_text",
|
||||||
|
"num_attention_heads": 8,
|
||||||
|
"num_hidden_layers": 34,
|
||||||
|
"num_key_value_heads": 4,
|
||||||
|
"pad_token_id": 0,
|
||||||
|
"query_pre_attn_scalar": 256,
|
||||||
|
"rms_norm_eps": 1e-06,
|
||||||
|
"rope_local_base_freq": 10000.0,
|
||||||
|
"rope_scaling": {
|
||||||
|
"factor": 8.0,
|
||||||
|
"rope_type": "linear"
|
||||||
|
},
|
||||||
|
"rope_theta": 1000000.0,
|
||||||
|
"sliding_window": 1024,
|
||||||
|
"sliding_window_pattern": 6,
|
||||||
|
"torch_dtype": "bfloat16",
|
||||||
|
"transformers_version": "4.51.3",
|
||||||
|
"use_cache": false,
|
||||||
|
"vocab_size": 262208
|
||||||
|
}
|
||||||
8
generation_config.json
Normal file
8
generation_config.json
Normal file
@@ -0,0 +1,8 @@
|
|||||||
|
{
|
||||||
|
"_from_model_config": true,
|
||||||
|
"bos_token_id": 2,
|
||||||
|
"cache_implementation": "hybrid",
|
||||||
|
"eos_token_id": 1,
|
||||||
|
"pad_token_id": 0,
|
||||||
|
"transformers_version": "4.51.3"
|
||||||
|
}
|
||||||
3
model-00001-of-00002.safetensors
Normal file
3
model-00001-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:f40e98c205111e9af22ed214a5b26cf90829c33e6f05d88af62396f38b5d00a9
|
||||||
|
size 4960531344
|
||||||
3
model-00002-of-00002.safetensors
Normal file
3
model-00002-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:21e45df3bd141e6ccd474af1d2db5942235e4604e158e1b9deab033fb6f4be0c
|
||||||
|
size 2800046672
|
||||||
451
model.safetensors.index.json
Normal file
451
model.safetensors.index.json
Normal file
@@ -0,0 +1,451 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"total_size": 7760526336
|
||||||
|
},
|
||||||
|
"weight_map": {
|
||||||
|
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.19.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.19.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.19.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.19.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.19.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||||
|
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||||
|
"model.norm.weight": "model-00002-of-00002.safetensors"
|
||||||
|
}
|
||||||
|
}
|
||||||
3
optimizer.pt
Normal file
3
optimizer.pt
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:d46c2191c3c221dbb25605b811f7baa5d376a37b1fc1ae872bc7638c008ec289
|
||||||
|
size 15521436230
|
||||||
3
rng_state_0.pth
Normal file
3
rng_state_0.pth
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:ad792af33c7cfa8b15298ecc9d976ebdcdeb444ca0e704c7b0657f41ee6547eb
|
||||||
|
size 14512
|
||||||
3
rng_state_1.pth
Normal file
3
rng_state_1.pth
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:722c924fceffd85f8ab1a5445f1ea1e6c502644b6a42e2ff6b5a9a76ea26e1fe
|
||||||
|
size 14512
|
||||||
3
scheduler.pt
Normal file
3
scheduler.pt
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:1ef70ab1184b83350d02aafe832ec2b8861804f69af1f8fdac0c80bf8503b5ca
|
||||||
|
size 1064
|
||||||
33
special_tokens_map.json
Normal file
33
special_tokens_map.json
Normal file
@@ -0,0 +1,33 @@
|
|||||||
|
{
|
||||||
|
"boi_token": "<start_of_image>",
|
||||||
|
"bos_token": {
|
||||||
|
"content": "<bos>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false
|
||||||
|
},
|
||||||
|
"eoi_token": "<end_of_image>",
|
||||||
|
"eos_token": {
|
||||||
|
"content": "<eos>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false
|
||||||
|
},
|
||||||
|
"image_token": "<image_soft_token>",
|
||||||
|
"pad_token": {
|
||||||
|
"content": "<pad>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false
|
||||||
|
},
|
||||||
|
"unk_token": {
|
||||||
|
"content": "<unk>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false
|
||||||
|
}
|
||||||
|
}
|
||||||
3
tokenizer.json
Normal file
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
|
||||||
|
size 33384568
|
||||||
3
tokenizer.model
Normal file
3
tokenizer.model
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
|
||||||
|
size 4689074
|
||||||
51347
tokenizer_config.json
Normal file
51347
tokenizer_config.json
Normal file
File diff suppressed because it is too large
Load Diff
124
trainer_state.json
Normal file
124
trainer_state.json
Normal file
@@ -0,0 +1,124 @@
|
|||||||
|
{
|
||||||
|
"best_global_step": null,
|
||||||
|
"best_metric": null,
|
||||||
|
"best_model_checkpoint": null,
|
||||||
|
"epoch": 0.14222222222222222,
|
||||||
|
"eval_steps": 500,
|
||||||
|
"global_step": 60,
|
||||||
|
"is_hyper_param_search": false,
|
||||||
|
"is_local_process_zero": true,
|
||||||
|
"is_world_process_zero": true,
|
||||||
|
"log_history": [
|
||||||
|
{
|
||||||
|
"epoch": 0.023703703703703703,
|
||||||
|
"grad_norm": 108.5,
|
||||||
|
"learning_rate": 2.0930232558139536e-06,
|
||||||
|
"logits/chosen": 4.719546318054199,
|
||||||
|
"logits/rejected": 4.862860202789307,
|
||||||
|
"logps/chosen": -389.59698486328125,
|
||||||
|
"logps/rejected": -377.825439453125,
|
||||||
|
"loss": 0.6845,
|
||||||
|
"rewards/accuracies": 0.4312500059604645,
|
||||||
|
"rewards/chosen": 0.6139063835144043,
|
||||||
|
"rewards/margins": 0.04724857956171036,
|
||||||
|
"rewards/rejected": 0.5666579008102417,
|
||||||
|
"step": 10
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"epoch": 0.047407407407407405,
|
||||||
|
"grad_norm": 109.0,
|
||||||
|
"learning_rate": 4.418604651162791e-06,
|
||||||
|
"logits/chosen": 4.730769634246826,
|
||||||
|
"logits/rejected": 4.875722408294678,
|
||||||
|
"logps/chosen": -366.2303161621094,
|
||||||
|
"logps/rejected": -381.54351806640625,
|
||||||
|
"loss": 0.603,
|
||||||
|
"rewards/accuracies": 0.637499988079071,
|
||||||
|
"rewards/chosen": 2.2159204483032227,
|
||||||
|
"rewards/margins": 0.5498644113540649,
|
||||||
|
"rewards/rejected": 1.6660559177398682,
|
||||||
|
"step": 20
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"epoch": 0.07111111111111111,
|
||||||
|
"grad_norm": 64.5,
|
||||||
|
"learning_rate": 6.744186046511628e-06,
|
||||||
|
"logits/chosen": 4.784668922424316,
|
||||||
|
"logits/rejected": 4.905774116516113,
|
||||||
|
"logps/chosen": -407.79986572265625,
|
||||||
|
"logps/rejected": -423.7102966308594,
|
||||||
|
"loss": 0.5481,
|
||||||
|
"rewards/accuracies": 0.731249988079071,
|
||||||
|
"rewards/chosen": -0.20952901244163513,
|
||||||
|
"rewards/margins": 0.7444091439247131,
|
||||||
|
"rewards/rejected": -0.9539381265640259,
|
||||||
|
"step": 30
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"epoch": 0.09481481481481481,
|
||||||
|
"grad_norm": 75.5,
|
||||||
|
"learning_rate": 9.069767441860465e-06,
|
||||||
|
"logits/chosen": 4.709995746612549,
|
||||||
|
"logits/rejected": 4.814078330993652,
|
||||||
|
"logps/chosen": -397.01055908203125,
|
||||||
|
"logps/rejected": -400.6348571777344,
|
||||||
|
"loss": 0.4682,
|
||||||
|
"rewards/accuracies": 0.796875,
|
||||||
|
"rewards/chosen": 2.5699944496154785,
|
||||||
|
"rewards/margins": 1.2299325466156006,
|
||||||
|
"rewards/rejected": 1.340061902999878,
|
||||||
|
"step": 40
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"epoch": 0.11851851851851852,
|
||||||
|
"grad_norm": 102.0,
|
||||||
|
"learning_rate": 9.993784606094612e-06,
|
||||||
|
"logits/chosen": 4.691997528076172,
|
||||||
|
"logits/rejected": 4.816564083099365,
|
||||||
|
"logps/chosen": -410.54034423828125,
|
||||||
|
"logps/rejected": -446.1881408691406,
|
||||||
|
"loss": 0.4441,
|
||||||
|
"rewards/accuracies": 0.778124988079071,
|
||||||
|
"rewards/chosen": -0.03334064409136772,
|
||||||
|
"rewards/margins": 1.9787142276763916,
|
||||||
|
"rewards/rejected": -2.012054443359375,
|
||||||
|
"step": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"epoch": 0.14222222222222222,
|
||||||
|
"grad_norm": 96.0,
|
||||||
|
"learning_rate": 9.955857588395065e-06,
|
||||||
|
"logits/chosen": 4.5857696533203125,
|
||||||
|
"logits/rejected": 4.653135299682617,
|
||||||
|
"logps/chosen": -393.78936767578125,
|
||||||
|
"logps/rejected": -445.19342041015625,
|
||||||
|
"loss": 0.4635,
|
||||||
|
"rewards/accuracies": 0.768750011920929,
|
||||||
|
"rewards/chosen": 2.299717426300049,
|
||||||
|
"rewards/margins": 2.069148302078247,
|
||||||
|
"rewards/rejected": 0.23056945204734802,
|
||||||
|
"step": 60
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"logging_steps": 10,
|
||||||
|
"max_steps": 421,
|
||||||
|
"num_input_tokens_seen": 0,
|
||||||
|
"num_train_epochs": 1,
|
||||||
|
"save_steps": 10,
|
||||||
|
"stateful_callbacks": {
|
||||||
|
"TrainerControl": {
|
||||||
|
"args": {
|
||||||
|
"should_epoch_stop": false,
|
||||||
|
"should_evaluate": false,
|
||||||
|
"should_log": false,
|
||||||
|
"should_save": true,
|
||||||
|
"should_training_stop": false
|
||||||
|
},
|
||||||
|
"attributes": {}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"total_flos": 0.0,
|
||||||
|
"train_batch_size": 1,
|
||||||
|
"trial_name": null,
|
||||||
|
"trial_params": null
|
||||||
|
}
|
||||||
3
training_args.bin
Normal file
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:a80be37fcb3b33881679ae2c84e9e2f22baf6c5c0cb6375d93f92dce67ff8f2a
|
||||||
|
size 6712
|
||||||
Reference in New Issue
Block a user