初始化项目,由ModelHub XC社区提供模型

Model: KOREAson/KO-REAson-KL3_1-8B-0831
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-02 03:06:14 +08:00
commit 6be47631cd
17 changed files with 2867 additions and 0 deletions

56
.gitattributes vendored Normal file
View File

@@ -0,0 +1,56 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00002-of-00007.safetensors filter=lfs diff=lfs merge=lfs -text
model-00001-of-00007.safetensors filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
model-00006-of-00007.safetensors filter=lfs diff=lfs merge=lfs -text
model-00007-of-00007.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00007.safetensors filter=lfs diff=lfs merge=lfs -text
model-00005-of-00007.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00007.safetensors filter=lfs diff=lfs merge=lfs -text

242
README.md Normal file
View File

@@ -0,0 +1,242 @@
---
library_name: transformers
tags: []
---
# KO-REAson
**KO-REAson** is a series of Korean-centric reasoning language models developed in collaboration with [OneLineAI](https://onelineai.com/), [KISTI-KONI](https://huggingface.co/KISTI-KONI), [HAE-RAE](https://huggingface.co/HAERAE-HUB) and ORACLE.
We use the **Language-Mixed Chain-of-Thought (CoT)** approach, which allows the model to alternate between English and Korean during the “Think” stage of reasoning, preserving key Korean terms while leveraging English for logical scaffolding.
Top-performing models of our series [KO-REAson-AX3_1-7B-0831 (KONI-7B-R-20250831)](https://huggingface.co/KISTI-KONI/KONI-7B-R-20250831) and [KO-REAson-7B-Q2_5-0831](https://huggingface.co/KoReason/KO-REASon-7B-Q2_5-0831) show performance comparable to models trained on closed-source datasets such as Exaone-Deep-7.8B.
<p align="left">
<img src="https://cdn-uploads.huggingface.co/production/uploads/60d3e619b8448e1785bbda2a/uqrKdxbQEqAFknYBmuH7Y.png"
alt="Model Comparison" width="750"/>
<br>
<em style="display:inline-block; max-width:750px; text-align:cener; white-space:normal; word-wrap:break-word; line-height:1.5;">
<b>Left:</b> Average performance (Held-out-Ko) of open models trained on closed or open data;
our models are highlighted in green.
</em>
</p>
## Model Details
The **KO-REAson-0831** family comes in six variants based on the base model used.
| Model (link) | Base | Notes |
| -------------------------------------------------------------------------------------------- | -------------------- | --------------------------- |
| [KO-REAson-L3_1-8B-0831](https://huggingface.co/KoReason/KO-REASon-L3_1-8B-0831) | [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | `L3_1` → Llama-3.1-8B |
| [KO-REAson-KL3_1-8B-0831](https://huggingface.co/KOREAson/KO-REAson-KL3_1-8B-0831) | [Koni-Llama-3.1-8B](https://huggingface.co/KISTI-KONI/KONI-Llama3.1-8B-Instruct-20241024) | `KL3_1` → Koni-Llama-3.1-8B; also called [KONI-Llama3.1-8B-R-20250831](https://huggingface.co/KISTI-KONI/KONI-Llama3.1-8B-R-20250831) |
| [KO-REAson-G3-4B-0831](https://huggingface.co/KoReason/KO-REASon-G3-4B-0831) | [Gemma-3 4B](https://huggingface.co/google/gemma-3-4b-it) | `G3` → Gemma-3-4B |
| [KO-REAson-AX3_1-7B-0831](https://huggingface.co/KOREAson/KO-REAson-7B-AX3_1-0831) | [A.X.-3.1-Light (≈7B)](https://huggingface.co/skt/A.X-3.1-Light) | `AX3_1` → A.X.-3.1-Light; also called [KONI-7B-R-20250831](https://huggingface.co/KISTI-KONI/KONI-7B-R-20250831) |
| [KO-REAson-K2505_8B-0831](https://huggingface.co/KoReason/KO-REASon-K2505_8B-0831) | [Kanana-2505 (8B)](https://huggingface.co/kakaocorp/kanana-1.5-8b-instruct-2505) | `K2505` → Kanana-2505 |
| [KO-REAson-7B-Q2_5-0831](https://huggingface.co/KoReason/KO-REASon-7B-Q2_5-0831) | [Qwen-2.5 (7B)](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) | `Q2_5` → Qwen-2.5 |
# Performance
**Evaluation Datasets**
The model's performance was evaluated across a total of 11 benchmarks, and the evaluation suite is divided into two parts: (You can check these benchmarks in [HAERAE-HUB/KoSimpleEval](https://huggingface.co/datasets/HAERAE-HUB/KoSimpleEval))
- **Held-in**: This set of benchmarks is used for routine monitoring of the model's performance during the training and ablation study phases.
- **Held-out**: This set is used only once to evaluate the final model after all training and ablations are complete.
This separation is designed to prevent inadvertent overfitting to the benchmarks during the iterative training process and to provide a more accurate measure of the model's generalization capabilities.
|**Category**|**Held-in**|**Held-out**|
|---|---|---|
|**General Knowledge**|KMMLU-Redux|KMMLU-HARD, KMMLU-Pro|
|**Reasoning**|MCLM|KSM, GPQA, AIME2024, AIME2025|
|**Korean-specific**|HAE-RAE Bench|CLIcK, KoBALT-700|
**Comparison with models trained on public datasets**
<table>
<thead>
<tr>
<th>Models</th>
<th># Instances</th>
<th>Methodology</th>
<th>Held-Out (Ko)</th>
<th>Held-Out (En)</th>
<th>Total</th>
</tr>
</thead>
<tbody>
<tr>
<th>KO-REASon-AX3_1-7B-0831(KONI-7B-R-20250831; Ours)</th>
<td>260k</td>
<td>SFT</td>
<td><b>44.6</b></td>
<td>41.2</td>
<td><u>43.3</u></td>
</tr>
<tr>
<th>KO-REASon-7B-Q2_5-0831(Ours)</th>
<td>260k</td>
<td>SFT</td>
<td><b>45.10</b></td>
<td>38.75</td>
<td><u>49.95</u></td>
</tr>
<tr>
<th>KO-REAson-KL3_1-8B-0831(KONI-Llama3.1-8B-R-20250831)</th>
<td>260k</td>
<td>SFT</td>
<td>40.13</td>
<td>30.57</td>
<td>43.66</td>
</tr>
<tr>
<td colspan="6" style="text-align:center; font-weight:bold;">Open Recipe (En)</td>
</tr>
<tr>
<th>OpenThinker3-7B</th>
<td>1.2M</td>
<td>SFT</td>
<td>33.6</td>
<td><b>55.5</b></td>
<td>41.8</td>
</tr>
<tr>
<th>s1.1-7B</th>
<td>1k</td>
<td>SFT</td>
<td>35.6</td>
<td>23.4</td>
<td>31.1</td>
</tr>
<tr>
<th>Llama-3.1-Nemotron-Nano-8B-v1</th>
<td>&gt;3M</td>
<td>SFT &amp; RL</td>
<td>27.0</td>
<td>44.1</td>
<td>33.4</td>
</tr>
<tr>
<td colspan="6" style="text-align:center; font-weight:bold;">Open Recipe (Ko)</td>
</tr>
<tr>
<th>Ko-R1-14B</th>
<td>45k</td>
<td>SFT</td>
<td><u>43.7</u></td>
<td><u>46.3</u></td>
<td><b>44.7</b></td>
</tr>
<tr>
<th>Ko-R1-7B</th>
<td>45k</td>
<td>SFT</td>
<td>27.3</td>
<td>36.1</td>
<td>30.6</td>
</tr>
<tr>
<th>LLaMa-3.1-Ko-Reasoning-8B</th>
<td>63k</td>
<td>SFT</td>
<td>17.7</td>
<td>7.7</td>
<td>14.0</td>
</tr>
</tbody>
</table>
**Held-out benchmark performance**
<table border="1" cellspacing="0" cellpadding="6">
<thead>
<tr>
<th rowspan="2">Model</th>
<th rowspan="2">Model Size</th>
<th colspan="2">General</th>
<th colspan="4">Reasoning</th>
<th colspan="2">Korean-Specific</th>
<th rowspan="2">Average<br>(Held-out)</th>
<th rowspan="2">Average<br>(Held-out-Ko)</th>
</tr>
<tr>
<th>KMMLU-HARD</th>
<th>KMMLU-Pro</th>
<th>KSM</th>
<th>AIME 2024</th>
<th>AIME 2025</th>
<th>GPQA</th>
<th>CLIcK</th>
<th>KoBALT-700</th>
</tr>
</thead>
<tbody>
<tr>
<td><b>Llama-3.1-Nemotron-Nano-8B</b></td>
<td>8.03</td><td>21.47</td><td>22.89</td><td>47.06</td><td>56.67</td><td>43.33</td><td>32.32</td><td>34.54</td><td>9.29</td><td>33.45</td><td>27.05</td>
</tr>
<tr>
<td><b>Llama-3.1-Korean-Reasoning-8B-Instruct</b></td>
<td>8.03</td><td>14.91</td><td>21.72</td><td>6.09</td><td>0.00</td><td>0.00</td><td>23.23</td><td>39.65</td><td>6.14</td><td>13.97</td><td>17.70</td>
</tr>
<tr>
<td><b>EXAONE-Deep-7.8B</b></td>
<td>7.82</td><td><u>40.96</u></td><td>37.35</td><td><b>70.80</b></td><td><b>70.00</b></td><td><b>63.33</b></td><td><b>64.65</b></td><td>54.24</td><td>18.86</td><td><b>52.52</b></td><td>44.44</td>
</tr>
<tr>
<td><b>DeepSeek-R1-Distill-Qwen-7B</b></td>
<td>7.62</td><td>0.00</td><td>23.00</td><td>56.09</td><td>60.00</td><td>40.00</td><td>43.43</td><td>0.00</td><td>8.29</td><td>28.85</td><td>17.48</td>
</tr>
<tr>
<td><b>DeepSeek-R1-Distill-Llama-8B</b></td>
<td>8.03</td><td>23.22</td><td>26.26</td><td>29.97</td><td>33.33</td><td>20.00</td><td><U>46.46</u></td><td>39.05</td><td>13.29</td><td>28.95</td><td>26.36</td>
</tr>
<tr>
<td><b>s1.1-7B</b></td>
<td>7.62</td><td>31.16</td><td><u>37.70</u></td><td>30.60</td><td>16.67</td><td>23.33</td><td>30.30</td><td><u>56.84</u></td><td><u>21.86</u></td><td>31.06</td><td>35.63</td>
</tr>
<tr>
<td><b>OpenThinker3-7B</b></td>
<td>7.62</td><td>30.31</td><td>26.26</td><td><u>63.59</u></td><td><u>66.67</u></td><td><u>53.33</u></td><td><u>46.46</u></td><td>47.69</td><td>10.14</td><td>35.63</td><td>30.60</td>
</tr>
<tr>
<td><b>Ko-R1-7B</b></td>
<td>7.61</td><td>28.46</td><td>19.31</td><td>51.61</td><td>46.67</td><td>33.33</td><td>28.28</td><td>32.48</td><td>4.71</td><td>30.61</td><td>27.31</td>
</tr>
<tr>
<td><b>KO-REAson-KL3_1-8B-0831(KONI-Llama3.1-8B-R-20250831)</b></td>
<td>8.03</td><td>44.64</td><td>40.08</td><td>37.96</td><td>23.33</td><td>30.00</td><td>38.38</td><td>56.39</td><td>21.57</td><td>30.57</td><td>40.13</td>
</tr>
<tr>
<td><b>KO-REASon-AX3_1-7B-0831 (KONI-7B-R-20250831)</b></td>
<td>7.26</td><td>45.57</td><td>38.13</td><td>52.80</td><td>53.33</td><td>33.33</td><td>36.87</td><td><b>62.86</b></td><td>23.43</td><td><u>43.29</u></td><td><u>44.56</u></td>
</tr>
<tr>
<td><b>KO-REASon-7B-Q2_5-0831</b></td>
<td>7.26</td><td><b>46.81</b></td><td><b>44.93</b></td><td>48.11</td><td>43.33</td><td>30.00</td><td>42.93</td><td>60.65</td><td><b>25.00</b></td><td>42.72</td><td><b>45.10</b></td>
</tr>
</tbody>
</table>
## Citation
```
The paper will be released soon!
```
## Contact
For any questions contact us via the following email :)
```
spthsrbwls123@yonsei.ac.kr
```
## Acknowlegments
This research was supported by the Korea Institute of Science and Technology Information (KISTI) (No.(KISTI) K25L1M1C1), aimed at developing KONI (KISTI Open Neural Intelligence), a large language model specialized in science and technology.

114
chat_template.jinja Normal file
View File

@@ -0,0 +1,114 @@
{{- bos_token }}
{%- if custom_tools is defined %}
{%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
{%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
{%- set date_string = "27 Aug 2024" %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = none %}
{%- endif %}
{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
{%- set system_message = messages[0]['content']|trim %}
{%- set messages = messages[1:] %}
{%- else %}
{%- set system_message = "" %}
{%- endif %}
{#- System message + builtin tools #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if builtin_tools is defined or tools is not none %}
{{- "Environment: ipython\n" }}
{%- endif %}
{%- if builtin_tools is defined %}
{{- "Tools: " + builtin_tools | reject('equalto', 'code_interpreter') | join(", ") + "\n\n"}}
{%- endif %}
{{- "You are KONI, an AI assistant trained based on LlaMA3.1 and created by KISTI to be helpful and honest. Your knowledge spans a wide range of topics, allowing you to engage in substantive conversations and provide analysis on complex subjects. Below is an instruction that describes a task. Write a response that appropriately completes the request. If you don't know the answer, just say that you don't know.\n" }}
{{- "Cutting Knowledge Date: July 2024\n" }}
{{- "Today Date: " + date_string + "\n\n" }}
{%- if tools is not none and not tools_in_user_message %}
{{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
{{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
{{- "Do not use variables.\n\n" }}
{%- for t in tools %}
{{- t | tojson(indent=4) }}
{{- "\n\n" }}
{%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}
{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
{#- Extract the first user message so we can plug it in here #}
{%- if messages | length != 0 %}
{%- set first_user_message = messages[0]['content']|trim %}
{%- set messages = messages[1:] %}
{%- else %}
{{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
{%- endif %}
{{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
{{- "Given the following functions, please respond with a JSON for a function call " }}
{{- "with its proper arguments that best answers the given prompt.\n\n" }}
{{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
{{- "Do not use variables.\n\n" }}
{%- for t in tools %}
{{- t | tojson(indent=4) }}
{{- "\n\n" }}
{%- endfor %}
{{- first_user_message + "<|eot_id|>"}}
{%- endif %}
{%- for message in messages %}
{%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
{%- if (message.role == 'assistant') %}
{{- '<|start_header_id|>' + 'KONI' + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
{%- else %}
{{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
{%- endif %}
{%- elif 'tool_calls' in message %}
{%- if not message.tool_calls|length == 1 %}
{{- raise_exception("This model only supports single tool-calls at once!") }}
{%- endif %}
{%- set tool_call = message.tool_calls[0].function %}
{%- if builtin_tools is defined and tool_call.name in builtin_tools %}
{{- '<|start_header_id|>KONI<|end_header_id|>\n\n' -}}
{{- "<|python_tag|>" + tool_call.name + ".call(" }}
{%- for arg_name, arg_val in tool_call.arguments | items %}
{{- arg_name + '="' + arg_val + '"' }}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- ")" }}
{%- else %}
{{- '<|start_header_id|>KONI<|end_header_id|>\n\n' -}}
{{- '{"name": "' + tool_call.name + '", ' }}
{{- '"parameters": ' }}
{{- tool_call.arguments | tojson }}
{{- "}" }}
{%- endif %}
{%- if builtin_tools is defined %}
{#- This means we're in ipython mode #}
{{- "<|eom_id|>" }}
{%- else %}
{{- "<|eot_id|>" }}
{%- endif %}
{%- elif message.role == "tool" or message.role == "ipython" %}
{{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
{%- if message.content is mapping or message.content is iterable %}
{{- message.content | tojson }}
{%- else %}
{{- message.content }}
{%- endif %}
{{- "<|eot_id|>" }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|start_header_id|>KONI<|end_header_id|>\n\n' }}
{%- endif %}

35
config.json Normal file
View File

@@ -0,0 +1,35 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.55.4",
"use_cache": true,
"vocab_size": 128256
}

1
configuration.json Normal file
View File

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

6
generation_config.json Normal file
View File

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": 128001,
"transformers_version": "4.55.4"
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:726490927ea06607b0a1a13994b0662fdc883a3dadad212061465818f9e033a0
size 4886466168

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8d3e6917f461fde3a97ccaaa578b474f07f70ff6c58a342261e48baa826362a5
size 4832007448

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3ce97c72935073ab6a7c50bc2b320155df80f46251ef8bf088a30bc37a92a091
size 4999813112

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5965898aa4f442cad9543fe58e3d8334f11f5e35caade889e3daa92eb6abfd0
size 4999813128

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c0a1362a790ac81a63a2b91a81b1d7d88c7efc747c9df26e5c5bf9ed4493dda4
size 4832007496

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6f11514d1a474e726b76f17b83a29a61a21b3e49527288daaf264ee4d35b0fa6
size 4999813120

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bae7dbb1f8bfe3c3229babb7c2747d5f0d27003612a3328ab95e627d964a0f34
size 2571158184

View File

@@ -0,0 +1,299 @@
{
"metadata": {
"total_parameters": 8030261248,
"total_size": 32121044992
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}

23
special_tokens_map.json Normal file
View File

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|eot_id|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|finetune_right_pad_id|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

2067
tokenizer_config.json Normal file

File diff suppressed because it is too large Load Diff