Initialize project; model provided by the ModelHub XC community

Model: nvidia/AceReason-Nemotron-14B
Source: Original Platform
ModelHub XC
2026-06-06 08:18:43 +08:00
commit 21c9ab1f0d
17 changed files with 1216 additions and 0 deletions

37
.gitattributes vendored Normal file

@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

146
README.md Normal file

@@ -0,0 +1,146 @@
---
library_name: transformers
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-generation
language:
- en
tags:
- nvidia
- reasoning
- math
- code
- reinforcement learning
- pytorch
---
# AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
<p align="center">
[![Technical Report](https://img.shields.io/badge/2505.16400-Technical_Report-blue)](https://arxiv.org/abs/2505.16400)
[![Dataset](https://img.shields.io/badge/🤗-Math_RL_Dataset-blue)](https://huggingface.co/datasets/nvidia/AceReason-Math)
[![Models](https://img.shields.io/badge/🤗-Models-blue)](https://huggingface.co/collections/nvidia/acereason-682f4e1261dc22f697fd1485)
[![Eval Toolkit](https://img.shields.io/badge/🤗-Eval_Code-blue)](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md)
</p>
<img src="fig/main_fig.png" alt="main_fig" style="width: 600px; max-width: 100%;" />
## 🔥News
- **6/16/2025**: We are excited to share our new release combining SFT with RL: **AceReason-Nemotron-1.1-7B**
  - Paper: https://arxiv.org/pdf/2506.13284
  - Model: https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
  - 4M SFT Data: https://huggingface.co/datasets/nvidia/AceReason-1.1-SFT
- **6/11/2025**: We share our evaluation toolkit at [AceReason Evaluation](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md), including:
  - scripts to run inference and scoring
  - LiveCodeBench (avg@8): model prediction files and scores for each month (2023/5-2025/5)
  - AIME24/25 (avg@64): model prediction files and scores
- **6/2/2025**: We are excited to share our Math RL training dataset at [AceReason-Math](https://huggingface.co/datasets/nvidia/AceReason-Math)

We're thrilled to introduce AceReason-Nemotron-14B, a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-14B. It achieves 78.6% on AIME 2024 (+8.9%), 67.4% on AIME 2025 (+17.4%), 61.1% on LiveCodeBench v5 (+8%), 54.9% on LiveCodeBench v6 (+7%), and a Codeforces rating of 2024 (+543). We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first, RL training on math-only prompts, then RL training on code-only prompts. Notably, we find that math-only RL significantly improves the performance of strong distilled models not only on math benchmarks but also on code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance with minimal degradation in math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.
We share our training recipe and training logs in our [technical report](https://arxiv.org/abs/2505.16400).
## Results
We evaluate our model against competitive reasoning models of comparable size from the Qwen2.5 and Llama3.1 model families on AIME 2024, AIME 2025, LiveCodeBench v5 (2024/08/01 - 2025/02/01), and LiveCodeBench v6 (2025/02/01-2025/05/01). More evaluation results can be found in our [technical report](https://arxiv.org/abs/2505.16400).
| **Model** | **AIME 2024<br>(avg@64)** | **AIME 2025<br>(avg@64)** | **LCB v5<br>(avg@8)** | **LCB v6<br>(avg@8)** |
| :---: | :---: | :---: | :---: | :---: |
| <small>QwQ-32B</small> | 79.5 | 65.8 | 63.4 | - |
| <small>DeepSeek-R1-671B</small> | 79.8 | 70.0 | 65.9 | - |
| <small>Llama-Nemotron-Ultra-253B</small> | 80.8 | 72.5 | 66.3 | - |
| <small>o3-mini (medium)</small> | 79.6 | 76.7 | 67.4 | - |
| <small>Light-R1-14B</small> | 74 | 60.2 | 57.9 | 51.5 |
| <small>DeepCoder-14B (32K Inference)</small> | 71 | 56.1 | 57.9 | 50.4 |
| <small>OpenMath-Nemotron-14B</small> | 76.3 | 63.0 | - | - |
| <small>OpenCodeReasoning-Nemotron-14B</small> | - | - | 59.4 | 54.1 |
| <small>Llama-Nemotron-Super-49B-v1</small> | 67.5 | 60.0 | 45.5 | - |
| <small>DeepSeek-R1-Distilled-Qwen-14B</small> | 69.7 | 50.2 | 53.1 | 47.9 |
| <small>DeepSeek-R1-Distilled-Qwen-32B</small> | 72.6 | 54.9 | 57.2 | - |
| [AceReason-Nemotron-7B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-7B)| 69.0 | 53.6 | 51.8 | 44.1 |
| [AceReason-Nemotron-14B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-14B)| 78.6 | 67.4 | 61.1 | 54.9 |
## How to use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'nvidia/AceReason-Nemotron-14B'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Jen enters a lottery by picking $4$ distinct numbers from $S=\\{1,2,3,\\cdots,9,10\\}.$ $4$ numbers are randomly chosen from $S.$ She wins a prize if at least two of her numbers were $2$ of the randomly chosen numbers, and wins the grand prize if all four of her numbers were the randomly chosen numbers. The probability of her winning the grand prize given that she won a prize is $\\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Put the inputs on the device holding the model's first layers (works with device_map="auto")
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.6,
    top_p=0.95
)
# Keep only the newly generated tokens (drop the echoed prompt)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
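The decoded `response` holds the model's reasoning followed by its final answer. If you only need the answer, a minimal sketch like the one below can split the two. It assumes the reasoning is closed by a literal `</think>` marker that survives `skip_special_tokens=True`, in line with the `<think>` prompt suffix shown under Usage Recommendations; `split_reasoning` is a hypothetical helper, not part of `transformers`.

```python
def split_reasoning(response: str, marker: str = "</think>") -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Assumes the reasoning ends at the first `marker`. If the marker is missing
    (for example, generation stopped at max_new_tokens), everything is treated
    as reasoning and the answer is empty.
    """
    head, sep, tail = response.partition(marker)
    if not sep:
        return response.strip(), ""
    # Drop a leading <think> tag in case it was decoded as part of the response
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


example = "<think>\nCount the favorable cases first.\n</think>\n\nFinal answer here."
reasoning, answer = split_reasoning(example)
```

In the snippet above, call `split_reasoning(response)`. Treating a missing marker as "all reasoning" keeps truncated generations from being scored as if they contained an answer.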
## Usage Recommendations
1. Don't include a system prompt; instead, place all instructions directly in the user prompt.
2. We recommend using the following instruction for math questions: Please reason step by step, and put your final answer within \\boxed{}.
3. We recommend using the following instruction for code questions:
```python
question = ""  # code question
starter_code = ""  # starter code function header (empty if none)

code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""

if starter_code != "":
    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
    question += "\n\n" + code_instruction_hasstartercode
else:
    question += "\n\n" + code_instruction_nostartercode

# DeepSeek-R1-style role markers (note the fullwidth "｜" bars); the prompt ends inside an open <think> block
final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
```
4. For evaluation, we use **vLLM==0.7.3** as the inference engine with top-p=0.95, temperature=0.6, and max_tokens=32768.
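Following recommendations 1 and 2, a math question goes entirely into the user turn together with the step-by-step instruction. The sketch below builds such a message list and pulls the last `\boxed{...}` answer out of a response. The helper names, the blank-line join between question and instruction, and the brace-matching extraction are illustrative assumptions, not the authors' scoring code (that lives in the evaluation toolkit).

```python
from typing import Optional

# Instruction text from recommendation 2; appending it after the question with a
# blank line is an assumption, not something this README specifies.
MATH_INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_math_messages(question: str) -> list:
    """Single user turn, no system prompt (recommendation 1)."""
    return [{"role": "user", "content": question.strip() + "\n\n" + MATH_INSTRUCTION}]


def last_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces, e.g. a truncated generation
```

The messages returned by `build_math_messages` can be passed to `tokenizer.apply_chat_template` exactly as in the How to use example. Matching braces by depth, rather than with a regex, keeps answers such as `\frac{1}{2}` intact.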
## Evaluation Toolkit
Evaluation code, scripts, and cached prediction files are available at https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md
## Correspondence to
Yang Chen (yachen@nvidia.com), Zhuolin Yang (zhuoliny@nvidia.com), Zihan Liu (zihanl@nvidia.com), Chankyu Lee (chankyul@nvidia.com), Wei Ping (wping@nvidia.com)
## License
Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
## Citation
```
@article{chen2025acereason,
title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Xu, Peng and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
journal={arXiv preprint arXiv:2505.16400},
year={2025}
}
```

166
README_EVALUATION.md Normal file

@@ -0,0 +1,166 @@
# AceReason Evaluation Toolkit
Our evaluation scripts and code are available at https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/evaluation.tar.gz
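`evaluation.tar.gz` is stored with Git LFS, and this commit lists its pointer (`oid sha256:1c1daadb3bfead2369b1ad8937369b85a4e550b4a2fadd6c9fac4b429c24e818`, `size 374916092`). A minimal, standard-library-only sketch for checking a downloaded copy against that pointer before extracting it; the function names and the local download path are hypothetical.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into algo, oid and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's byte size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first: catches truncated downloads without hashing
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


# Pointer for evaluation.tar.gz as listed in this commit
EVAL_POINTER = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:1c1daadb3bfead2369b1ad8937369b85a4e550b4a2fadd6c9fac4b429c24e818
size 374916092""")
# matches_pointer("evaluation.tar.gz", EVAL_POINTER)  # hypothetical local path to your download
```

Hashing in 1 MiB chunks keeps memory flat for the ~375 MB archive.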
## Environment
- vllm==0.7.3
- torch==2.5.1
- transformers==4.48.2
- 8x NVIDIA H100 80GB HBM3 (CUDA Version: 12.8)
### Dataset Download
LiveCodeBench:
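```python
from datasets import load_dataset

# code_generation_lite is built by a loading script; recent `datasets` releases require opting in
ds = load_dataset(
    "livecodebench/code_generation_lite",
    version_tag="release_v6",
    trust_remote_code=True,
)["test"]
ds.to_json("data/livecodebench_problems.json", orient="records", lines=False)
```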
Math: see the files under `data/` (e.g., `data/aime24.jsonl` and `data/aime25.jsonl`).
## Evaluation Script
To generate outputs with a single seed, use the following commands:
```
bash generate_livecodebench.sh ${model_path} ${seed} ${output_path} ${model_type}
bash generate_aime.sh ${model_path} ${seed} aime24 ${output_path} ${model_type}
bash generate_aime.sh ${model_path} ${seed} aime25 ${output_path} ${model_type}
```
Set `model_type` to `r1` for AceReason-Nemotron-1.0 models and to `qwen` for AceReason-Nemotron-1.1 models.
Alternatively, use our configured seeds to reproduce our AIME 24/25 (avg@64) and LiveCodeBench v5/v6 (avg@8) results:
```
bash run_livecodebench.sh ${model_path} ${output_path}
bash run_aime.sh ${model_path} ${output_path}
```
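The averaged metrics come from multiple seeded runs, 64 per problem for AIME and 8 for LiveCodeBench, per the avg@64 and avg@8 labels. If you prefer to drive the single-seed script yourself instead of `run_aime.sh`, the sketch below builds one `generate_aime.sh` call per seed and benchmark using the positional argument order shown above. The seed values are placeholders (the authors' configured seeds are not listed in this README), and `aime_commands`/`run_all` are hypothetical helpers.

```python
import shlex
import subprocess


def aime_commands(model_path, output_path, model_type, seeds, benchmarks=("aime24", "aime25")):
    """One generate_aime.sh call per (seed, benchmark), using the argument order
    bash generate_aime.sh ${model_path} ${seed} <benchmark> ${output_path} ${model_type}
    """
    return [
        ["bash", "generate_aime.sh", model_path, str(seed), bench, output_path, model_type]
        for seed in seeds
        for bench in benchmarks
    ]


def run_all(commands, dry_run=True):
    """Print the commands (dry run) or execute them in order, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Placeholder seeds 0..63; "r1" is the model_type for AceReason-Nemotron-1.0 models
cmds = aime_commands("/path/to/AceReason-Nemotron-14B", "outputs", "r1", seeds=range(64))
run_all(cmds[:2])  # dry run: print the first two commands
```

Passing arguments as lists (rather than one shell string) avoids quoting problems with paths that contain spaces.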
To score the generations and reproduce our results, run:
```
python evaluate_livecodebench.py -g ${output_path}
python evaluate_aime.py --modelfolder ${output_path} --test_data data/aime24.jsonl
python evaluate_aime.py --modelfolder ${output_path} --test_data data/aime25.jsonl
```
## Reference Results
We also include our generations in cache.tar.gz for reference.
```
LiveCodeBench AceReason-Nemotron-1.0-7B (Avg@8)
=================================================================
Months Corrects Total Accuracy
2023-05 180 272 66.17647058823529
2023-06 238 312 76.28205128205128
2023-07 337 432 78.00925925925925
2023-08 185 288 64.23611111111111
2023-09 275 352 78.125
2023-10 257 352 73.01136363636364
2023-11 217 280 77.5
2023-12 228 320 71.25
2024-01 193 288 67.01388888888889
2024-02 169 256 66.015625
2024-03 234 360 65.0
2024-04 226 296 76.35135135135135
2024-05 211 288 73.26388888888889
05/23-05/24 2950 4096 72.021484375
2024-06 277 368 75.27173913043478
2024-07 223 344 64.82558139534883
2024-08 275 528 52.083333333333336
2024-09 204 376 54.255319148936174
2024-10 209 424 49.29245283018868
2024-11 216 456 47.36842105263158
2024-12 223 392 56.88775510204081
2025-01 161 408 39.46078431372549
06/24-01/25 1788 3296 54.24757281553398
2025-02 179 408 43.872549019607845
2025-03 258 544 47.4264705882353
2025-04 38 96 39.583333333333336
v5 1142 2232 51.16487455197132
v6 621 1400 44.357142857142854
LiveCodeBench AceReason-Nemotron-1.0-14B (Avg@8)
=================================================================
Months Corrects Total Accuracy
2023-05 211 272 77.57352941176471
2023-06 282 312 90.38461538461539
2023-07 393 432 90.97222222222223
2023-08 219 288 76.04166666666667
2023-09 315 352 89.48863636363636
2023-10 294 352 83.52272727272727
2023-11 229 280 81.78571428571429
2023-12 263 320 82.1875
2024-01 219 288 76.04166666666667
2024-02 201 256 78.515625
2024-03 296 360 82.22222222222223
2024-04 252 296 85.13513513513513
2024-05 233 288 80.90277777777777
05/23-05/24 3407 4096 83.1787109375
2024-06 311 368 84.51086956521739
2024-07 248 344 72.09302325581395
2024-08 299 528 56.628787878787875
2024-09 232 376 61.702127659574465
2024-10 266 424 62.735849056603776
2024-11 282 456 61.8421052631579
2024-12 253 392 64.54081632653062
2025-01 217 408 53.18627450980392
06/24-01/25 2108 3296 63.95631067961165
2025-02 211 408 51.71568627450981
2025-03 324 544 59.55882352941177
2025-04 41 96 42.708333333333336
v5 1350 2232 60.483870967741936
v6 775 1400 55.357142857142854
LiveCodeBench AceReason-Nemotron-1.1-7B (Avg@8)
=================================================================
Months Corrects Total Accuracy
2023-05 205 272 75.36764705882354
2023-06 255 312 81.73076923076923
2023-07 356 432 82.4074074074074
2023-08 208 288 72.22222222222223
2023-09 287 352 81.5340909090909
2023-10 278 352 78.97727272727273
2023-11 234 280 83.57142857142857
2023-12 263 320 82.1875
2024-01 215 288 74.65277777777777
2024-02 182 256 71.09375
2024-03 270 360 75.0
2024-04 254 296 85.8108108108108
2024-05 221 288 76.73611111111111
05/23-05/24 3228 4096 78.80859375
2024-06 309 368 83.96739130434783
2024-07 235 344 68.31395348837209
2024-08 292 528 55.303030303030305
2024-09 211 376 56.11702127659574
2024-10 254 424 59.905660377358494
2024-11 269 456 58.99122807017544
2024-12 239 392 60.96938775510204
2025-01 194 408 47.549019607843135
06/24-01/25 2003 3296 60.77063106796116
2025-02 203 408 49.754901960784316
2025-03 306 544 56.25
2025-04 41 96 42.708333333333336
v5 1283 2232 57.482078853046595
v6 726 1400 51.857142857142854
AceReason-Nemotron-7B
====================================
AIME2024 (Avg@64) 68.64583333333334
AIME2025 (Avg@64) 53.59375000000002
AceReason-Nemotron-14B
====================================
AIME2024 (Avg@64) 78.43749999999997
AIME2025 (Avg@64) 67.65625
AceReason-Nemotron-1.1-7B
====================================
AIME2024 (Avg@64) 72.60416666666667
AIME2025 (Avg@64) 64.84375
```
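In the reference results, `Accuracy` is `Corrects / Total × 100`. Every `Total` is a multiple of 8, consistent with eight samples per problem, and the range rows pool the monthly counts before dividing: for AceReason-Nemotron-1.0-7B, the `05/23-05/24` row is 2950 / 4096 = 72.021484375%, the sums of the 13 monthly rows above it. When every problem gets the same number of samples k, this pooled ratio equals avg@k (the per-problem accuracy averaged over problems). A small sketch of both computations; the function names and the per-problem 0/1 input format are assumptions, not the toolkit's API.

```python
def pooled_accuracy(rows):
    """rows: (corrects, total) pairs -> percentage correct over all samples."""
    rows = list(rows)
    corrects = sum(c for c, _ in rows)
    total = sum(t for _, t in rows)
    return 100.0 * corrects / total


def avg_at_k(per_problem):
    """per_problem: one list of 0/1 outcomes (k samples) per problem.
    Mean of the per-problem accuracies, as a percentage."""
    return 100.0 * sum(sum(s) / len(s) for s in per_problem) / len(per_problem)


# Monthly (Corrects, Total) for AceReason-Nemotron-1.0-7B, 2023-05 through 2024-05
months_7b = [(180, 272), (238, 312), (337, 432), (185, 288), (275, 352), (257, 352), (217, 280),
             (228, 320), (193, 288), (169, 256), (234, 360), (226, 296), (211, 288)]
print(pooled_accuracy(months_7b))  # 72.021484375, the 05/23-05/24 row
```

If problems had different sample counts, the two would diverge, and avg@k would weight each problem equally.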

29
config.json Normal file

@@ -0,0 +1,29 @@
{
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151646,
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 131072,
"max_window_layers": 48,
"model_type": "qwen2",
"num_attention_heads": 40,
"num_hidden_layers": 48,
"num_key_value_heads": 8,
"pad_token_id": 151643,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.49.0",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 152064
}
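Several sizes follow directly from this config. The sketch below derives the attention head size, the grouped-query ratio, the parameter count, and the bf16 KV-cache footprint per token. It assumes the Qwen2 layer layout visible in the tensor names of this commit's safetensors index (biases on q/k/v projections only, untied `lm_head`, one final norm); under that assumption, the bf16 weight bytes come to exactly the index's `total_size` of 29540067328. `derived_sizes` is a hypothetical helper.

```python
import json

# Subset of config.json above
CONFIG = json.loads("""{
  "hidden_size": 5120, "intermediate_size": 13824, "num_attention_heads": 40,
  "num_hidden_layers": 48, "num_key_value_heads": 8, "vocab_size": 152064,
  "tie_word_embeddings": false
}""")


def derived_sizes(cfg, bytes_per_param=2):
    h, inter = cfg["hidden_size"], cfg["intermediate_size"]
    n_heads, n_kv = cfg["num_attention_heads"], cfg["num_key_value_heads"]
    layers, vocab = cfg["num_hidden_layers"], cfg["vocab_size"]
    head_dim = h // n_heads
    kv_dim = n_kv * head_dim
    # q/k/v projections carry biases; o_proj does not
    attn = (h * h + h) + 2 * (h * kv_dim + kv_dim) + h * h
    mlp = 3 * h * inter  # gate_proj, up_proj, down_proj
    norms = 2 * h        # input + post-attention layernorm weights
    embed = vocab * h
    lm_head = 0 if cfg["tie_word_embeddings"] else vocab * h
    params = layers * (attn + mlp + norms) + embed + lm_head + h  # + final norm
    # One K and one V vector of size kv_dim per layer, per token
    kv_bytes_per_token = 2 * layers * kv_dim * bytes_per_param
    return {
        "head_dim": head_dim,
        "gqa_group": n_heads // n_kv,
        "params": params,
        "weight_bytes": params * bytes_per_param,
        "kv_bytes_per_token": kv_bytes_per_token,
    }


sizes = derived_sizes(CONFIG)
print(sizes["params"])                                # 14770033664
print(sizes["kv_bytes_per_token"] * 32768 / 2**30)    # 6.0 (GiB for a 32,768-token sequence)
```

With 8 KV heads instead of 40, the KV cache is 5× smaller than full multi-head attention would need, which matters at the README's `max_new_tokens=32768`.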

3
evaluation.tar.gz Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1c1daadb3bfead2369b1ad8937369b85a4e550b4a2fadd6c9fac4b429c24e818
size 374916092

3
fig/main_fig.png Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:40ed09ffba7835a9a3f4c1d39c809c8ca5fe7d947e91199b4e9b266fa85178d0
size 105946

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 151646,
"eos_token_id": 151643,
"pad_token_id": 151643,
"transformers_version": "4.49.0"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e322e1e9498cee8d18b0c04f592ee221a54344e6137cb3c948a603b807e37217
size 4986211280


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fa4520787b14831eab69fcb517f2f24893ff79ece41972de1cfcabf786f89ad4
size 4954847344


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c3716bd5b77888fbe5f838b742b0146005bc19ae82adfb6db4a21e0ca7ea1b72
size 4954847392


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6211c78217e6d8fdfcc81db8edea3737519a1a8984d15d0c3639ce312447f731
size 4954847392


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7c260cbd5fd753270f75a0cea6e2f5f52b2d7bbf28bd019755ca2639e129166f
size 4954847392


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c175ad433d94c2a121424e9fb4ceb35a273cdbd7510716268d5790cd39a4c301
size 4734533160


@@ -0,0 +1,586 @@
{
"metadata": {
"total_size": 29540067328
},
"weight_map": {
"lm_head.weight": "model-00006-of-00006.safetensors",
"model.embed_tokens.weight": "model-00001-of-00006.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.13.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.14.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.14.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.14.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.15.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.15.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.15.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.16.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.19.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.20.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.20.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.20.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.21.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.21.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.21.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.22.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.22.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.22.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.23.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.23.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.23.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.24.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.24.self_attn.k_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.24.self_attn.q_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.24.self_attn.v_proj.bias": "model-00003-of-00006.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.25.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.25.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.26.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.27.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.27.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.27.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.27.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.28.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.28.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.28.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.28.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.29.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.29.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.29.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.29.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.30.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.30.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.30.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.30.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.31.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.31.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.31.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.32.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.32.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.32.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.32.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.33.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.33.self_attn.k_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.33.self_attn.q_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.33.self_attn.v_proj.bias": "model-00004-of-00006.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.34.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.34.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.34.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.34.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.35.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.35.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.35.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.35.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.36.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.36.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.36.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.36.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.37.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.37.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.37.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.37.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.38.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.38.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.38.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.38.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.39.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.39.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.39.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.39.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.40.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.40.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.40.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.40.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.41.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.41.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.41.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.41.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.42.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.42.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.42.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.42.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.42.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.42.self_attn.k_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.42.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.42.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.42.self_attn.q_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.42.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.42.self_attn.v_proj.bias": "model-00005-of-00006.safetensors",
"model.layers.42.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.43.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.43.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.43.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.43.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.43.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.43.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.43.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.43.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.43.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.43.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.43.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.43.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.44.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.44.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.44.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.44.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.44.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.44.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.44.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.44.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.44.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.44.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.44.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.44.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.45.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.45.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.45.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.45.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.45.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.45.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.45.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.45.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.45.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.45.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.45.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.45.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.46.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.46.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.46.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.46.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.46.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.46.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.46.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.46.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.46.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.46.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.46.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.46.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.47.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.47.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.47.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.47.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.47.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.47.self_attn.k_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.47.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.47.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.47.self_attn.q_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.47.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.47.self_attn.v_proj.bias": "model-00006-of-00006.safetensors",
"model.layers.47.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.k_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.6.self_attn.q_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.6.self_attn.v_proj.bias": "model-00001-of-00006.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.k_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.q_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.v_proj.bias": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
"model.norm.weight": "model-00006-of-00006.safetensors"
}
}
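The index above maps every tensor name to one of six shard files. In the portion shown, a single decoder layer can straddle two shards: layer 24's layernorm and MLP weights sit in shard 4 while its attention projections sit in shard 3, and layers 6, 33 and 42 are split the same way. Below is a minimal Python sketch that finds such split layers. It uses only the standard library and a hand-copied excerpt of the map. The empty `"metadata"` object is a placeholder and was not copied from the real file.

```python
import json
from collections import defaultdict

# Excerpt in the same shape as model.safetensors.index.json:
# "weight_map" maps each tensor name to the shard file that stores it.
# "metadata" is left empty here as a placeholder.
INDEX_JSON = """
{
  "metadata": {},
  "weight_map": {
    "model.layers.24.input_layernorm.weight": "model-00004-of-00006.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
    "model.norm.weight": "model-00006-of-00006.safetensors"
  }
}
"""


def shards_per_layer(index: dict) -> dict[int, set[str]]:
    """Return, for each decoder layer index, the set of shard files holding its tensors."""
    layers: dict[int, set[str]] = defaultdict(set)
    for name, shard in index["weight_map"].items():
        parts = name.split(".")
        # Only "model.layers.<i>...." tensors belong to a numbered layer;
        # "model.norm.weight" and similar are skipped.
        if len(parts) > 2 and parts[0] == "model" and parts[1] == "layers":
            layers[int(parts[2])].add(shard)
    return dict(layers)


index = json.loads(INDEX_JSON)
split = {i: sorted(s) for i, s in shards_per_layer(index).items() if len(s) > 1}
print(split)  # → {24: ['model-00003-of-00006.safetensors', 'model-00004-of-00006.safetensors']}
```

A layer that spans two shards does not cause problems for normal loading, because each tensor is looked up by name. It does matter if you try to copy, offload, or inspect the checkpoint one shard at a time.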

23
special_tokens_map.json Normal file

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<｜begin▁of▁sentence｜>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<｜end▁of▁sentence｜>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<｜end▁of▁sentence｜>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
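In this map the pad token is the same string as the EOS token. The Python sketch below shows one practical consequence. It assumes you pad batches yourself rather than through a tokenizer. After a sequence is right-padded, a real EOS and a pad position carry the same id, so only the attention mask tells them apart. The ids 151643 (end of sentence) and 151646 (begin of sentence) come from `added_tokens_decoder` in tokenizer_config.json. The other ids in the usage lines are arbitrary placeholders, and `pad_batch` is a hypothetical helper.

```python
# Ids from tokenizer_config.json "added_tokens_decoder".
EOS_ID = 151643  # end-of-sentence token
BOS_ID = 151646  # begin-of-sentence token
PAD_ID = EOS_ID  # special_tokens_map.json reuses EOS as the pad token


def pad_batch(
    batch: list[list[int]], pad_id: int = PAD_ID
) -> tuple[list[list[int]], list[list[int]]]:
    """Right-pad token-id lists to equal length and build a 0/1 attention mask."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Sequence 0 ends with a genuine EOS; sequence 1 is padded with the same id.
ids, mask = pad_batch([[BOS_ID, 10, 20, EOS_ID], [BOS_ID, 30]])
print(ids)   # → [[151646, 10, 20, 151643], [151646, 30, 151643, 151643]]
print(mask)  # → [[1, 1, 1, 1], [1, 1, 0, 0]]
```

Position 3 holds id 151643 in both rows, but only the first row's mask marks it as a real token. Any code that masks "pad" positions by comparing ids, instead of using the attention mask, would also hide the genuine EOS.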

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e20ddafc659ba90242154b55275402edeca0715e5dbb30f56815a4ce081f4893
size 11422778
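tokenizer.json is stored as a Git LFS pointer, so the repository holds only three `key value` lines. The `oid` line is the SHA-256 of the real file's contents, and `size` is its length in bytes. Below is a minimal Python sketch that parses the pointer and checks a locally downloaded copy against it. The function names are hypothetical, and the local file path is an assumption you supply.

```python
import hashlib
import os

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e20ddafc659ba90242154b55275402edeca0715e5dbb30f56815a4ce081f4893
size 11422778
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split a Git LFS pointer file into its key/value lines."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def matches_pointer(path: str, pointer: dict[str, str]) -> bool:
    """Check a downloaded file's byte size and SHA-256 digest against the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    # Cheap size check first; only hash the file if the size matches.
    if algo != "sha256" or os.path.getsize(path) != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

For example, `matches_pointer("tokenizer.json", parse_lfs_pointer(POINTER))` should return `True` only for a complete, uncorrupted download. A file that still contains the pointer text itself fails the size check.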

195
tokenizer_config.json Normal file

@@ -0,0 +1,195 @@
{
"add_bos_token": true,
"add_eos_token": false,
"add_prefix_space": null,
"added_tokens_decoder": {
"151643": {
"content": "<｜end▁of▁sentence｜>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<｜User｜>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151645": {
"content": "<｜Assistant｜>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151646": {
"content": "<｜begin▁of▁sentence｜>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|EOT|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151648": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151649": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"bos_token": "<begin▁of▁sentence>",
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<User>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<Assistant><tool▁calls▁begin><tool▁call▁begin>' + tool['type'] + '<tool▁sep>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<tool▁call▁end>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<tool▁call▁begin>' + tool['type'] + '<tool▁sep>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<tool▁call▁end>'}}{{'<tool▁calls▁end><end▁of▁sentence>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<tool▁outputs▁end>' + message['content'] + '<end▁of▁sentence>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<Assistant>' + content + '<end▁of▁sentence>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<tool▁outputs▁begin><tool▁output▁begin>' + message['content'] + '<tool▁output▁end>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<tool▁output▁begin>' + message['content'] + '<tool▁output▁end>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<tool▁outputs▁end>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<Assistant><think>\\n'}}{% endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<end▁of▁sentence>",
"extra_special_tokens": {},
"legacy": true,
"model_max_length": 16384,
"pad_token": "<end▁of▁sentence>",
"sp_model_kwargs": {},
"tokenizer_class": "LlamaTokenizerFast",
"unk_token": null,
"use_default_system_prompt": false
}
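The `chat_template` above is a Jinja template that `transformers` renders through `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`. The template works as follows:

- Only the last system message is kept, and it is emitted right after the BOS token.
- User turns are prefixed with `<User>`.
- For assistant turns, everything up to the last `</think>` is dropped, so earlier reasoning is not fed back into the prompt.
- The generation prompt ends with `<Assistant><think>` followed by a newline, which opens a reasoning block.

Below is a hand-written Python approximation of the system, user and assistant branches only. It is a sketch for understanding the format, not a replacement for the real template, and it deliberately omits the tool-call and tool-output branches. `render_chat` is a hypothetical name, and the token strings are copied as they appear in this `tokenizer_config.json`.

```python
# Token strings as written in this tokenizer_config.json.
BOS = "<begin▁of▁sentence>"
EOS = "<end▁of▁sentence>"
USER = "<User>"
ASSISTANT = "<Assistant>"


def render_chat(messages: list, add_generation_prompt: bool = False) -> str:
    """Approximate the chat_template for conversations without tool calls."""
    # The template loops over all messages first; a later system message overwrites an earlier one.
    system_prompt = ""
    for m in messages:
        if m["role"] == "system":
            system_prompt = m["content"]

    parts = [BOS, system_prompt]
    for m in messages:
        if m["role"] == "user":
            parts.append(USER + m["content"])
        elif m["role"] == "assistant" and m.get("content") is not None:
            content = m["content"]
            # Keep only the text after the last </think>, as the template does.
            if "</think>" in content:
                content = content.split("</think>")[-1]
            parts.append(ASSISTANT + content + EOS)

    if add_generation_prompt:
        # The template's '\\n' in JSON becomes a real newline after Jinja rendering.
        parts.append(ASSISTANT + "<think>\n")
    return "".join(parts)


prompt = render_chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
    ],
    add_generation_prompt=True,
)
print(repr(prompt))
```

Stripping the reasoning from earlier assistant turns keeps multi-turn prompts short. It also means any `<think>` content from a previous answer is not visible to the model in the next turn.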