Initialize project; model provided by the ModelHub XC community

Model: introvoyz041/Goedel-Prover-V2-32B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-02 18:03:52 +08:00
commit da847b491d
25 changed files with 152760 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

239
README.md Normal file

@@ -0,0 +1,239 @@
---
base_model:
- Qwen/Qwen3-32B
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---
<div align="center">
<h1> <a href="http://blog.goedel-prover.com"> <strong>Goedel-Prover-V2: The Strongest Open-Source Theorem Prover to Date</strong></a></h1>
</div>
<div align="center">
[![Website](https://img.shields.io/badge/%F0%9F%A4%96%20Homepage-Goedel-536af5?color=536af5&logoColor=white)](http://blog.goedel-prover.com)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black.svg?logo=github)](https://github.com/Goedel-LM/Goedel-Prover-V2)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20face-Goedel-ffc107?color=ffc107&logoColor=white)](https://huggingface.co/Goedel-LM/Goedel-Prover-V2-32B)
[![arXiv](https://img.shields.io/badge/arXiv-2508.03613-b31b1b.svg?style=flat)](https://arxiv.org/abs/2508.03613)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
</div>
## 1. Introduction
We introduce Goedel-Prover-V2, an open-source language model series that sets a new state-of-the-art in automated formal proof generation. Built on the standard expert-iteration and reinforcement-learning pipeline, our approach incorporates three key innovations: (1) <strong>Scaffolded data synthesis</strong>: we generate synthetic proof tasks of increasing difficulty to train the model progressively, enabling it to master increasingly complex theorems; (2) <strong>Verifier-guided self-correction</strong>: the model learns to iteratively revise its own proofs using feedback from Lean's compiler, closely mimicking how humans refine their work; (3) <strong>Model averaging</strong>: we combine multiple model checkpoints to improve robustness and overall performance.
Our small model, Goedel-Prover-V2-8B, reaches 84.6% on the MiniF2F test set at Pass@32, matching the performance of the prior state-of-the-art DeepSeek-Prover-V2-671B while being nearly 100 times smaller in model size. Our flagship model, Goedel-Prover-V2-32B, achieves 88.0% on MiniF2F at Pass@32 in standard mode and 90.4% in self-correction mode, outperforming the prior SOTA DeepSeek-Prover-V2-671B and the concurrent Kimina-Prover-72B by a large margin. Additionally, our flagship model with self-correction solves 64 problems on PutnamBench at Pass@64, securing 1st place on the leaderboard and surpassing DeepSeek-Prover-V2-671B's record of 47 problems at Pass@1024.
## 2. Benchmark Performance
<strong>Self-correction mode</strong>: Our model improves proof quality by first generating an initial candidate and then using Lean compiler feedback to iteratively revise it. We perform two rounds of self-correction, which remain computationally efficient—the total output length (including the initial proof and two revisions) increases only modestly from the standard 32K to 40K tokens.
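The revision loop described above can be sketched in a few lines. Here `generate_proof` and `lean_check` are hypothetical stand-ins for model inference and Lean compilation; neither name comes from the Goedel-Prover codebase, and this is only an illustration of the control flow, not the released implementation:

```python
def self_correct(statement, generate_proof, lean_check, max_rounds=2):
    """Generate an initial proof, then revise it up to `max_rounds` times
    using compiler feedback, mirroring the two-round self-correction mode."""
    proof = generate_proof(statement, feedback=None)
    for _ in range(max_rounds):
        ok, errors = lean_check(statement, proof)
        if ok:
            return proof, True
        # Feed the failed attempt and its error messages back for revision.
        proof = generate_proof(statement, feedback=(proof, errors))
    ok, _ = lean_check(statement, proof)
    return proof, ok
```

With `max_rounds=2`, at most three generations are produced per problem (the initial proof plus two revisions), which is why the total output budget only grows from 32K to roughly 40K tokens.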
<style>
.fig-row {
display: flex;
justify-content: space-between; /* spread them out */
align-items: flex-start; /* align tops */
gap: 1rem; /* space between images */
}
.fig-row img {
display: block;
width: 100%;
height: auto;
}
.fig-row .panel {
    /* override per-panel width as needed */
/* e.g. .panel-1 { width:25%; } .panel-2 { width:40%; } etc. */
}
figure {
margin: 0;
}
figure figcaption {
text-align: center;
font-size: 0.9em;
margin-top: 0.75rem;
color: #555;
}
figure figcaption strong {
font-weight: bold;
}
/* Italicize the rest of the caption */
figure figcaption em {
font-style: italic;
}
</style>
<figure>
<div class="fig-row">
<div class="panel panel-1" style="width:100%;">
<img src="https://github.com/Goedel-LM/Goedel-Prover-V2/blob/main/assets/combined_performance_plots_varied_width.png?raw=true" alt="…">
</div>
</div>
<figcaption>
<strong>Figure 1</strong>: <em>Pass@32 performance on MiniF2F, PutnamBench, and our new MathOlympiadBench containing 360 IMO-level problems.</em>
</figcaption>
</figure>
The charts above demonstrate the state-of-the-art performance of Goedel-Prover-V2. We report all numbers at Pass@32: (1) across all three datasets, our flagship 32B model, in both standard and self-correction mode, significantly outperforms the prior state-of-the-art DeepSeek-Prover-V2-671B and Kimina-Prover-72B; (2) on MiniF2F, our 8B model matches the performance of DeepSeek-Prover-V2-671B while being nearly 100 times smaller in model size.
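For reference, Pass@k counts a problem as solved if any of k sampled proofs compiles. A common way to estimate Pass@k from n ≥ k samples with c successes (the standard unbiased estimator from the code-generation literature, not something specific to this work) is 1 − C(n−c, k)/C(n, k):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate from n samples with c successes:
    1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 1 success out of 4 samples, `pass_at_k(4, 1, 1)` gives 0.25.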
<div align="center">
<table style="margin: 0 auto;">
<thead>
<tr>
<th>#</th>
<th>Model</th>
<th>num solved</th>
<th>compute</th>
</tr>
</thead>
<tbody>
<tr><td>1</td><td><strong>Goedel-Prover-V2-32B (self-correction mode)</strong></td><td><strong>86</strong></td><td><strong>Pass@192</strong></td></tr>
<tr><td>1</td><td><strong>Goedel-Prover-V2-32B (self-correction mode)</strong></td><td><strong>57</strong></td><td><strong>Pass@32</strong></td></tr>
<tr><td>1</td><td><strong>Goedel-Prover-V2-32B</strong></td><td><strong>43</strong></td><td><strong>Pass@32</strong></td></tr>
<tr><td>2</td><td>DeepSeek-Prover-V2-671B</td><td>47</td><td>Pass@1024</td></tr>
<tr><td>2</td><td>DeepSeek-Prover-V2-671B</td><td>22</td><td>Pass@32</td></tr>
<tr><td>3</td><td>DSP+</td><td>23</td><td>Pass@128</td></tr>
<tr><td>4</td><td>Kimina-Prover-7B-Distill</td><td>10</td><td>Pass@192</td></tr>
<tr><td>5</td><td>Self-play Theorem Prover</td><td>8</td><td>Pass@3200</td></tr>
<tr><td>6</td><td>Goedel-Prover-V1</td><td>7</td><td>Pass@512</td></tr>
</tbody>
</table>
<!-- table caption -->
<caption align="bottom"><strong>Table 1</strong>: <em>PutnamBench leaderboard. Goedel-Prover-V2-32B secures the top rank with significantly less compute (pass number) than the previous state-of-the-art.</em></caption>
</div>
## 3. Compelling Scaling Performance
<style>
.fig-row {
display: flex;
justify-content: space-between; /* spread them out */
align-items: flex-start; /* align tops */
gap: 1rem; /* space between images */
}
.fig-row img {
display: block;
width: 100%;
height: auto;
}
.fig-row .panel {
    /* override per-panel width as needed */
/* e.g. .panel-1 { width:25%; } .panel-2 { width:40%; } etc. */
}
figure {
margin: 0;
}
figure figcaption {
text-align: center;
font-size: 0.9em;
margin-top: 0.75rem;
color: #555;
}
figure figcaption strong {
font-weight: bold;
}
/* Italicize the rest of the caption */
figure figcaption em {
font-style: italic;
}
</style>
<figure>
<div class="fig-row">
<div class="panel panel-1" style="width:80%;">
<img src="https://github.com/Goedel-LM/Goedel-Prover-V2/blob/main/assets/inference_scale_performance.png?raw=true" alt="…">
</div>
</div>
<figcaption>
<strong>Figure 2</strong>: <em>Performance on MiniF2F test set under different sample budgets.</em>
</figcaption>
</figure>
The scaling curves above show that our 32B model consistently outperforms all prior state-of-the-art models across the entire range of inference-time compute budgets.
## 4. Model & Dataset Downloads
We release our Goedel-Prover-V2 models and the new MathOlympiadBench benchmark to foster future research.
<div align="center">
| Model | Download |
| -------- | -------- |
| Goedel-Prover-V2-32B | [🤗Download](https://huggingface.co/Goedel-LM/Goedel-Prover-V2-32B) |
| Goedel-Prover-V2-8B | [🤗Download](https://huggingface.co/Goedel-LM/Goedel-Prover-V2-8B) |
</div>
<div align="center">
| Dataset | Download |
| -------- | -------- |
| MathOlympiadBench | [🤗Download](https://huggingface.co/datasets/Goedel-LM/MathOlympiadBench) |
</div>
<strong>MathOlympiadBench</strong> (Math Olympiad Bench) comprises human-verified formalizations of Olympiad-level mathematical competition problems, sourced from [Compfiles](https://dwrensha.github.io/compfiles/imo.html) and the [IMOSLLean4](https://github.com/mortarsanjaya/IMOSLLean4) repository. MathOlympiadBench contains 360 problems, including 158 IMO problems from 1959 to 2024, 131 IMO shortlist problems covering 2006 to 2023, 68 regional mathematical olympiad problems, and 3 additional mathematical puzzles.
This model is being released to aid other open-source projects, including those geared towards the upcoming IMO competition. A full paper with all details will be released in the coming weeks.
## 5. Quick Start
You can use [Hugging Face Transformers](https://github.com/huggingface/transformers) directly for model inference.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import time

import torch

torch.manual_seed(30)

model_id = "Goedel-LM/Goedel-Prover-V2-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# The statement to prove; the model is asked to replace `sorry` with a proof.
formal_statement = """
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
theorem square_equation_solution {x y : ℝ} (h : x^2 + y^2 = 2*x - 4*y - 5) : x + y = -1 := by
  sorry
""".strip()

prompt = """
Complete the following Lean 4 code:
```lean4
{}```
Before producing the Lean 4 code to formally prove the given theorem, provide a detailed proof plan outlining the main proof steps and strategies.
The plan should highlight key ideas, intermediate lemmas, and proof structures that will guide the construction of the final formal proof.
""".strip()

chat = [
    {"role": "user", "content": prompt.format(formal_statement)},
]
inputs = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

start = time.time()
outputs = model.generate(inputs, max_new_tokens=32768)
print(tokenizer.batch_decode(outputs))
print(f"Generation took {time.time() - start:.1f}s")
```
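Since the prompt asks the model to wrap the final proof in a `lean4` code fence, a small helper can pull the proof out of the completion for downstream checking. The fence format is an assumption based on the prompt above; this helper is illustrative and not part of the released code:

```python
import re

def extract_lean(text: str):
    """Return the last lean4-fenced code block in a completion, or None.

    Assumes the model follows the fence format requested in the prompt;
    the last block is taken since the proof plan may quote earlier snippets.
    """
    blocks = re.findall(r"```lean4\n(.*?)```", text, re.DOTALL)
    return blocks[-1].strip() if blocks else None
```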
### Cite
```bibtex
@article{lin2025goedelproverv2,
title={Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction},
author={Lin, Yong and Tang, Shange and Lyu, Bohan and Yang, Ziran and Chung, Jui-Hui and Zhao, Haoyu and Jiang, Lai and Geng, Yihan and Ge, Jiawei and Sun, Jingruo and others},
journal={arXiv preprint arXiv:2508.03613},
year={2025}
}
```

28
added_tokens.json Normal file

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 25600,
"max_position_embeddings": 40960,
"max_window_layers": 64,
"model_type": "qwen3",
"num_attention_heads": 64,
"num_hidden_layers": 64,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.51.3",
"use_cache": false,
"use_sliding_window": false,
"vocab_size": 151936
}

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 151643,
"eos_token_id": 151645,
"transformers_version": "4.51.3",
"use_cache": false
}

151388
merges.txt Normal file

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:152d72688b31b1adc36aca768f5cf1dd5e9514586d1c618dd306622e84c6c6f0
size 4932307584


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec435641fd3bd63944832d0897fa66be2c1158332acdcf41f4ce3824c0530626
size 4875989696


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b50b6662d302d10664f1e4c5806a0710b058cbf521ae72a636e3149fbe35fc63
size 4875989720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3e67d5e960d5fd4327cde3c0d8bf8473903133bfebd111d4a39d8b2c98348d00
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:12f1c18134b3b9780c3c0944cce4e85d508e29cdd615586b14b1e2e7e49859d3
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fd056bb091271b8436930db34dfc90b63838e77d7984aaa844c48bc9cf94e8f5
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:66a9b925f82a396dd0b8a5fd239e3f3bf31b6d1c124d29d24feb3f05e41dba41
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:16cec003e66ac61770ce2abe6606327b1dedf81e2782d0da27892e9c5f3a6543
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a557b7c658c41112c5a5ea02777afbc59c31f116cbc638fd914e01d75ee2e0b
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:306ee6207677fdb93eb2c04b45409e9a9c24758eac08f148062f1b1e4a5e3e06
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:32bc24d3d996e1656343ef1f27ffac1e9e687906bf6bd65bf8c7ba43c9cb1a74
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:06c5276ddf4ec59f423d3f94200ff2eff1d16995adb5369ec3080546ebf3db00
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a23863742017ca3d49cda1d40a50c9ce1f40ecf2405e67583cb089a17704bfe3
size 4875989752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:79a3e28f4178d94c0a39bebe47fc14b030807e97fed674757c0cef219830e4fa
size 2080144040


@@ -0,0 +1,714 @@
{
"metadata": {
"total_size": 65524246528
},
"weight_map": {
"lm_head.weight": "model-00002-of-00014.safetensors",
"model.embed_tokens.weight": "model-00003-of-00014.safetensors",
"model.layers.0.input_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.1.input_layernorm.weight": "model-00014-of-00014.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.10.input_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.11.input_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.13.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.14.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00014-of-00014.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.15.input_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.16.input_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.17.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.18.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.19.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.2.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.20.input_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.21.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.22.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.23.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.24.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.25.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.26.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.27.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00014-of-00014.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.28.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.29.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00014-of-00014.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.3.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00014-of-00014.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.31.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.32.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.33.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.34.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.35.input_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.36.input_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.36.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.36.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.37.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00014-of-00014.safetensors",
"model.layers.37.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.37.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.38.input_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.38.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.38.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.39.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.39.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.39.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.4.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.40.input_layernorm.weight": "model-00014-of-00014.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.40.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.40.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.41.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.41.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.41.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.42.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.42.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.42.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.42.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.42.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.42.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.42.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.42.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.42.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.42.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.42.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.43.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.43.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.43.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.43.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.43.post_attention_layernorm.weight": "model-00014-of-00014.safetensors",
"model.layers.43.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.43.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.43.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.43.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.43.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.43.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.44.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.44.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.44.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.44.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.44.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.44.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.44.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.44.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.44.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.44.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.44.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.45.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.45.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.45.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.45.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.45.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.45.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.45.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.45.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.45.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.45.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.45.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.46.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.46.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.46.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.46.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.46.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.46.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.46.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.46.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.46.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.46.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.46.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.47.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.47.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.47.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.47.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.47.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.47.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.47.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.47.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.47.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.47.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.47.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.48.input_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.48.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.48.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.48.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.48.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.48.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.48.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.48.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.48.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.48.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.48.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.49.input_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.49.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.49.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.49.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.49.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
"model.layers.49.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.49.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.49.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.49.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.49.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.49.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.5.input_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.50.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.50.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.50.mlp.gate_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.50.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.50.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.50.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.50.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.50.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.50.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.50.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.50.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.51.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.51.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.51.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.51.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.51.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.51.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.51.self_attn.k_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.51.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.51.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.51.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.51.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.52.input_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.52.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.52.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.52.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.52.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.52.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.52.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.52.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.52.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
"model.layers.52.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.52.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.53.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.53.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.53.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.53.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.53.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.53.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.53.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.53.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.53.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.53.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.53.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.54.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.54.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.54.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.54.mlp.up_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.54.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.54.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.54.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.54.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.54.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.54.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.54.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.55.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.55.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.55.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.55.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.55.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.55.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.55.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.55.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.55.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.55.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.55.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.56.input_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.56.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.56.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.56.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.56.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
"model.layers.56.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.56.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.56.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.56.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
"model.layers.56.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.56.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.57.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.57.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.57.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.57.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.57.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.57.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.57.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.57.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.57.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.57.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.57.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.58.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.58.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.58.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.58.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.58.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
"model.layers.58.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.58.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.58.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.58.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.58.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.58.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.59.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.59.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.59.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.59.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.59.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.59.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
"model.layers.59.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.59.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.59.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.59.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.59.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.6.input_layernorm.weight": "model-00004-of-00014.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.60.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.60.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.60.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.60.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.60.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
"model.layers.60.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.60.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.60.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.60.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.60.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.60.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.61.input_layernorm.weight": "model-00006-of-00014.safetensors",
"model.layers.61.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.61.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
"model.layers.61.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.61.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.61.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.61.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.61.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
"model.layers.61.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
"model.layers.61.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.61.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.62.input_layernorm.weight": "model-00003-of-00014.safetensors",
"model.layers.62.mlp.down_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.62.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.62.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.62.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
"model.layers.62.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
"model.layers.62.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
"model.layers.62.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.62.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
"model.layers.62.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.62.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.63.input_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.63.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.63.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.63.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
"model.layers.63.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.63.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
"model.layers.63.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
"model.layers.63.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.63.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.63.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.63.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00014.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00014-of-00014.safetensors",
"model.layers.8.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
"model.layers.9.input_layernorm.weight": "model-00012-of-00014.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
"model.norm.weight": "model-00001-of-00014.safetensors"
}
}
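The `weight_map` above maps each tensor name to the shard file that stores it. As a rough illustration (not the actual `safetensors`/`transformers` loading code), a loader can resolve which shards a set of tensors needs like this; the sample dict copies a few entries from the index above, and `shards_needed` is a hypothetical helper name:

```python
# Sketch of how a loader could use model.safetensors.index.json:
# "weight_map" maps each tensor name to the shard file containing it.
# The small sample below copies real entries from the index above.
weight_map = {
    "model.layers.63.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
    "model.layers.63.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
    "model.norm.weight": "model-00001-of-00014.safetensors",
}

def shards_needed(names, weight_map):
    """Return the sorted set of shard files needed for the given tensor names."""
    return sorted({weight_map[n] for n in names})

# Resolve a single tensor to its shard file.
shard = weight_map["model.norm.weight"]  # "model-00001-of-00014.safetensors"
needed = shards_needed(weight_map, weight_map)
```

With a full index, the same lookup lets a loader open only the shards required for a partial load.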

31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
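Entries in `special_tokens_map.json` are either bare strings or AddedToken-style dicts carrying a `content` field plus `lstrip`/`rstrip`/`normalized`/`single_word` flags. A minimal sketch of reading the eos/pad tokens out of this file (the real loading is done by `transformers`; `token_content` is a hypothetical helper):

```python
import json

# Trimmed copy of the eos/pad entries from special_tokens_map.json above.
SPECIAL_TOKENS_MAP = """
{
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
"""

def token_content(entry):
    """Return the literal token string from a bare-string or dict entry."""
    return entry["content"] if isinstance(entry, dict) else entry

smap = json.loads(SPECIAL_TOKENS_MAP)
eos = token_content(smap["eos_token"])  # "<|im_end|>"
pad = token_content(smap["pad_token"])  # "<|endoftext|>"
```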

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
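`tokenizer.json` is stored as a Git LFS pointer: a tiny text file of `key value` lines (spec version, `sha256` object id, byte size) that stands in for the real 11 MB file until `git lfs pull` fetches it. A minimal sketch of parsing such a pointer (`parse_lfs_pointer` is a hypothetical helper, not part of the `git-lfs` tooling):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer content shown above for tokenizer.json.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
"""

info = parse_lfs_pointer(POINTER)
algo, _, digest = info["oid"].partition(":")  # algo == "sha256"
```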

241
tokenizer_config.json Normal file

@@ -0,0 +1,241 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n    {%- set index = (messages|length - 1) - loop.index0 %}\n    {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n        {%- set ns.multi_step_tool = false %}\n        {%- set ns.last_query_index = index %}\n    {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {%- set reasoning_content = '' %}\n        {%- if message.reasoning_content is string %}\n            {%- set reasoning_content = message.reasoning_content %}\n        {%- else %}\n            {%- if '</think>' in content %}\n                {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n                {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n            {%- endif %}\n        {%- endif %}\n        {%- if loop.index0 > ns.last_query_index %}\n            {%- if loop.last or (not loop.last and reasoning_content) %}\n                {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n            {%- else %}\n                {{- '<|im_start|>' + message.role + '\\n' + content }}\n            {%- endif %}\n        {%- else %}\n            {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- endif %}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n    {%- if enable_thinking is defined and enable_thinking is false %}\n        {{- '<think>\\n\\n</think>\\n\\n' }}\n    {%- endif %}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
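The `chat_template` above is a Jinja template applied by `tokenizer.apply_chat_template`. As a rough illustration only, the core ChatML layout it produces for a plain system/user exchange (ignoring tool calls and `<think>` reasoning handling) can be sketched as below; `render_chatml` is a hypothetical name, not part of any library:

```python
def render_chatml(messages, add_generation_prompt=True):
    # Simplified re-implementation of the ChatML layout in chat_template:
    # each message becomes "<|im_start|>{role}\n{content}<|im_end|>\n",
    # and an assistant header is appended when generation should continue.
    # Tool-call and <think> reasoning branches are omitted for brevity.
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful prover."},
    {"role": "user", "content": "Prove that 1 + 1 = 2."},
])
```

For real use, prefer the tokenizer's own `apply_chat_template`, which also handles the tool-call and reasoning-content branches of the template.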

1
vocab.json Normal file

File diff suppressed because one or more lines are too long