初始化项目,由ModelHub XC社区提供模型

Model: lm-provers/QED-Nano-SFT
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-17 21:02:39 +08:00
commit abbc87fa4d
15 changed files with 152552 additions and 0 deletions

37
.gitattributes vendored Normal file
View File

@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
logo.png filter=lfs diff=lfs merge=lfs -text

243
README.md Normal file
View File

@@ -0,0 +1,243 @@
---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Thinking-2507
tags:
- math
- reasoning
- olympiad
- proof-generation
- sft
- distillation
library_name: transformers
pipeline_tag: text-generation
datasets:
- lm-provers/FineProofs-SFT
---
# QED-Nano SFT
![logo.png](https://huggingface.co/lm-provers/QED-Nano-SFT/resolve/main/logo.png)
## Table of Contents
1. [Model Summary](#model-summary)
2. [How to use](#how-to-use)
3. [Evaluation](#evaluation)
4. [Training](#training)
5. [Limitations](#limitations)
6. [License](#license)
## Model Summary
**QED-Nano SFT** is a 4B parameter mathematical reasoning model fine-tuned on olympiad-level proof problems. This is the supervised fine-tuning (SFT) checkpoint trained via knowledge distillation from [deepseek-ai/DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2) (685B parameters). The model generates mathematical proofs with explicit chain-of-thought reasoning enclosed in `<think>` tags.
This model serves as the foundation for the full QED-Nano pipeline, which includes subsequent reinforcement learning and reasoning cache extensions. For the complete trained model, see [lm-provers/QED-Nano](https://huggingface.co/lm-provers/QED-Nano).
### Key features
- Reasoning model optimized for **theorem proving**
- **Supervised fine-tuning checkpoint**: Foundation for the full QED-Nano model (SFT + RL + Reasoning Cache)
- **Distilled from DeepSeek-Math-V2 (685B)**: Knowledge distillation to 4B parameters
- **Fully open**: open weights + full training details including public data mixture and training configs
For more details refer to our blog post: https://huggingface.co/spaces/lm-provers/qed-nano-blogpost
## How to use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "lm-provers/QED-Nano-SFT"
device = "cuda" # for GPU usage or "cpu" for CPU usage
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
).to(device)
# prepare the model input
prompt = "Generate a rigorous proof to the following question: is \sqrt{2} rational or irrational?"
messages_think = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages_think,
tokenize=False,
add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
# Get and decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
>[!TIP]
> We recommend setting `temperature=0.6` and `top_p=0.95` in the sampling parameters.
### vLLM and SGLang
You can use vLLM and SGLang to deploy the model in an API compatible with OpenAI format.
#### SGLang
```bash
python -m sglang.launch_server --model-path lm-provers/QED-Nano-SFT
```
#### vLLM
```bash
vllm serve lm-provers/QED-Nano-SFT
```
## Evaluation
In this section, we report the evaluation results of QED-Nano on IMO-ProofBench, ProofBench, and IMO-AnswerBench. All evaluations except those on IMO-AnswerBench are reported as avg@3 unless stated otherwise.
| Model | IMO-ProofBench | ProofBench | IMO-AnswerBench |
|:---|:---:|:---:|:---:|
| Qwen3-4B-Thinking-2507 | 20.4 (2.6) | 19.5 (0.9) | 55.8 |
| **QED-Nano-SFT** | **39.5 (2.9)** | **33.3 (0.5)** | **57.5** |
| **QED-Nano** | **40.0 (0.6)** | **44.9 (3.4)** | **67.5** |
| **QED-Nano (Agent)** | **54.0 (3.7)** | **54.4 (2.4)** | **-** |
| Qwen3-30B-A3B-Thinking-2507 | 27.6 (1.0) | 26.1 (2.4) | 67.0 |
| Qwen3-235B-A22B-Thinking-2507 | 34.1 (0.7) | 33.7 (1.1) | 70.5 |
| Nomos-1 | 40.3 (3.5) | 28.3 (3.9) | 49.0 |
| GPT-OSS-20B | 38.3 (1.2) | 38.4 (3.9) | 61.5 |
| GPT-OSS-120B | 43.1 (3.2) | 47.5 (1.7) | 70.5 |
| DeepSeek-Math-V2 | 57.9 (2.0) | 60.6 (0.1) | 75.8 |
| Gemini 3 Pro | 58.7 (2.9) | 66.7 (3.1) | 83.2 |
**Note**: This is the SFT-only checkpoint. Performance improves significantly with RL training and reasoning cache. See the full [lm-provers/QED-Nano](https://huggingface.co/lm-provers/QED-Nano) model for complete results.
## Training
### Model
- **Architecture:** Transformer decoder
- **Precision:** bfloat16
- **Base Model**: [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **Parameters**: 4 billion
- **Training Data**: [lm-provers/FineProofs-SFT](https://huggingface.co/datasets/lm-provers/FineProofs-SFT) (4,300 unique problems)
- **Teacher Model**: [deepseek-ai/DeepSeek-Math-V2](https://huggingface.co/deepseek-ai/DeepSeek-Math-V2) (685B)
### Training Hyperparameters
- **Optimization Steps**: 620
- **Global Batch Size**: 32
- **Learning Rate Schedule**: Cosine with peak at 3×10⁻⁵
- **Max Sequence Length**: 45,056 tokens
- **Training Configuration**: Unique problems only (4.3k samples outperformed 7.7k with duplicates)
### Software & hardware
- **GPUs:** 1 node with 8 H100
- **Training time:** 5 hours
- **Training Framework:** [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl)
- **Evaluation framework:** [CMU-AIRe/QED-Nano](https://github.com/CMU-AIRe/QED-Nano)
### Training script
You can reproduce this training using the TRL library:
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3-4B-Thinking-2507",
torch_dtype="auto",
attn_implementation="kernels-community/vllm-flash-attn3",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")
# Load dataset
dataset = load_dataset("lm-provers/Olympiads-SFT", split="train")
# Training configuration
training_args = SFTConfig(
output_dir="./qed-nano-sft",
max_steps=620,
per_device_train_batch_size=2,
gradient_accumulation_steps=2,
learning_rate=3.0e-5,
lr_scheduler_type="cosine",
warmup_ratio=0.03,
max_length=45056,
bf16=True,
gradient_checkpointing=True,
logging_steps=1,
save_steps=62,
push_to_hub=True,
)
# Train
trainer = SFTTrainer(model=model, tokenizer=tokenizer, args=training_args, train_dataset=dataset)
trainer.train()
```
## Limitations
QED-Nano SFT is a supervised fine-tuning checkpoint with several important limitations:
### Critical Limitations
1. **Length Explosion**: This SFT-only model suffers from uncontrolled reasoning length growth. It frequently generates responses exceeding 100k tokens, often meandering or repeating steps without converging to a solution. This is a known limitation of pure supervised learning on long-form reasoning tasks.
2. **Correctness Issues**: While the model produces fluent mathematical text, it may generate incorrect proofs. The SFT stage does not directly optimize for correctness—only for imitating the teacher's reasoning style.
3. **No Self-Correction**: Unlike the full QED-Nano model with RL training, this checkpoint lacks the ability to verify and refine its solutions.
### Other Limitations
- **Domain-specific**: Designed specifically for theorem proving. Using as a general assistant will likely produce nonsense outside of this domain.
- **Language**: English only
- **Text-Only**: Cannot handle problems with diagrams or visual elements
- **Teacher Errors**: May have inherited some errors from the teacher model (DeepSeek-Math-V2)
### Recommended Mitigations
- Use the full [lm-provers/QED-Nano](https://huggingface.co/lm-provers/QED-Nano) model for production use
- Apply length penalties or early stopping to control generation length
- Combine with external verification tools or judges
- Consider RL fine-tuning to address correctness and length issues
These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.
## License
[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
## Citation
If you use this model in your research, please cite:
```bibtex
@model{qed_nano_sft_2026,
title={QED-Nano SFT: Distilling Olympiad Mathematical Reasoning to 4B Parameters},
author={Edward Beeching and Jasper Dekoninck and Jia Li and Yuxiao Qu and Amrith Setlur and Ian Wu and Aviral Kumar and Lewis Tunstall},
year={2026},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/lm-provers/QED-Nano-SFT}}
}
```
Please also cite the training dataset:
```bibtex
@dataset{fineproofs_sft_dataset_2026,
title={FineProofs SFT Dataset: Mathematical Olympiad Problems with Chain-of-Thought Reasoning},
author={Edward Beeching and Jasper Dekoninck and Jia Li and Yuxiao Qu and Amrith Setlur and Ian Wu and Aviral Kumar and Lewis Tunstall},
year={2026},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/datasets/lm-provers/FineProofs-SFT}}
}
```

28
added_tokens.json Normal file
View File

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}

86
chat_template.jinja Normal file
View File

@@ -0,0 +1,86 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}

68
config.json Normal file
View File

@@ -0,0 +1,68 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"dtype": "bfloat16",
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 2560,
"initializer_range": 0.02,
"intermediate_size": 9728,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 262144,
"max_window_layers": 36,
"model_type": "qwen3",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 5000000,
"sliding_window": null,
"tie_word_embeddings": true,
"transformers_version": "4.57.3",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}

13
generation_config.json Normal file
View File

@@ -0,0 +1,13 @@
{
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.6,
"top_k": 20,
"top_p": 0.95,
"transformers_version": "4.57.3"
}

3
logo.png Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53db7ff09e16ec14c3633d509844f1b52b751537d78fb49a48e90ddd3cdc69c1
size 260424

151388
merges.txt Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e9195595455ef710ea358842bcb76e078bf159a8f3575f718679c13e4c651e84
size 4967215360

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:261fabfc919699eb2f4f7ab20abc3b5cdd51097382fd7d47d35bad4a3a91477d
size 3077766632

View File

@@ -0,0 +1,406 @@
{
"metadata": {
"total_parameters": 4022468096,
"total_size": 8044936192
},
"weight_map": {
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.norm.weight": "model-00002-of-00002.safetensors"
}
}

31
special_tokens_map.json Normal file
View File

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

239
tokenizer_config.json Normal file
View File

@@ -0,0 +1,239 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 262144,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}

1
vocab.json Normal file

File diff suppressed because one or more lines are too long