Initialize project; model provided by the ModelHub XC community

Model: miromind-ai/MiroThinker-14B-DPO-v0.2
Source: Original Platform
Author: ModelHub XC
Date: 2026-04-11 12:32:56 +08:00
Commit: b916f7f5e7
19 changed files with 1004 additions and 0 deletions

51
.gitattributes vendored Normal file

@@ -0,0 +1,51 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
merges.txt filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
vocab.json filter=lfs diff=lfs merge=lfs -text

105
README.md Normal file

@@ -0,0 +1,105 @@
---
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model:
- miromind-ai/MiroThinker-14B-SFT-v0.2
tags:
- agent
- open-source
- miromind
new_version: miromind-ai/MiroThinker-v1.0-30B
---
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/68525b342230a897a65cc1c0/87mYQ_a-4jpnMkVR4hrgm.png" width="55%" alt="MiroThinker" />
</div>
<!-- <hr> -->
<div align="center">
[![Demo](https://img.shields.io/badge/Demo-FFB300?style=for-the-badge&logo=airplayvideo&logoColor=white)](https://dr.miromind.ai/)
[![Models](https://img.shields.io/badge/Models-5EDDD2?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/collections/miromind-ai/mirothinker-v02-68af084a18035f57b17cd902)
[![Data](https://img.shields.io/badge/Data-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1)
[![Blog](https://img.shields.io/badge/Blog-4285F4?style=for-the-badge&logo=google-chrome&logoColor=white)](https://miromind.ai/blog/miromind-research-agent)
[![Github](https://img.shields.io/badge/GitHub-24292F?style=for-the-badge&logo=github&logoColor=white)](https://github.com/MiroMindAI/MiroThinker)
[![Discord](https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.com/invite/GPqEnkzQZd)
[![WeChat](https://img.shields.io/badge/WeChat-07C160?style=for-the-badge&logo=wechat&logoColor=white)](https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/wechat.png)
[![RedNote](https://img.shields.io/badge/RedNote-FF2442?style=for-the-badge&logo=revoltdotchat&logoColor=white)](https://www.xiaohongshu.com/user/profile/5e353bd80000000001000239)
[![Website](https://img.shields.io/badge/Website-4285F4?style=for-the-badge&logo=monster&logoColor=white)](https://miromind.ai/)
</div>
## Introduction
MiroThinker is an open-source agentic model series. Designed as a research agent for complex, long-horizon problem solving, it integrates strong capabilities in task decomposition, multi-hop reasoning, retrieval-augmented generation, code execution, web browsing, and document/file processing, enabling a wide range of real-world applications.
In MiroThinker-v0.2, we introduced three key improvements:
- **Richer training data** from both English and Chinese sources, yielding significant gains in benchmark performance and generalization.
- **Unified DPO training** with a single preference dataset across all models.
- **Extended context length** from 40k to 64k for more challenging multi-turn tool-use tasks.
Compared to v0.1, MiroThinker-v0.2 delivers consistent gains across benchmarks. For example, scores improved from **57.3 → 64.1** on **GAIA-Text-103** and from **17.0 → 29.4** on **BrowseComp-ZH**, reflecting substantial advancements in the model's general research agent capabilities.
<div>
<img src="https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/MiroThinker_v0.2_Performance_2.png" width="100%" alt="MiroThinker" />
</div>
## Online Demo
Welcome to try out our online demo [here](https://dr.miromind.ai/).
## Performance
> [!IMPORTANT]
> <div>
> To prevent data leakage during searches, we block Hugging Face domains to ensure the model doesn't access answers through shortcuts.
> </div>
### Comparison with SOTA Research Agents
<div>
<img src="https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/MiroThinker_v0.2_Performance_0.png" width="100%" alt="MiroThinker" />
</div>
### GAIA Benchmark
<div>
<img src="https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/MiroThinker_v0.2_Performance_1.png" width="100%" alt="MiroThinker" />
</div>
## Quick Start
MiroThinker-v0.2 is trained on our large-scale, high-quality trajectory and preference datasets MiroVerse-v0.2, utilizing the efficient training framework [MiroTrain](https://github.com/MiroMindAI/MiroTrain), and enhanced with tool-use capabilities through our agentic framework [MiroFlow](https://github.com/MiroMindAI/MiroFlow).
To promote reproducibility and benefit the community, we decided to open-source the entire suite mentioned above. For more technical details, evaluation results, and usage tutorials, please visit our [GitHub repository](https://github.com/MiroMindAI/MiroThinker).
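For quick local inference with the standard `transformers` API, here is a minimal sketch (agentic tool use, browsing, and file handling require the MiroFlow framework above):

```python
# Minimal sketch: plain text generation with transformers.
# Assumes a GPU with roughly 30 GB of memory for the bf16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "miromind-ai/MiroThinker-14B-DPO-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Give a one-paragraph overview of retrieval-augmented generation."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```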
## License
MiroThinker-v0.2 is licensed under Apache 2.0.
## Citation
If you find this project useful in your research, please consider citing:
```
@article{miromind2025mirothinker,
title={MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling},
author={MiroMind Team and Bai, Song and Bing, Lidong and Chen, Carson and Chen, Guanzheng and Chen, Yuntao and Chen, Zhe and Chen, Ziyi and Dai, Jifeng and Dong, Xuan and others},
journal={arXiv preprint arXiv:2511.11793},
year={2025}
}
```
## Contact Us
MiroThinker is developed by the MiroMind Foundation Model Team.
If you would like to leave us a message, feel free to get in touch.
In addition to [GitHub](https://github.com/MiroMindAI/),
[Discord](https://discord.com/invite/GPqEnkzQZd),
[WeChat](https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/wechat.png),
and [RedNote](https://www.xiaohongshu.com/user/profile/5e353bd80000000001000239),
you can also reach us via email at service@miromind.ai.

82
chat_template.jinja Normal file

@@ -0,0 +1,82 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set content = message.content %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is defined and message.reasoning_content is not none %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in message.content %}
{%- set content = message.content.split('</think>')[-1].lstrip('\n') %}
{%- set reasoning_content = message.content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- message.content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>\n\n</think>\n\n' }}
{%- endif %}
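The template above follows the Qwen3 chat format: tool signatures are injected into the system block inside `<tools>` tags, tool calls and responses are wrapped in `<tool_call>`/`<tool_response>`, and assistant reasoning lives in `<think>` spans that are kept only after the last user query. A minimal sketch of rendering it, assuming the tokenizer ships this template (the `web_search` tool schema below is hypothetical):

```python
# Sketch: render the chat template with one (hypothetical) tool definition.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("miromind-ai/MiroThinker-14B-DPO-v0.2")

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool, for illustration only
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]
messages = [{"role": "user", "content": "Who won the 2014 Fields Medals?"}]

prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)  # <|im_start|>system ... <tools>...</tools> ... <|im_start|>user ...
```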

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 17408,
"max_position_embeddings": 65536,
"max_window_layers": 40,
"model_type": "qwen3",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.51.0",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
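This is a 40-layer Qwen3 dense decoder with grouped-query attention (40 query heads, 8 KV heads, head_dim 128), a 64k context window, and untied input/output embeddings. As a sanity check, the parameter count implied by these fields can be recomputed and compared against the shard index further down:

```python
# Back-of-the-envelope parameter count from the config.json fields above.
V, H, I, L = 151936, 5120, 17408, 40    # vocab, hidden, intermediate, layers
n_q, n_kv, d = 40, 8, 128               # query heads, KV heads, head_dim

attn = H * n_q * d + 2 * H * n_kv * d + n_q * d * H + 2 * d  # q/k/v/o + q,k norms
mlp = 3 * H * I                          # gate, up, down projections
per_layer = attn + mlp + 2 * H           # + input/post-attention RMSNorms

total = 2 * V * H + L * per_layer + H    # embeddings + lm_head + layers + final norm
print(f"{total / 1e9:.2f}B parameters")  # -> 14.77B
print(total * 2)                         # bf16 bytes: 29536614400, matching the index
```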

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

13
generation_config.json Normal file

@@ -0,0 +1,13 @@
{
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.6,
"top_k": 20,
"top_p": 0.95,
"transformers_version": "4.51.0"
}
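These are the recommended sampling defaults (temperature 0.6, top-p 0.95, top-k 20, with both `<|im_end|>` and `<|endoftext|>` as stop tokens). A sketch of constructing the same configuration explicitly, assuming the standard `GenerationConfig` API:

```python
# Sketch: the sampling defaults above as an explicit GenerationConfig.
from transformers import GenerationConfig

gen_cfg = GenerationConfig(
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    eos_token_id=[151645, 151643],  # <|im_end|>, <|endoftext|>
    pad_token_id=151643,
)
# Reusing `model` and `inputs` from the Quick Start sketch:
# outputs = model.generate(inputs, generation_config=gen_cfg, max_new_tokens=1024)
```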

3
merges.txt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8831e4f1a044471340f7c0a83d7bd71306a5b867e95fd870f74d0c5308a904d5
size 1671853
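Every large file in this commit is stored as a Git LFS pointer like the one above: a `version` line, a SHA-256 `oid`, and the blob `size` in bytes. A minimal sketch parsing that format:

```python
# Sketch: parse a Git LFS pointer file into its fields.
def parse_lfs_pointer(text: str) -> dict:
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "oid_algo": algo,
            "oid": digest, "size": int(fields["size"])}

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8831e4f1a044471340f7c0a83d7bd71306a5b867e95fd870f74d0c5308a904d5
size 1671853"""
print(parse_lfs_pointer(pointer))  # merges.txt resolves to a ~1.67 MB blob
```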

3
model-00001-of-00008.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49b19fb1ba9640a661e1ba98af5ce118eaa504b43e8b40114c9d23b1db548f28
size 3841788544

3
model-00002-of-00008.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:25c6d986b2ddbaf42cb029b4ad328b6b8a6d9183f62fb2f12453f518bb6cbea2
size 3963750816

3
model-00003-of-00008.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:925e42c0e5660886bf6ae7678ad1095b2eda71bfaee114f7fef37ca975e77803
size 3963750880

3
model-00004-of-00008.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1d23b214d174f76c6f63e0bc79cabe8f94a1250cf56321cbb26d0c5322eef10d
size 3963750880

3
model-00005-of-00008.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:055cca0e5aff3cf1f59dafd96b50ea184334548af95674181121b25ae38331ec
size 3963750880

3
model-00006-of-00008.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0349b97e07620520c40d4c49a1f683ce3e291bae71335911b9bf88281908c86e
size 3963750880

3
model-00007-of-00008.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5a734fd9641ae87130816c8281f3d4fce46cdec80ed574ae4244ea7432f652eb
size 3963750880

3
model-00008-of-00008.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:20ffb5c74f1cf442604c2b6d1cc0044d60eb33a85c5c2f82bfcd5307ecf5c45a
size 1912371880

450
model.safetensors.index.json Normal file

@@ -0,0 +1,450 @@
{
"metadata": {
"total_size": 29536614400
},
"weight_map": {
"model.embed_tokens.weight": "model-00001-of-00008.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.25.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.31.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.32.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.33.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.34.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.35.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.36.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.36.self_attn.k_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.36.self_attn.q_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.37.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.37.self_attn.k_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.37.self_attn.q_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.38.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.38.self_attn.k_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.38.self_attn.q_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.39.self_attn.k_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.39.self_attn.q_norm.weight": "model-00007-of-00008.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"lm_head.weight": "model-00008-of-00008.safetensors",
"model.layers.39.input_layernorm.weight": "model-00008-of-00008.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00008-of-00008.safetensors",
"model.norm.weight": "model-00008-of-00008.safetensors"
}
}
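The `weight_map` tells a loader which of the eight shards holds each tensor, so individual weights can be read without loading the full 29.5 GB. A minimal sketch, assuming the `safetensors` package and locally downloaded shard files:

```python
# Sketch: resolve one tensor to its shard via the index, then read it lazily.
import json
from safetensors import safe_open

with open("model.safetensors.index.json") as f:
    index = json.load(f)

name = "model.layers.0.self_attn.q_proj.weight"
shard = index["weight_map"][name]        # -> "model-00001-of-00008.safetensors"
with safe_open(shard, framework="pt") as st:
    tensor = st.get_tensor(name)
print(shard, tuple(tensor.shape))        # expect (5120, 5120), per config.json
```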

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654

239
tokenizer_config.json Normal file

@@ -0,0 +1,239 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": " {%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n<think>\\n\\n</think>\\n\\n' }}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
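Note that `<think>`, `</think>`, `<tool_call>`, and `<tool_response>` are registered as added tokens but not flagged `special`, so they survive decoding even with `skip_special_tokens=True`, while control tokens like `<|im_start|>` and `<|im_end|>` are stripped. A quick sketch checking this:

```python
# Sketch: inspect how the reasoning/tool tags are tokenized and decoded.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("miromind-ai/MiroThinker-14B-DPO-v0.2")
for tag in ["<think>", "</think>", "<tool_call>", "<|im_end|>"]:
    print(tag, tok.encode(tag, add_special_tokens=False))  # one id per tag

text = tok.decode([151667, 151668], skip_special_tokens=True)
print(repr(text))  # the tags survive decoding: they are not marked "special"
```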

BIN
vocab.json (Stored with Git LFS) Normal file

Binary file not shown.