Upload to RealmSky/Qwen3-32B-MLX-bf16-3 on ModelScope hub

This commit is contained in:
Cherrytest
2025-07-25 06:17:56 +00:00
parent a3958de5a7
commit 7c151592bd
25 changed files with 1554 additions and 42 deletions

6
.gitattributes vendored

@@ -44,4 +44,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
merges.txt filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
vocab.json filter=lfs diff=lfs merge=lfs -text

BIN
.msc Normal file

Binary file not shown.

1
.mv Normal file

@@ -0,0 +1 @@
Revision:master,CreatedAt:1751914180

202
LICENSE Normal file

@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2024 Alibaba Cloud
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

354
README.md

@@ -1,47 +1,319 @@
---
license: Apache License 2.0
#model-type:
## e.g., gpt, phi, llama, chatglm, baichuan
#- gpt
#domain:
## e.g., nlp, cv, audio, multi-modal
#- nlp
#language:
## List of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
## e.g., CIDEr, BLEU, ROUGE
#- CIDEr
#tags:
## Custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained
#tools:
## e.g., vllm, fastchat, llamacpp, AdaSeq
#- vllm
library_name: mlx
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-14B/blob/main/LICENSE
pipeline_tag: text-generation
---
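### The contributors of this model have not provided a more detailed model description. The model files and weights are available on the "Model Files" page.
#### You can download the model with the following git clone command or with the ModelScope SDK
SDK download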
# Qwen3-32B-MLX-bf16
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>
## Qwen3 Highlights
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
- **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
- **Significant enhancement of its reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- **Support for 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**.
## Model Overview
**Qwen3-32B** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 32.8B
- Number of Parameters (Non-Embedding): 31.2B
- Number of Layers: 64
- Number of Attention Heads (GQA): 64 for Q and 8 for KV
- Context Length: 32,768 natively and [131,072 tokens with YaRN](#processing-long-texts).
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
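The two parameter counts above can be cross-checked against the values in this repository's `config.json`. The following is a minimal sketch, not an official accounting: it assumes bias-free linear layers (consistent with `attention_bias: false`), untied input and output embeddings (`tie_word_embeddings: false`), and a single final RMSNorm weight, which is an assumption because that tensor is not listed in the visible part of the weight index.

```python
# Values copied from config.json in this repository.
HIDDEN = 5120
LAYERS = 64
Q_HEADS = 64
KV_HEADS = 8
HEAD_DIM = 128
INTERMEDIATE = 25600
VOCAB = 151936


def count_parameters() -> tuple[int, int]:
    """Return (total, non_embedding) parameter counts for the dense model."""
    q_dim = Q_HEADS * HEAD_DIM    # width of the query projection
    kv_dim = KV_HEADS * HEAD_DIM  # width of each key/value projection (GQA)

    # q_proj + k_proj + v_proj + o_proj, all without bias
    attention = HIDDEN * q_dim + 2 * HIDDEN * kv_dim + q_dim * HIDDEN
    # gate_proj + up_proj + down_proj
    mlp = 3 * HIDDEN * INTERMEDIATE
    # input_layernorm + post_attention_layernorm + q_norm + k_norm
    norms = 2 * HIDDEN + 2 * HEAD_DIM

    # Assumption: one final norm weight of size HIDDEN after the last layer.
    non_embedding = LAYERS * (attention + mlp + norms) + HIDDEN
    # embed_tokens and lm_head are separate tensors because embeddings are untied
    embedding = 2 * VOCAB * HIDDEN
    return non_embedding + embedding, non_embedding


total, non_embedding = count_parameters()
print(f"{total / 1e9:.1f}B total, {non_embedding / 1e9:.1f}B non-embedding")
```

Under these assumptions the result rounds to 32.8B and 31.2B. At 2 bytes per bf16 weight, the total comes to 65,524,246,528 bytes, which equals the `total_size` recorded in the safetensors index in this commit.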
## Quickstart
Qwen3 is supported in recent versions of both **`transformers` (≥4.52.4)** and **`mlx_lm` (≥0.25.2)**; we advise using the latest version of each.
Older versions (e.g., `transformers<4.51.0`) may raise errors like:
```text
KeyError: 'qwen3'
```
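To catch an outdated install before loading the model, the minimum versions stated above can be checked programmatically. This is a rough sketch: `parse_version`, `meets_minimum`, and `installed_version` are hypothetical helpers written for this example, and the comparison only looks at the leading numeric components of a version string.

```python
import re
from importlib import metadata

# Minimum versions quoted in this README.
MINIMUMS = {"transformers": "4.52.4", "mlx_lm": "0.25.2"}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '4.52.4' or '4.53.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(meets_minimum("4.50.3", MINIMUMS["transformers"]))
```

Calling `installed_version(name)` for each key in `MINIMUMS` and passing the result to `meets_minimum` gives a quick pre-flight check. For full PEP 440 semantics, a dedicated version-parsing library would be the more robust choice.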
Install or upgrade both packages:
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model with the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('RealmSky/Qwen3-32B-MLX-bf16-3')
```
Git download
```
# Download the model with Git
git clone https://www.modelscope.cn/RealmSky/Qwen3-32B-MLX-bf16-3.git
pip install --upgrade transformers mlx_lm
```
<p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete the model card promptly, following the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
The following code snippet illustrates how to use the model to generate content based on given inputs.
```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen/Qwen3-32B-MLX-bf16")
prompt = "Hello, please introduce yourself and tell me what you can do."

# Wrap the prompt in the chat format when the tokenizer ships a chat template
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True
    )

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    verbose=True,
    max_tokens=1024
)

print(response)
```
## Switching Between Thinking and Non-Thinking Mode
> [!TIP]
> The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
> Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.
### `enable_thinking=True`
By default, Qwen3 has thinking capabilities enabled, similar to QwQ-32B. This means the model will use its reasoning abilities to enhance the quality of generated responses. For example, when explicitly setting `enable_thinking=True` or leaving it as the default value in `tokenizer.apply_chat_template`, the model will engage its thinking mode.
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # True is the default value for enable_thinking
)
```
In this mode, the model generates its thinking content wrapped in a `<think>...</think>` block, followed by the final response.
> [!NOTE]
> For thinking mode, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.
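
Since the thinking content and the final response arrive in one string, downstream code usually needs to separate them. Below is a minimal sketch of one way to do that with plain string operations; `split_thinking` is a hypothetical helper, not part of `mlx_lm` or `transformers`, and it assumes at most one `<think>...</think>` block at the start of the output.

```python
def split_thinking(response: str) -> tuple[str, str]:
    """Split a raw completion into (thinking, answer).

    Tolerates a missing opening tag and returns an empty thinking string
    when no closing tag is present at all.
    """
    open_tag, close_tag = "<think>", "</think>"
    head, separator, tail = response.partition(close_tag)
    if not separator:
        # No closing tag: treat the whole text as the final answer.
        return "", response.strip()
    thinking = head.replace(open_tag, "", 1).strip()
    return thinking, tail.strip()


thinking, answer = split_thinking("<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4.")
print(answer)
```

The same helper returns an empty thinking string when the block is present but empty.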
### `enable_thinking=False`
We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency.
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Setting enable_thinking=False disables thinking mode
)
```
In this mode, the model does not generate any thinking content and does not include a `<think>...</think>` block.
> [!NOTE]
> For non-thinking mode, we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.
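
The two recommended sampling configurations can be kept side by side so that the right one is selected together with `enable_thinking`. This is an illustrative sketch only: `SamplingPreset` and `preset_for` are hypothetical names, and the field names are generic rather than those of any particular inference framework.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplingPreset:
    temperature: float
    top_p: float
    top_k: int
    min_p: float


# Values taken from the notes in this README.
THINKING = SamplingPreset(temperature=0.6, top_p=0.95, top_k=20, min_p=0.0)
NON_THINKING = SamplingPreset(temperature=0.7, top_p=0.8, top_k=20, min_p=0.0)


def preset_for(enable_thinking: bool = True) -> SamplingPreset:
    """Pick the recommended sampling preset for the chosen mode."""
    return THINKING if enable_thinking else NON_THINKING


print(asdict(preset_for(enable_thinking=False)))
```

Parameter names differ between frameworks, so the dictionary keys will likely need to be mapped onto the sampler API of whichever runtime is in use.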
### Advanced Usage: Switching Between Thinking and Non-Thinking Modes via User Input
We provide a soft switch mechanism that allows users to dynamically control the model's behavior when `enable_thinking=True`. Specifically, you can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.
Here is an example of a multi-turn conversation:
```python
from mlx_lm import load, generate


class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-32B-MLX-bf16"):
        self.model, self.tokenizer = load(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]

        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

        response = generate(
            self.model,
            self.tokenizer,
            prompt=text,
            verbose=True,
            max_tokens=32768
        )

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response


# Example Usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /no_think tags, thinking mode is enabled by default)
    user_input_1 = "How many 'r's are in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /no_think
    user_input_2 = "Then, how many 'r's are in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")
```
> [!NOTE]
> For API compatibility, when `enable_thinking=True`, the model always outputs a block wrapped in `<think>...</think>`, regardless of whether the user uses `/think` or `/no_think`. However, the content inside this block may be empty if thinking is disabled.
> When `enable_thinking=False`, the soft switches have no effect. Regardless of any `/think` or `/no_think` tags input by the user, the model will not generate thinking content and will not include a `<think>...</think>` block.
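
The precedence between the hard switch and the soft switches can be written down as a small decision function. This sketch only mirrors the rules described in this section so that client code can predict which mode to expect; it is not how the model itself decides, and `effective_thinking` is a hypothetical helper.

```python
import re

# A soft switch counts only when it stands alone as a whitespace-delimited token.
_SWITCH = re.compile(r"(?<!\S)/(think|no_think)(?!\S)")


def effective_thinking(messages, enable_thinking=True):
    """Predict whether the next reply is expected to contain thinking content."""
    if not enable_thinking:
        return False  # the hard switch overrides any soft switch
    # Walk backwards so that the most recent instruction wins.
    for message in reversed(messages):
        if message["role"] not in ("user", "system"):
            continue
        tags = _SWITCH.findall(message["content"])
        if tags:
            return tags[-1] == "think"
    return True  # no soft switch found: thinking stays on by default


history = [
    {"role": "user", "content": "How many 'r's are in strawberries?"},
    {"role": "assistant", "content": "There are three."},
    {"role": "user", "content": "Then, how many 'r's are in blueberries? /no_think"},
]
print(effective_thinking(history))
```

For the conversation above, the last user turn carries `/no_think`, so the function reports that thinking is off for the next reply.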
## Agentic Use
Qwen3 excels in tool calling capabilities. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of the agentic abilities of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.
To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself.
```python
import os  # only needed if the DashScope "api_key" line below is uncommented

from qwen_agent.agents import Assistant

# Define LLM
llm_cfg = {
    "model": "Qwen3-32B-MLX-bf16",

    # Use the endpoint provided by Alibaba Model Studio:
    # "model_type": "qwen_dashscope",
    # "api_key": os.getenv("DASHSCOPE_API_KEY"),

    # Use a custom endpoint compatible with OpenAI API:
    "model_server": "http://localhost:8000/v1",  # api_base
    "api_key": "EMPTY",

    # Other parameters:
    # "generate_cfg": {
    #     # Add: When the response content is `<think>this is the thought</think>this is the answer`;
    #     # Do not add: When the response has been separated by reasoning_content and content.
    #     "thought_in_content": True,
    # },
}

# Define Tools
tools = [
    {
        "mcpServers": {  # You can specify the MCP configuration file
            "time": {
                "command": "uvx",
                "args": ["mcp-server-time", "--local-timezone=Asia/Shanghai"],
            },
            "fetch": {
                "command": "uvx",
                "args": ["mcp-server-fetch"],
            },
        }
    },
    "code_interpreter",  # Built-in tools
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Streaming generation
messages = [
    {
        "role": "user",
        "content": "https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen",
    }
]
for responses in bot.run(messages=messages):
    pass
print(responses)
```
## Processing Long Texts
Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.
YaRN is currently supported by several inference frameworks, e.g., `transformers` and `llama.cpp` for local use, `vllm` and `sglang` for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:
- Modifying the model files:
In the `config.json` file, add the `rope_scaling` fields:
```json
{
    ...,
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768
    }
}
```
> [!IMPORTANT]
> If you encounter the following warning
> ```
> Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'original_max_position_embeddings'}
> ```
> please upgrade `transformers>=4.51.0`.
> [!NOTE]
> All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts.**
> We advise adding the `rope_scaling` configuration only when processing long contexts is required.
> It is also recommended to modify the `factor` as needed. For example, if the typical context length for your application is 65,536 tokens, it would be better to set `factor` as 2.0.
> [!NOTE]
> The default `max_position_embeddings` in `config.json` is set to 40,960. This allocation includes reserving 32,768 tokens for outputs and 8,192 tokens for typical prompts, which is sufficient for most scenarios involving short text processing. If the average context length does not exceed 32,768 tokens, we do not recommend enabling YaRN in this scenario, as it may potentially degrade model performance.
> [!TIP]
> The endpoint provided by Alibaba Model Studio supports dynamic YaRN by default and no extra configuration is needed.
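
The advice on choosing `factor` can be turned into a small helper that derives the `rope_scaling` entry from a target context length. This is a sketch under stated assumptions: `yarn_rope_scaling` is a hypothetical function, and it takes the factor to be the plain ratio of target length to native length, which is the simplest reading of the 65,536-token example above.

```python
NATIVE_CONTEXT = 32768   # native context length stated in this README
VALIDATED_MAX = 131072   # longest context validated with YaRN


def yarn_rope_scaling(target_context: int):
    """Return a rope_scaling dict for config.json, or None if YaRN is unnecessary."""
    if target_context <= NATIVE_CONTEXT:
        return None  # short contexts: leave rope_scaling unset
    if target_context > VALIDATED_MAX:
        raise ValueError(
            f"{target_context} tokens exceeds the validated maximum of {VALIDATED_MAX}"
        )
    return {
        "rope_type": "yarn",
        "factor": target_context / NATIVE_CONTEXT,
        "original_max_position_embeddings": NATIVE_CONTEXT,
    }


print(yarn_rope_scaling(65536))
```

A target of 65,536 tokens yields a factor of 2.0 and 131,072 tokens yields 4.0. For lengths at or below the native window, the helper returns `None` so that static YaRN does not affect short-text performance.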
## Best Practices
To achieve optimal performance, we recommend the following settings:
1. **Sampling Parameters**:
- For thinking mode (`enable_thinking=True`), use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0`. **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.
- For non-thinking mode (`enable_thinking=False`), we suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.
- For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 38,912 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.
3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
- **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
- **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."
4. **No Thinking Content in History**: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. It is implemented in the provided chat template in Jinja2. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that the best practice is followed.
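
For frameworks that bypass the Jinja2 chat template, the fourth practice can be applied when each turn is stored. The following is a minimal sketch, assuming the thinking content is delimited by literal `<think>` and `</think>` tags; `strip_thinking` and `append_turn` are hypothetical helpers introduced only for this example.

```python
import re

# Non-greedy match so that only the thinking block itself is removed.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove any <think>...</think> block and surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()


def append_turn(history, user_input, raw_response):
    """Return a new history that stores only the final answer of the turn."""
    return history + [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": strip_thinking(raw_response)},
    ]


history = append_turn([], "Hi", "<think>\nGreet the user back.\n</think>\n\nHello!")
print(history[-1]["content"])
```

Returning a new list instead of mutating the input keeps earlier snapshots of the conversation intact, which can be convenient when retrying a turn.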
### Citation
If you find our work helpful, feel free to cite us.
```
@misc{qwen3technicalreport,
    title={Qwen3 Technical Report},
    author={Qwen Team},
    year={2025},
    eprint={2505.09388},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.09388},
}
```

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 25600,
"max_position_embeddings": 40960,
"max_window_layers": 64,
"model_type": "qwen3",
"num_attention_heads": 64,
"num_hidden_layers": 64,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.51.3",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework":"Pytorch","task":"text-generation"}

3
merges.txt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8831e4f1a044471340f7c0a83d7bd71306a5b867e95fd870f74d0c5308a904d5
size 1671853

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:93d87d5a923eb30b41ce81ddd469f56bbfe4e73cfa010a5b3578a6118935acac
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae31a41a958b05fa3600f41e47cd69f18c44acb680fb8bd20efcd1d9d320b28f
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07eec08e8d71ac7145a4adc8748bcc7ce0f497b8877c6f5d213e175e2cd62284
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:504e3675ddfc74ed1909c53dbba8a127b15935ca934d4126ce6d3716dc8ca95d
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a938c83f9f93f8b26ac5c1466442862e31fa898e8962ce4c46c45a77ab6401c0
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:de1a0ebddabc4916e109c738969cc1cdfc17454708bd0e4ddd59a1bd5989bafa
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e71db0c3dc0344220692042b985d03411f6bbd40ab4c31c69828474bda3502c
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:491855b686a840797c8f67c6156007c082a19a064df6b80a27820e429fc258fc
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:34a4ff364e47ccb460cbc8dd5928f98e869047a8a67c266b592b741871475b78
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e89d35faa32f138826c035510442f1c3877b3a9cb9c381cdf67fd3f29f3ceb64
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fe970df4c5f976e6af27d749185e64f5db48682e03e923a557c5319d53f9868a
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f01444e5c65698dee32bb52446dee1463c998682cc634858f1efa2fd5f8a11df
size 135

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f721b62fbf0d2762805b67c9655f32572533079e1e265e56e28dd493622841b5
size 135

View File

@@ -0,0 +1,714 @@
{
"metadata": {
"total_size": 65524246528
},
"weight_map": {
"lm_head.weight": "model-00013-of-00013.safetensors",
"model.embed_tokens.weight": "model-00001-of-00013.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00003-of-00013.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.19.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00004-of-00013.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00005-of-00013.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
"model.layers.30.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00006-of-00013.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
"model.layers.31.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.34.input_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.35.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00007-of-00013.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
"model.layers.36.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.k_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.q_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.k_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.q_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.k_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.q_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.39.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.39.self_attn.k_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.39.self_attn.q_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.40.input_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
"model.layers.40.self_attn.k_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.40.self_attn.q_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.41.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.41.self_attn.k_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.41.self_attn.q_norm.weight": "model-00008-of-00013.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
"model.layers.42.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.42.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.k_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.q_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.42.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.43.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.k_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.q_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.43.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.44.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.44.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.44.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.44.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.44.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.44.self_attn.k_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.44.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.44.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.44.self_attn.q_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.44.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.44.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.45.input_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.45.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.45.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.45.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.45.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
"model.layers.45.self_attn.k_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.45.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.45.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.45.self_attn.q_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.45.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.45.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.46.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.46.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.46.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.46.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.46.self_attn.k_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.46.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.46.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.46.self_attn.q_norm.weight": "model-00009-of-00013.safetensors",
"model.layers.46.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.46.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
"model.layers.47.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.47.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.k_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.q_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.47.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.48.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.k_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.q_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.48.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.49.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.49.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.49.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.49.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.49.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.49.self_attn.k_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.49.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.49.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.49.self_attn.q_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.49.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.49.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.50.input_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.50.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.50.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.50.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.50.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
"model.layers.50.self_attn.k_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.50.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.50.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.50.self_attn.q_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.50.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.50.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.51.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.51.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.51.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.51.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.51.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.51.self_attn.k_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.51.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.51.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.51.self_attn.q_norm.weight": "model-00010-of-00013.safetensors",
"model.layers.51.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.51.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
"model.layers.52.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.52.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.k_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.q_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.52.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.53.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.k_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.q_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.53.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.54.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.54.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.54.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.54.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.54.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.54.self_attn.k_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.54.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.54.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.54.self_attn.q_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.54.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.54.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.55.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.55.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.55.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.55.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.55.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.55.self_attn.k_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.55.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.55.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.55.self_attn.q_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.55.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.55.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.56.input_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.56.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.56.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.56.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.56.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
"model.layers.56.self_attn.k_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.56.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.56.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.56.self_attn.q_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.56.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.56.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.57.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.57.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.57.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.57.self_attn.k_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.57.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.57.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.57.self_attn.q_norm.weight": "model-00011-of-00013.safetensors",
"model.layers.57.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.57.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
"model.layers.58.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.58.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.k_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.q_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.58.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.59.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.59.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.59.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.59.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.59.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.59.self_attn.k_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.59.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.59.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.59.self_attn.q_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.59.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.59.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.60.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.60.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.60.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.60.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.60.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.60.self_attn.k_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.60.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.60.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.60.self_attn.q_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.60.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.60.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.61.input_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.61.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.61.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.61.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.61.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
"model.layers.61.self_attn.k_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.61.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.61.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.61.self_attn.q_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.61.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.61.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.62.input_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.62.mlp.down_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.62.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.62.mlp.up_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.62.post_attention_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.62.self_attn.k_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.62.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.62.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.62.self_attn.q_norm.weight": "model-00012-of-00013.safetensors",
"model.layers.62.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.62.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
"model.layers.63.input_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.63.mlp.down_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.63.mlp.gate_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.63.mlp.up_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.63.post_attention_layernorm.weight": "model-00013-of-00013.safetensors",
"model.layers.63.self_attn.k_norm.weight": "model-00013-of-00013.safetensors",
"model.layers.63.self_attn.k_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.63.self_attn.o_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.63.self_attn.q_norm.weight": "model-00013-of-00013.safetensors",
"model.layers.63.self_attn.q_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.63.self_attn.v_proj.weight": "model-00013-of-00013.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00002-of-00013.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
"model.norm.weight": "model-00013-of-00013.safetensors"
}
}
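The weight map above assigns every tensor name to one of 13 shard files, and some layers (46, for example) have tensors in two consecutive shards. As a minimal sketch of how such a map could be inspected, the snippet below groups shard files by layer index and lists the layers that straddle a shard boundary. The helper names `shards_per_layer` and `split_layers` are hypothetical, and the inline `weight_map` is only a four-entry excerpt of the listing, not the full index.

```python
from collections import defaultdict

# Four entries copied from the weight map above; the real index has many more.
weight_map = {
    "model.layers.46.input_layernorm.weight": "model-00010-of-00013.safetensors",
    "model.layers.46.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.47.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
    "model.norm.weight": "model-00013-of-00013.safetensors",
}


def shards_per_layer(weight_map):
    """Map each layer index to the sorted list of shard files holding its tensors."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        parts = name.split(".")
        # Only names of the form model.layers.<index>.<...> belong to a layer.
        if len(parts) > 2 and parts[0] == "model" and parts[1] == "layers":
            layers[int(parts[2])].add(shard)
    return {idx: sorted(files) for idx, files in sorted(layers.items())}


def split_layers(weight_map):
    """Return the layer indices whose tensors are spread over more than one shard."""
    return [idx for idx, files in shards_per_layer(weight_map).items() if len(files) > 1]


print(split_layers(weight_map))
```

A loader that reads one layer at a time would need to open both shard files for any layer this reports.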

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
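The three lines above are a Git LFS pointer rather than the tokenizer itself: a spec version, a content hash, and the byte size of the real file. A rough sketch of reading such a pointer follows, assuming the simple `key value` layout seen here. The function name `parse_lfs_pointer` is hypothetical and the sketch does not validate the pointer beyond splitting its fields.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the tokenizer.json entry above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

After the real file is fetched, its size and SHA-256 digest can be compared against these values to check that the download is complete.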

240
tokenizer_config.json Normal file

@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for index in range(ns.last_query_index, -1, -1) %}\n {%- set message = messages[index] %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not('<tool_response>' in message.content and '</tool_response>' in message.content) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
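The `chat_template` in this config wraps each turn in `<|im_start|>` / `<|im_end|>` markers and, for assistant turns, separates the `<think>` reasoning from the final answer. The sketch below re-expresses two small pieces of that template in plain Python as an illustration only: the simplest rendering path (system and user messages, no tools) and the string splitting the template uses to pull reasoning out of an assistant message. The function names `render_simple` and `split_reasoning` are hypothetical, and the sketch deliberately leaves out tool calls, tool responses, and prior assistant turns.

```python
def render_simple(messages, add_generation_prompt=True, enable_thinking=True):
    """Render system/user messages the way the template does when no tools are given."""
    out = []
    for message in messages:
        if message["role"] not in ("system", "user"):
            raise ValueError("this sketch only covers system and user messages")
        out.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
        # The template emits an empty think block only when enable_thinking is false.
        if not enable_thinking:
            out.append("<think>\n\n</think>\n\n")
    return "".join(out)


def split_reasoning(text):
    """Mirror the template's split of an assistant message into (reasoning, answer)."""
    if "</think>" not in text:
        return "", text
    answer = text.split("</think>")[-1].lstrip("\n")
    reasoning = text.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    return reasoning, answer


prompt = render_simple([{"role": "user", "content": "Hi"}], enable_thinking=False)
print(prompt)
print(split_reasoning("<think>\nabc\n</think>\n\nAnswer"))
```

In practice the template itself would normally be rendered by the tokenizer library; the hand-written version here is only meant to make the resulting prompt layout easier to see.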

BIN
vocab.json (Stored with Git LFS) Normal file

Binary file not shown.