Initialize the repository; model provided by the ModelHub XC community

Model: inclusionAI/AReaL-boba-2-32B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-13 08:34:32 +08:00
commit 0931ef2638
19 changed files with 152789 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

220
README.md Normal file

@@ -0,0 +1,220 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
<h1 align="center">
<em>AReaL</em>: Ant Reasoning Reinforcement Learning for LLMs
</h1>
<p align="center">
| <a href="https://arxiv.org/pdf/2505.24298"><b>Paper</b></a> | <a href="https://inclusionai.github.io/AReaL/"><b>Documentation</b></a> | <a href="https://deepwiki.com/inclusionAI/AReaL"><b>Ask DeepWiki</b></a> | <a href="https://huggingface.co/collections/inclusionAI/areal-boba-2-683f0e819ccb7bb2e1b2f2d5"><b>🤗 Models & Data</b></a> |
</p>
AReaL (Ant Reasoning RL) is an open-source **fully asynchronous reinforcement learning training system** for large reasoning models developed at **the RL Lab, Ant Research**. Built upon the open-source project [ReaLHF](https://github.com/openpsi-project/ReaLHF), we are fully committed to open-source by providing training details, data, and infrastructure required to reproduce results along with the model itself. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable. We hope you enjoy our project just like how you enjoy real-world milk tea (cheers).
**AReaL Highlights**
+ 🔥 <span style="color: red; font-weight: bold;">**[NEW] Asynchronous RL:**</span> With algorithm-system co-design, AReaL supports fully asynchronous RL for **the fastest training**! Experimental support for multi-turn agentic RL is also provided.
+ 🛠️ **Open & Reproducible**: We continuously release _all code, datasets, and training recipes_ for RL training of LLMs.
+ 🚀 **Scalability**: AReaL can seamlessly adapt to different computational resource settings, ranging from a single node to 1K GPUs.
+ 🔪 **Cutting-Edge Performance:** AReaL can produce models with cutting-edge reasoning capabilities in math and coding. We are also actively working on agentic tasks.
## News
**[2025/06/03] (v0.3, boba²)** We release **boba²** (double-boba) for fully asynchronous RL training, which achieves a **2.77x speedup while obtaining on-par or even better training performance** compared to synchronous systems. Moreover, asynchronous RL makes it extremely easy to set up multi-turn agentic RL training! Check out [our v0.3 overview blog](/blog/AReaL_v0_3.md) and the [research paper](https://arxiv.org/pdf/2505.24298).
**[2025/03/31] (v0.2, Boba)** Here comes our next milestone release - Boba! Please call it A-ReaL-Boba! This release includes much faster training with SGLang support and SOTA 7B and 32B models on math reasoning. Check our [v0.2 technical blog](/blog/AReaL_v0_2.md).
**[2025/02/24] (v0.1)** Our initial release includes reproducible results for 1.5B and 7B LRMs. Check our [v0.1 technical blog](/blog/AReaL_v0_1.md).
## Release Highlights
In our AReaL-boba² (A-ReaL-double-boba) release, we highlight the top 3 most important features:
+ A fully asynchronous RL training pipeline with **system and RL algorithm co-design**, achieving over 2.77x speedup without any performance drop. Check the [benchmark scripts and instructions here](https://github.com/inclusionAI/AReaL/tree/main/benchmark/verl_v0_3_0_post1_76084d3).
+ SOTA coding models, i.e., a 14B model with a **69.1 score on LCB-v5**. To reproduce, check the [configs](https://github.com/inclusionAI/AReaL/tree/main/examples/configs/v0.3-qwen3-code) and [instructions](https://inclusionai.github.io/AReaL/references/reproduce.html).
+ Experimental support for **multi-turn** agentic RL training. Check our [complete example](https://inclusionai.github.io/AReaL/customization/agent.html).
For the complete system design and more training details, please check [our v0.3 blog](/blog/AReaL_v0_3.md) and our [research paper](https://arxiv.org/pdf/2505.24298).
### Overview of Asynchronous RL Training
During the synchronous RL training process, a generation step must wait until the longest sequence completes within the batch of LLM outputs. Due to the varying output lengths for LRMs, a synchronous RL system suffers from massive GPU idle time, leading to training inefficiency. Some recent works ([DeepCoder](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51), [Intellect](https://www.primeintellect.ai/blog/intellect-2)) propose overlapping a single training step with a single generation step to accelerate training. However, the largest bottleneck remains unchanged: the samples within a batch are still from the same model version, leading to waiting and GPU idle time.
**Synchronous vs One-step Overlap RL**
*Fig.1. Left: Execution timeline of synchronous RL training. Right: Execution timeline of one-step overlap RL system.*
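The idle-time argument can be made concrete with a toy calculation: under synchronous training, a generation step costs the *maximum* output length in the batch, while an idealized streaming system with one rollout worker per sample costs roughly the *mean*. The numbers below are illustrative only, not AReaL measurements:

```python
import random

def sync_step_time(lengths):
    """Synchronous RL: every sample waits for the longest generation in the batch."""
    return max(lengths)

def streaming_step_time(lengths, workers):
    """Idealized streaming rollout: workers stay busy, so wall time approaches
    total decoding work divided by the number of workers."""
    return sum(lengths) / workers

random.seed(0)
# Hypothetical output lengths (in decode steps), spread widely as is typical for LRMs.
lengths = [random.randint(500, 32768) for _ in range(64)]

sync = sync_step_time(lengths)              # gated by the single longest sample
stream = streaming_step_time(lengths, 64)   # equals the mean length with 64 workers
print(f"sync: {sync} steps, streaming: {stream:.0f} steps, "
      f"idle fraction under sync: {1 - stream / sync:.0%}")
```

The wider the spread of output lengths, the larger the idle fraction a synchronous system pays.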
AReaL adopts a fully asynchronous RL training framework that completely decouples generation from training. In AReaL, LLM generation runs in a streaming manner, with each rollout worker continuously producing outputs without waiting. Meanwhile, trainer workers perform parallel model updates upon receiving training batches.
**Asynchronous RL Training**
*Fig 2. Execution timeline of our fully asynchronous RL system.*
AReaL follows a system-algorithm co-design principle: on the system side, AReaL efficiently syncs model parameters and carefully controls the staleness of each training sample; on the algorithm side, AReaL improves the objective of PPO to make async-RL stable.
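On the algorithm side, the decoupled PPO objective separates the *behavior* policy that generated a (possibly stale) sample from a recent *proximal* policy that serves as the clipping anchor. A minimal NumPy sketch of this structure follows; variable names and the exact treatment of the importance weight are illustrative — see the paper for the precise objective:

```python
import numpy as np

def decoupled_ppo_loss(logp_new, logp_prox, logp_behav, advantages, eps=0.2):
    """Sketch of a decoupled PPO loss (per-token terms, averaged).

    Standard PPO clips the ratio pi_theta / pi_behav. With stale asynchronous
    data, a decoupled objective instead clips against a recent *proximal*
    policy and reweights each term by an importance ratio pi_prox / pi_behav.
    All inputs are log-probabilities of the sampled tokens.
    """
    ratio = np.exp(logp_new - logp_prox)   # clipped against the proximal policy
    iw = np.exp(logp_prox - logp_behav)    # corrects for staleness of the data
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return -np.mean(iw * np.minimum(unclipped, clipped))
```

When all three policies coincide, this reduces to the ordinary clipped PPO loss, which is a useful sanity check when implementing it.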
We compare the scalability of **asynchronous RL** training based on our AReaL-boba² system with **classical synchronous RL** training (we adopt the fastest open-source system, veRL, main branch on 05/07/2025) across different model sizes and different numbers of H800 GPUs. AReaL demonstrates markedly better scaling of training throughput. This is also partly because AReaL decouples training from generation, which greatly reduces GPU memory fragmentation.
**Scaling Comparison**
*Fig.3 The scaling trend of asynchronous RL (based on AReaL-boba²) and classical synchronous RL (based on veRL) with different model sizes. Dotted lines indicate ideal linear scaling.*
### SOTA Code Generation Model by AReaL-boba²
We use **Qwen3** as our base model. After asynchronous RL training, we achieve SOTA results on LiveCodeBench, Codeforces, and CodeContests benchmarks.
| **Model (8B)** | **LiveCodeBench v5**<br/>**(2024.10-2025.2)** | **Codeforces** | **CodeContests** |
| :---: | :---: | :---: | :---: |
| Qwen3-8B | 58.8 | 1879/96.7% | 31.4 |
| DeepSeek-R1-0528-Qwen3-8B | 58.4 | 1945/97.3% | 31.0 |
| [🤗 AReaL-boba²-8B-Open](https://huggingface.co/inclusionAI/AReaL-boba-2-8B-subset) | 62.0 | 1933/97.2% | **41.4** |
| [🤗 AReaL-boba²-8B](https://huggingface.co/inclusionAI/AReaL-boba-2-8B) | **63.0** | **1962/97.5%** | 40.8 |
| **Model (14B)** | **LiveCodeBench v5**<br/>**(2024.10-2025.2)** | **Codeforces** | **CodeContests** |
| :---: | :---: | :---: | :---: |
| Qwen3-14B | 65.4 | 1978/97.7% | 38.3 |
| DeepCoder-14B-Preview | 60.6 | 1936/95.3% | 40.1 |
| [🤗 AReaL-boba²-14B-Open](https://huggingface.co/inclusionAI/AReaL-boba-2-14B-subset) | 67.3 | 1990/97.8% | **46.2** |
| [🤗 AReaL-boba²-14B](https://huggingface.co/inclusionAI/AReaL-boba-2-14B) | **69.1** | **2044/98.2%** | 46.1 |
| **Larger Models** | **LiveCodeBench v5**<br/>**(2024.10-2025.2)** | **Codeforces** | **CodeContests** |
| :---: | :---: | :---: | :---: |
| Qwen3-235B | 70.7 | 2056 | - |
| DeepSeek-R1 | 64.3 | 2029 | - |
| OpenAI-o3-mini (Medium) | 66.3 | 2036 | - |
*Table 1: Coding Task Performance Comparison. AReaL-boba²-8B/14B-Open denotes training results on open-source data. AReaL-boba²-8B/14B models are trained with an additional small amount of internal data and achieve SOTA performance on LiveCodeBench, Codeforces & CodeContests.*
We highlight the [tutorials](https://inclusionai.github.io/AReaL/customization/dataset.html) and [code walkthroughs](https://inclusionai.github.io/AReaL/developer/overview.html) about the following key features for asynchronous training:
+ [Streaming generation and reward computation](https://inclusionai.github.io/AReaL/developer/rollout/rollout_worker.html)
+ [Interruptible rollout](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
+ [Data staleness control with the rollout controller](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
+ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html)
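As a rough illustration of the staleness-control idea (not AReaL's actual API), a rollout buffer can tag each sample with the policy version that produced it and admit it to a training batch only while it is still fresh enough:

```python
from collections import deque

class StalenessController:
    """Toy sketch of data staleness control for asynchronous RL: a sample
    generated under policy version v is served for training only while the
    learner's current version is at most v + max_staleness.
    (Class and method names are illustrative.)"""

    def __init__(self, max_staleness: int):
        self.max_staleness = max_staleness
        self.buffer = deque()  # (sample, behavior_policy_version) pairs

    def submit(self, sample, version: int):
        self.buffer.append((sample, version))

    def next_batch(self, current_version: int, batch_size: int):
        # Evict samples that have become too stale, then serve the oldest rest.
        self.buffer = deque(
            (s, v) for s, v in self.buffer
            if current_version - v <= self.max_staleness
        )
        batch = []
        while self.buffer and len(batch) < batch_size:
            batch.append(self.buffer.popleft()[0])
        return batch
```

Bounding staleness this way is what keeps the decoupled PPO importance weights well-behaved as generation runs ahead of training.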
### RL Training for Multi-turn Agent
AReaL-boba² allows you to independently customize the [dataset](https://inclusionai.github.io/AReaL/customization/dataset.html), [rollout behavior](https://inclusionai.github.io/AReaL/customization/agent.html), and the [training algorithm](https://inclusionai.github.io/AReaL/customization/algorithm.html), without needing to modify the heavy system-level code.
In particular, we show a simple example of developing a multi-turn math agent for RL training. Please see the learning curve below and follow the [step-by-step guide](https://inclusionai.github.io/AReaL/customization/agent.html) if you want to implement your own agentic RL project.
**Multi-turn Agent Learning Curve**
## Getting Started
### Quick Start
Train Qwen3 1.7B locally:
```bash
bash examples/run_async_ppo.sh
```
Evaluation:
```bash
cd evaluation
# Evaluate the model
python eval_and_aggregate.py \
--model_path ${MODEL_PATH} \
--output_path ${OUTPUT_PATH} \
--data_names aime24,aime25,codeforces,lcb_v5 \
--max_gen_tokens 32768 \
--prompt_type qwen3-think-pure \
--temperature 1.0
```
## Resources
+ [Documentation](https://inclusionai.github.io/AReaL/)
+ [Contributing](https://inclusionai.github.io/AReaL/contrib.html)
### Quickstart
+ [Installation](https://inclusionai.github.io/AReaL/tutorial/installation.html)
+ [Example: Improving the math capability of Qwen3 with PPO](https://inclusionai.github.io/AReaL/tutorial/quickstart.html)
### Benchmark and Reproduction
+ **Reproduce boba² Code Models**
- 🤗 **Model weights**: [8B-code](https://huggingface.co/inclusionAI/AReaL-boba-2-8B), [14B-code](https://huggingface.co/inclusionAI/AReaL-boba-2-14B), [8B-code-open](https://huggingface.co/inclusionAI/AReaL-boba-2-8B-subset), [14B-code-open](https://huggingface.co/inclusionAI/AReaL-boba-2-14B-subset)
- [Evaluation Guide](https://inclusionai.github.io/AReaL/tutorial/eval.html)
- [Training configs](https://github.com/inclusionAI/AReaL/tree/main/examples/configs/v0.3-qwen3-code) and [instructions](https://inclusionai.github.io/AReaL/references/reproduce.html)
+ [Scripts for Benchmark Training Throughput](https://github.com/inclusionAI/AReaL/tree/main/benchmark/verl_v0_3_0_post1_76084d3)
### Customization Guide
- [Use your own dataset](https://inclusionai.github.io/AReaL/customization/dataset.html)
- [Modifying the reward function and rollout behavior (multi-turn agentic RL)](https://inclusionai.github.io/AReaL/customization/agent.html)
- [Modifying PPO to GRPO](https://inclusionai.github.io/AReaL/customization/algorithm.html#grouped-advantage-normalization)
- [Developing the decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html#the-decoupled-ppo-loss)
### System Code Walkthrough
+ [Trainer](https://inclusionai.github.io/AReaL/developer/trainer/model_worker.html)
+ [Model Backend and Algorithm Interface](https://inclusionai.github.io/AReaL/developer/trainer/algo_interface.html)
+ [Rollout Controller](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
+ [Streaming generation and reward computation](https://inclusionai.github.io/AReaL/developer/rollout/rollout_worker.html)
## Future Plan
AReaL is under active development. We plan to have minor releases weekly and major releases monthly. Community engagement and contributions are extremely welcome. We are also **hiring interns and full-time employees** with open positions in both the US and China.
For the research and development plan already in place, please see the following list:
### System Development
- [x] Support for SGLang
- [x] RL training with coding problems
- [x] Asynchronous generation and RL training
- [ ] Optimizations for distributed training: expert parallel for MOE and zero-bubble pipelining
- [ ] RL for vision-language models (VLM)
- [x] Multi-turn agentic RL
- [ ] Function calling and tool use
### Algorithm Development
- [x] RL training recipes for 1.5B and 7B models
- [x] A complete RL training recipe for 32B models
- [ ] Sample-efficient multi-task RL algorithms
- [ ] Agentic capabilities with end-to-end RL
- [ ] Stable RL training for larger MOE models
## Acknowledgement
We would like to note that major contributors are from the RL Lab at Ant Research and the Institute for Interdisciplinary Information Sciences, Tsinghua University.
Our team has also received invaluable assistance from the Data Intelligence Lab at Ant Research for data support and from the Super Computing Technology (SCT) team at Ant Group, particularly in the realm of large-scale cluster operations and maintenance.
We also appreciate all the pioneering works from the community, particularly the [ReaLHF](https://github.com/openpsi-project/ReaLHF) project from OpenPsi Inc. and other projects, including but not limited to [DeepScaleR](https://github.com/agentica-project/deepscaler), [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [VeRL](https://github.com/volcengine/verl), [SGLang](https://github.com/sgl-project/sglang), [QwQ](https://github.com/QwenLM/QwQ), [Light-R1](https://github.com/Qihoo360/Light-R1) and [DAPO](https://github.com/BytedTsinghua-SIA/DAPO).
## Citation
```bibtex
@inproceedings{mei2025real,
author = {Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},
title = {ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},
booktitle = {Proceedings of the Eighth Conference on Machine Learning and Systems,
MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},
publisher = {mlsys.org},
year = {2025},
}
```
```bibtex
@misc{fu2025areal,
title={AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning},
author={Wei Fu and Jiaxuan Gao and Xujie Shen and Chen Zhu and Zhiyu Mei and Chuyi He and Shusheng Xu and Guo Wei and Jun Mei and Jiashu Wang and Tongkai Yang and Binhang Yuan and Yi Wu},
year={2025},
eprint={2505.24298},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.24298},
}
```

28
added_tokens.json Normal file

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}
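The `<think>`/`</think>` pair in this file delimits the model's reasoning trace in its output. A minimal post-processing sketch that separates the trace from the final answer (the helper below is illustrative, not part of the released tooling):

```python
import re

def split_think(text: str):
    """Split decoded model output into (reasoning_trace, final_answer),
    assuming the Qwen3-style <think>...</think> delimiters from
    added_tokens.json survive detokenization as literal strings."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

reasoning, answer = split_think(
    "<think>Compute 2+2 step by step.</think>The answer is 4."
)
```

In practice one would decode with the special tokens kept (e.g. `skip_special_tokens=False`) so the delimiters are present to split on.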

28
config.json Normal file

@@ -0,0 +1,28 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 25600,
"max_position_embeddings": 40960,
"max_window_layers": 28,
"model_type": "qwen3",
"num_attention_heads": 64,
"num_hidden_layers": 64,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": 4096,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.51.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
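A back-of-the-envelope check that this config indeed describes a ~32B-parameter model. The per-head q/k RMSNorm is Qwen3-specific; the calculation is approximate and ignores non-parameter buffers such as rotary frequencies:

```python
# Dimensions from config.json above (GQA: 64 query heads, 8 KV heads).
h, layers, vocab = 5120, 64, 151936
heads, kv_heads, head_dim, inter = 64, 8, 128, 25600

attn = (h * heads * head_dim            # q_proj
        + 2 * h * kv_heads * head_dim   # k_proj + v_proj
        + heads * head_dim * h          # o_proj
        + 2 * head_dim)                 # q_norm + k_norm (per-head RMSNorm)
mlp = 3 * h * inter                     # gate, up, and down projections
per_layer = attn + mlp + 2 * h          # + input/post-attention RMSNorm

total = layers * per_layer + 2 * vocab * h + h  # untied lm_head + final norm
print(f"~{total / 1e9:.2f}B parameters")
```

At two bytes per float16 weight this comes to roughly 65.5 GB, which lines up with the `total_size` recorded in the shard index further below.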

13
generation_config.json Normal file

@@ -0,0 +1,13 @@
{
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.6,
"top_k": 20,
"top_p": 0.95,
"transformers_version": "4.51.0"
}
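The defaults above compose as: scale logits by the temperature (0.6), keep the 20 most likely tokens (top-k), then truncate to the smallest nucleus with at least 0.95 cumulative mass (top-p) and renormalize. A NumPy sketch of that filtering rule — an illustration of the standard algorithm, not the Transformers implementation:

```python
import numpy as np

def filter_probs(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return the renormalized sampling distribution after temperature,
    top-k, and top-p (nucleus) filtering, applied in that order."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Top-k: drop everything below the k-th largest logit (ties may keep more).
    if top_k < logits.size:
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits >= kth, logits, -np.inf)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p: keep the smallest high-probability prefix with mass >= top_p.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()
```

Note the low temperature sharpens the distribution first, so top-p often removes far more tokens than top-k does.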

151388
merges.txt Normal file

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:531c3678ac44437d918ee679c14ac309da7634859f26be420a427bee89ce3aab
size 8382232688


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:92578570053f969e93d2b09b5a627ccde21fb687f0c28774645018d935d6a354
size 8776809798


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c6654078c3ae3e209183b6478f82808702dca1665446d6992e8ca8a55e9393b5
size 8776809798


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:14e151e0af3fda4e079e2c8b87ff00e9d007ff8457d426ce73fff3889adc1052
size 7801608758


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:67e6b47ed3d9f5c00898160499e1a9fc1709d8b2f44f683a6182c9cd50cd0626
size 7801608758


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d782641191fb7113d88cd272766bdb0fc7b8e9c69b88495ba65ccae38f2a06a1
size 7801608758


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7654eeaf4f0ff3b37435f21e485472acb37eb2eb59efdff07394875095be637f
size 7801608758


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7df8c096183758942d4e225580ba91352a1381b97a44e44df1aa1131c5b85727
size 8382243342


@@ -0,0 +1,778 @@
{
"metadata": {
"total_size": 65524262912
},
"weight_map": {
"model.embed_tokens.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
"model.layers.0.self_attn.k_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.0.self_attn.q_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.self_attn.k_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.1.self_attn.q_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.self_attn.k_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.2.self_attn.q_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.self_attn.k_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.3.self_attn.q_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.self_attn.k_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.4.self_attn.q_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.self_attn.k_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.5.self_attn.q_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.self_attn.k_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.6.self_attn.q_norm.weight": "pytorch_model-00001-of-00008.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
"model.layers.7.self_attn.k_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.7.self_attn.q_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.self_attn.k_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.8.self_attn.q_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.self_attn.k_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.9.self_attn.q_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.self_attn.k_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.10.self_attn.q_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.self_attn.k_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.11.self_attn.q_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.self_attn.k_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.12.self_attn.q_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.self_attn.k_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.13.self_attn.q_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.self_attn.k_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.14.self_attn.q_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.self_attn.k_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.15.self_attn.q_norm.weight": "pytorch_model-00002-of-00008.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
"model.layers.16.self_attn.k_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.16.self_attn.q_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.self_attn.k_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.17.self_attn.q_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.self_attn.k_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.18.self_attn.q_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.self_attn.k_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.19.self_attn.q_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.self_attn.k_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.20.self_attn.q_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.self_attn.k_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.21.self_attn.q_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.self_attn.k_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.22.self_attn.q_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.self_attn.k_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.23.self_attn.q_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.self_attn.k_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.24.self_attn.q_norm.weight": "pytorch_model-00003-of-00008.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
"model.layers.32.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
"model.layers.31.self_attn.k_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.31.self_attn.q_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.self_attn.k_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.32.self_attn.q_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.self_attn.k_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.25.self_attn.q_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.self_attn.k_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.26.self_attn.q_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.self_attn.k_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.27.self_attn.q_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.self_attn.k_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.28.self_attn.q_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.self_attn.k_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.29.self_attn.q_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.self_attn.k_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.30.self_attn.q_norm.weight": "pytorch_model-00004-of-00008.bin",
"model.layers.33.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
"model.layers.34.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
"model.layers.35.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
"model.layers.36.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
"model.layers.37.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
"model.layers.38.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
"model.layers.39.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
"model.layers.40.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
"model.layers.33.self_attn.k_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.33.self_attn.q_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.self_attn.k_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.34.self_attn.q_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.self_attn.k_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.35.self_attn.q_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.self_attn.k_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.36.self_attn.q_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.self_attn.k_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.37.self_attn.q_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.self_attn.k_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.38.self_attn.q_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.self_attn.k_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.39.self_attn.q_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.self_attn.k_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.40.self_attn.q_norm.weight": "pytorch_model-00005-of-00008.bin",
"model.layers.41.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
"model.layers.42.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
"model.layers.43.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
"model.layers.44.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
"model.layers.45.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
"model.layers.46.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
"model.layers.47.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
"model.layers.48.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
"model.layers.41.self_attn.k_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.41.self_attn.q_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.self_attn.k_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.42.self_attn.q_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.self_attn.k_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.43.self_attn.q_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.self_attn.k_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.44.self_attn.q_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.self_attn.k_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.45.self_attn.q_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.self_attn.k_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.46.self_attn.q_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.self_attn.k_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.47.self_attn.q_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.self_attn.k_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.48.self_attn.q_norm.weight": "pytorch_model-00006-of-00008.bin",
"model.layers.49.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
"model.layers.50.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
"model.layers.51.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
"model.layers.52.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
"model.layers.53.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
"model.layers.54.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
"model.layers.55.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
"model.layers.56.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
"model.layers.49.self_attn.k_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.49.self_attn.q_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.self_attn.k_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.50.self_attn.q_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.self_attn.k_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.51.self_attn.q_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.self_attn.k_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.52.self_attn.q_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.self_attn.k_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.53.self_attn.q_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.self_attn.k_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.54.self_attn.q_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.self_attn.k_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.55.self_attn.q_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.self_attn.k_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.56.self_attn.q_norm.weight": "pytorch_model-00007-of-00008.bin",
"model.layers.63.input_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.mlp.down_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.mlp.gate_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.mlp.up_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.post_attention_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.self_attn.k_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.self_attn.o_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.self_attn.q_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.self_attn.v_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00008.bin",
"model.norm.weight": "pytorch_model-00008-of-00008.bin",
"lm_head.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.input_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.mlp.down_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.mlp.gate_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.mlp.up_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.post_attention_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.self_attn.k_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.self_attn.o_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.self_attn.q_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.self_attn.v_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00008.bin",
"model.layers.58.input_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.mlp.down_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.mlp.gate_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.mlp.up_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.post_attention_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.self_attn.k_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.self_attn.o_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.self_attn.q_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.self_attn.v_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00008.bin",
"model.layers.59.input_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.mlp.down_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.mlp.gate_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.mlp.up_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.post_attention_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.self_attn.k_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.self_attn.o_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.self_attn.q_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.self_attn.v_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00008.bin",
"model.layers.60.input_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.mlp.down_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.mlp.gate_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.mlp.up_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.post_attention_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.self_attn.k_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.self_attn.o_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.self_attn.q_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.self_attn.v_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00008.bin",
"model.layers.61.input_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.mlp.down_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.mlp.gate_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.mlp.up_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.post_attention_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.self_attn.k_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.self_attn.o_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.self_attn.q_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.self_attn.v_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00008.bin",
"model.layers.62.input_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.mlp.down_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.mlp.gate_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.mlp.up_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.post_attention_layernorm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.self_attn.k_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.self_attn.o_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.self_attn.q_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.self_attn.v_proj.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00008.bin",
"model.layers.63.self_attn.k_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.63.self_attn.q_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.self_attn.k_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.57.self_attn.q_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.self_attn.k_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.58.self_attn.q_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.self_attn.k_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.59.self_attn.q_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.self_attn.k_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.60.self_attn.q_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.self_attn.k_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.61.self_attn.q_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.self_attn.k_norm.weight": "pytorch_model-00008-of-00008.bin",
"model.layers.62.self_attn.q_norm.weight": "pytorch_model-00008-of-00008.bin"
}
}

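The weight map above follows the Hugging Face sharded-checkpoint index convention (`pytorch_model.bin.index.json`): each tensor name maps to the shard file that stores it. A minimal sketch of how a loader can group tensor names by shard so that each shard file is opened only once; the miniature `index` dict here is a hypothetical abridgement of the real map, which covers all 64 layers:

```python
from collections import defaultdict

# Hypothetical miniature "weight_map", mimicking the structure of the
# index shown above (the real file lists every tensor of the model).
index = {
    "weight_map": {
        "model.layers.63.self_attn.q_proj.weight": "pytorch_model-00008-of-00008.bin",
        "model.layers.63.self_attn.k_proj.weight": "pytorch_model-00008-of-00008.bin",
        "model.norm.weight": "pytorch_model-00008-of-00008.bin",
        "lm_head.weight": "pytorch_model-00008-of-00008.bin",
        "model.layers.49.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
    }
}

# Group tensor names by shard file so each shard is read once.
shards = defaultdict(list)
for name, shard_file in index["weight_map"].items():
    shards[shard_file].append(name)

for shard_file in sorted(shards):
    print(shard_file, len(shards[shard_file]))
```

In a real loader, each `shard_file` would then be loaded once (e.g. via `torch.load`) and the listed tensors copied into the model's state dict.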
31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

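`special_tokens_map.json` tells the tokenizer which literal strings serve as EOS and padding, plus the extra special tokens to protect from splitting. A small sketch of reading those fields, using an inline copy abridged to the two fields consumed here:

```python
import json

# Abridged inline copy of the special_tokens_map.json content above;
# in practice this is read from the file on disk.
raw = """
{
  "eos_token": {"content": "<|im_end|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "pad_token": {"content": "<|endoftext|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false}
}
"""

special = json.loads(raw)
eos = special["eos_token"]["content"]
pad = special["pad_token"]["content"]
print(eos, pad)
```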
3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654

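`tokenizer.json` is checked in as a Git LFS pointer rather than the file itself: three lines recording the spec version, a SHA-256 object id, and the byte size of the real file. A small parser sketch for that pointer format:

```python
# Parse a Git LFS pointer file of the form shown above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
"""

# Each line is "key value"; split on the first space only.
fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())
algo, digest = fields["oid"].split(":", 1)
size_bytes = int(fields["size"])
print(algo, size_bytes)
```

When the repo is cloned with LFS enabled, the pointer is replaced by the 11.4 MB tokenizer file whose SHA-256 matches `digest`.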
239
tokenizer_config.json Normal file

@@ -0,0 +1,239 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}

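The `chat_template` above renders conversations in ChatML form: each message becomes a `<|im_start|>role\n...<|im_end|>` block, with extra branches for tools, `<tool_response>` turns, and `<think>` reasoning blocks. A minimal hand-rolled sketch of just the simple path (no tools, no thinking), for illustration only; in practice `tokenizer.apply_chat_template` renders the real template:

```python
def render_chatml(messages, add_generation_prompt=True):
    # Minimal ChatML rendering matching the plain (no tools, no <think>)
    # path of the chat template above: one block per message, then an
    # open assistant block when generation is requested.
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```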
1
vocab.json Normal file

File diff suppressed because one or more lines are too long