Initialize project; model provided by the ModelHub XC community
Model: okwinds/MiroMind-M1-RL-7B Source: Original Platform
47
.gitattributes
vendored
Normal file
@@ -0,0 +1,47 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
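The `.gitattributes` entries route every matching file through Git LFS. As a rough illustration, the sketch below checks a few filenames against a subset of those patterns. It assumes Python's `fnmatch` is an acceptable approximation of gitattributes glob matching, which it is not in general (path patterns containing `/` or `**` are deliberately left out). `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names used only for this example.

```python
from fnmatch import fnmatchcase

# A subset of the basename patterns from .gitattributes.
# Patterns containing "/" (e.g. saved_model/**/*) are omitted because
# fnmatch does not implement gitattributes path semantics.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.gguf*", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(filename: str) -> bool:
    """Return True if the basename matches any listed LFS pattern (approximate)."""
    return any(fnmatchcase(filename, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for name in ["model-00001-of-00004.safetensors", "config.json", "merges.txt"]:
        print(name, is_lfs_tracked(name))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git itself resolves.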
109
README.md
Normal file
@@ -0,0 +1,109 @@
---
frameworks:
- Pytorch
license: Apache License 2.0
tasks:
- text-generation
base_model:
- okwinds/Miromind-M1-SFT-7B
---

This model is mirrored from Hugging Face: [miromind-ai](https://huggingface.co/miromind-ai)

#### 📖 For research related to this project, see the following article from the WeChat official account “觉察流” 👇<br>

[MiroMind-M1: Building an Efficient, Reproducible, Full-Stack Open-Source Reasoning Model with the CAMPO Algorithm](https://mp.weixin.qq.com/s/REPzzgsUjDMikg4jIo9KRg)

#### _Scan the QR code below 👇🏻 to reach the author of this repository_

<img src="https://www.modelscope.cn/models/okwinds/GPT-2/resolve/master/qrcode_for_jcl_258.jpg" />

---

SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model with the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('okwinds/MiroMind-M1-RL-7B')
```
Git download
```
# Download the model with Git
git clone https://www.modelscope.cn/okwinds/MiroMind-M1-RL-7B.git
```

# Official MiroMind-M1-RL-7B Introduction

<div align="center">
<img src="https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B/resolve/master/assets/MiromindAI_H.svg" width="50%" alt="MiroMindM1" />
</div>
<!-- <hr> -->
<div align="center">

[Model](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B)
[Dataset](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-RL-62K)
[Paper](https://arxiv.org/abs/2507.14683)
[GitHub](https://github.com/MiroMindAsia/MiroMind-M1)
[Website](https://miromind.ai/)

</div>


# MiroMind-M1


## 🧾 Overview
<div align="center">
<img src="https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B/resolve/master/assets/7b_performance_training.png" width="80%" alt="7B Model Training Performance" />
<p><i>Training performance of MiroMind-M1-RL-7B on AIME24 and AIME25.</i></p>
</div>

**MiroMind-M1** is a fully open-source series of reasoning language models built on `Qwen-2.5`, focused on advancing mathematical reasoning. It is trained through supervised fine-tuning (**SFT**) on 719K curated problems and reinforcement learning with verifiable rewards (**RLVR**) on 62K challenging examples, using a context-aware multi-stage policy optimization method (**CAMPO**). MiroMind-M1 achieves state-of-the-art performance among open-source 7B Qwen-2.5-based models on AIME24, AIME25, and MATH500, with all models (`MiroMind-M1-SFT-7B`, `MiroMind-M1-RL-7B`, `MiroMind-M1-RL-32B`), data (`MiroMind-M1-SFT-719K`, `MiroMind-M1-RL-62K`), and training setups openly released.


## 📊 Evaluation

### MiroMind-M1-SFT
| Model | Initial Checkpoint | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|------------------|----------------------------|--------|--------|---------|
| DeepSeek-R1-Distill | Qwen2.5-Math-7B | 55.5 | 40.4† | 92.8 |
| OpenThoughts | Qwen2.5-7B-Instruct | 31.3 | 23.3 | 83.2 |
| Open-R1 | Qwen2.5-Math-7B-Instruct | 36.7 | 40.0 | 90.6 |
| Synthetic-1 | Qwen2.5-7B-Instruct | 30.0 | 26.6 | 85.6 |
| **MiroMind-M1-SFT-7B** | Qwen2.5-Math-7B | 60.4 | 45.0 | 94.6 |

*† The AIME25 score for DeepSeek-R1-Distill is from our own evaluation.*

### MiroMind-M1-RL
| Model | AIME24 (avg@64) | AIME25 (avg@64) | MATH500 (avg@5) |
|----------------------------------|--------|--------|---------|
| DeepSeek-R1 | 79.8 | 70.0 | – |
| DeepSeek-R1-0528 | 91.4 | 87.5 | – |
| Qwen3-8B | 76.0 | 67.3 | – |
| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |
| <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr> |
| DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
| Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
| **MiroMind-M1-RL-32B** | 77.5 | 65.6 | 96.4 |
| <tr><td colspan="4" align="center"><em>**7B Models trained from Qwen2.5 series**</em></td></tr> |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | – |
| **MiroMind-M1-SFT-7B** | 60.4 | 45.0 | 94.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | – |
| Skywork-OR1-7B | 72.2 | 54.6 | – |
| **MiroMind-M1-RL-7B** | 73.4 | 57.8 | 96.7 |
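
The tables report `avg@64` and `avg@5` without defining them. A minimal sketch of how such a score could be computed, assuming `avg@k` means the accuracy over `k` sampled responses per problem, averaged across problems and expressed in percent (this definition is an assumption, not taken from the README; `avg_at_k` is a hypothetical helper):

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """Mean accuracy over k samples per problem, averaged over problems, in percent.

    correct[i][j] is True when the j-th sampled response to problem i is right.
    """
    per_problem = [mean(1.0 if c else 0.0 for c in samples) for samples in correct]
    return 100.0 * mean(per_problem)


if __name__ == "__main__":
    # Two toy problems with k = 4 samples each: accuracies 0.75 and 0.25.
    toy = [[True, True, False, True], [False, False, True, False]]
    print(avg_at_k(toy))
```

Averaging over many samples reduces the variance that a single sampled response would show on small benchmarks such as AIME.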


## 🔗 Resources
### Models
[`MiroMind-M1-SFT-7B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-SFT-7B)<br>
[`MiroMind-M1-RL-7B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-7B)<br>
[`MiroMind-M1-RL-32B`](https://www.modelscope.cn/models/okwinds/MiroMind-M1-RL-32B)<br>

### Data
[`MiroMind-M1-SFT-719K`](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-SFT-719K)<br>
[`MiroMind-M1-RL-62K`](https://www.modelscope.cn/datasets/okwinds/MiroMind-M1-RL-62K)<br>
24
added_tokens.json
Normal file
@@ -0,0 +1,24 @@
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
BIN
assets/7b_performance_training.png
Normal file
Binary file not shown.
Size: 190 KiB
5
assets/MiromindAI_H.svg
Normal file
@@ -0,0 +1,5 @@
<svg width="316" height="91" viewBox="0 0 316 91" fill="none" xmlns="http://www.w3.org/2000/svg">
<path fill-rule="evenodd" clip-rule="evenodd" d="M6.97263 67.6052C6.54789 68.7445 5.45995 69.5 4.24401 69.5V69.5C2.1993 69.5 0.79129 67.448 1.52696 65.5402L17.014 25.3783C17.6828 23.644 19.3499 22.5 21.2088 22.5V22.5C23.0676 22.5 24.7348 23.644 25.4036 25.3783L40.8906 65.5402C41.6263 67.448 40.2183 69.5 38.1736 69.5V69.5C36.9576 69.5 35.8697 68.7445 35.4449 67.6052L21.4689 30.1162C21.4285 30.0076 21.3247 29.9355 21.2088 29.9355V29.9355C21.0929 29.9355 20.9891 30.0076 20.9486 30.1162L6.97263 67.6052Z" fill="#5DDBD1"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M29.8054 67.6433C29.3905 68.7563 28.2845 69.5 27.0442 69.5C25.007 69.5 23.5915 67.5693 24.2937 65.7483L39.9075 25.2575C40.5463 23.601 42.2026 22.5 44.0558 22.5C45.909 22.5 47.5653 23.601 48.204 25.2575L63.8179 65.7483C64.5201 67.5693 63.1046 69.5 61.0674 69.5C59.8271 69.5 58.7211 68.7563 58.3062 67.6433L44.3131 30.1086C44.2744 30.0048 44.1714 29.9355 44.0558 29.9355C43.9402 29.9355 43.8371 30.0048 43.7985 30.1086L29.8054 67.6433Z" fill="#003FA0"/>
<path d="M76.616 71H69.016V33.912H76.616V71ZM67.876 22.512C67.876 21.0427 68.3067 19.852 69.168 18.94C70.08 17.9773 71.296 17.496 72.816 17.496C74.2853 17.496 75.476 17.9773 76.388 18.94C77.3 19.852 77.756 21.0427 77.756 22.512C77.756 23.9307 77.3 25.1213 76.388 26.084C75.476 26.996 74.2853 27.452 72.816 27.452C71.296 27.452 70.08 26.996 69.168 26.084C68.3067 25.1213 67.876 23.9307 67.876 22.512ZM92.3041 33.912L94.1281 43.944V71H86.5281V33.912H92.3041ZM92.9881 47.06L91.1641 46.224V40.676L91.8481 39.84C92.3547 38.928 93.1401 37.9653 94.2041 36.952C95.3187 35.888 96.6107 35.0013 98.0801 34.292C99.5494 33.532 101.044 33.152 102.564 33.152C103.324 33.152 104.033 33.2027 104.692 33.304C105.401 33.4053 105.933 33.5827 106.288 33.836V40.676H103.932C100.487 40.676 97.9787 41.208 96.4081 42.272C94.8374 43.2853 93.6974 44.8813 92.9881 47.06ZM126.775 71.76C123.026 71.76 119.707 70.9747 116.819 69.404C113.982 67.7827 111.778 65.528 110.207 62.64C108.637 59.752 107.851 56.3827 107.851 52.532C107.851 48.6307 108.637 45.236 110.207 42.348C111.778 39.46 113.982 37.2053 116.819 35.584C119.707 33.9627 123.026 33.152 126.775 33.152C130.575 33.152 133.894 33.9627 136.731 35.584C139.569 37.2053 141.773 39.46 143.343 42.348C144.965 45.236 145.775 48.6307 145.775 52.532C145.775 56.3827 144.965 59.752 143.343 62.64C141.773 65.528 139.569 67.7827 136.731 69.404C133.894 70.9747 130.575 71.76 126.775 71.76ZM126.775 64.464C130.17 64.464 132.881 63.3747 134.907 61.196C136.985 58.9667 138.023 56.0787 138.023 52.532C138.023 48.9347 136.985 46.0213 134.907 43.792C132.881 41.5627 130.17 40.448 126.775 40.448C123.431 40.448 120.721 41.5627 118.643 43.792C116.617 45.9707 115.603 48.8587 115.603 52.456C115.603 56.0533 116.617 58.9667 118.643 61.196C120.721 63.3747 123.431 64.464 126.775 64.464ZM152.954 71V33.912H158.73L160.326 40.448L158.426 40.676C159.794 39.0547 161.238 37.6867 162.758 36.572C164.278 35.4573 165.899 34.6213 167.622 34.064C169.345 33.456 171.143 33.152 173.018 33.152C175.653 33.152 
177.73 33.608 179.25 34.52C180.77 35.3813 181.885 36.572 182.594 38.092C183.303 39.5613 183.759 41.2333 183.962 43.108C184.165 44.9827 184.266 46.908 184.266 48.884V71H176.666V47.288C176.666 44.6533 176.058 42.88 174.842 41.968C173.677 41.056 172.283 40.6 170.662 40.6C168.331 40.6 166.102 41.3347 163.974 42.804C161.846 44.2227 160.301 45.9453 159.338 47.972V43.716H160.554V71H152.954ZM200.378 71V47.288C200.378 44.6533 199.77 42.88 198.554 41.968C197.389 41.056 195.995 40.6 194.374 40.6C192.043 40.6 189.814 41.3347 187.686 42.804C185.558 44.2227 184.013 45.9453 183.05 47.972L182.138 40.676C183.506 39.0547 184.95 37.6867 186.47 36.572C187.99 35.4573 189.611 34.6213 191.334 34.064C193.057 33.456 194.855 33.152 196.73 33.152C199.365 33.152 201.442 33.608 202.962 34.52C204.482 35.3813 205.597 36.572 206.306 38.092C207.015 39.5613 207.471 41.2333 207.674 43.108C207.877 44.9827 207.978 46.908 207.978 48.884V71H200.378ZM225.128 71H217.528V33.912H225.128V71ZM216.388 22.512C216.388 21.0427 216.818 19.852 217.68 18.94C218.592 17.9773 219.808 17.496 221.328 17.496C222.797 17.496 223.988 17.9773 224.9 18.94C225.812 19.852 226.268 21.0427 226.268 22.512C226.268 23.9307 225.812 25.1213 224.9 26.084C223.988 26.996 222.797 27.452 221.328 27.452C219.808 27.452 218.592 26.996 217.68 26.084C216.818 25.1213 216.388 23.9307 216.388 22.512ZM256.244 33.152C258.878 33.152 260.956 33.608 262.476 34.52C263.996 35.3813 265.11 36.572 265.82 38.092C266.529 39.5613 266.985 41.2333 267.188 43.108C267.39 44.9827 267.492 46.908 267.492 48.884V71H259.892V47.288C259.892 44.6533 259.284 42.88 258.068 41.968C256.902 41.056 255.509 40.6 253.888 40.6C252.266 40.6 250.62 40.9293 248.948 41.588C247.276 42.2467 245.781 43.1333 244.464 44.248C243.146 45.3627 242.133 46.604 241.424 47.972V43.716H242.64V71H235.04V33.912H240.816L242.412 40.448L240.512 40.676C241.93 39.0547 243.476 37.6867 245.148 36.572C246.82 35.4573 248.593 34.6213 250.468 34.064C252.342 33.456 254.268 33.152 256.244 33.152ZM291.348 
71.76C287.953 71.76 284.964 70.9493 282.38 69.328C279.796 67.7067 277.795 65.452 276.376 62.564C274.957 59.6253 274.248 56.2307 274.248 52.38C274.248 48.5293 274.957 45.16 276.376 42.272C277.795 39.384 279.796 37.1547 282.38 35.584C284.964 33.9627 287.953 33.152 291.348 33.152C294.692 33.152 297.605 33.9627 300.088 35.584C302.571 37.1547 304.496 39.384 305.864 42.272C307.283 45.16 307.992 48.5293 307.992 52.38C307.992 56.2307 307.283 59.6253 305.864 62.564C304.496 65.452 302.571 67.7067 300.088 69.328C297.605 70.9493 294.692 71.76 291.348 71.76ZM292.488 65.224C295.629 65.224 298.137 64.0587 300.012 61.728C301.937 59.3973 302.9 56.3067 302.9 52.456C302.9 48.6053 301.937 45.5147 300.012 43.184C298.137 40.8533 295.629 39.688 292.488 39.688C289.347 39.688 286.813 40.8533 284.888 43.184C282.963 45.5147 282 48.58 282 52.38C282 56.2307 282.963 59.3467 284.888 61.728C286.813 64.0587 289.347 65.224 292.488 65.224ZM304.344 71L302.52 63.552H303.28V42.424H302.52V19.244H310.12V71H304.344Z" fill="black"/>
</svg>
Size: 6.0 KiB
34
config.json
Normal file
@@ -0,0 +1,34 @@
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "embd_pdrop": 0.0,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "pad_token_id": 151643,
  "resid_pdrop": 0.0,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "factor": 7.0,
    "rope_type": "linear"
  },
  "rope_theta": 10000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": false,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}
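A few attention-layout quantities follow directly from `config.json`. The sketch below derives them under two assumptions that the config does not state explicitly: the per-head dimension equals `hidden_size / num_attention_heads`, and the KV cache is stored in the same dtype as `torch_dtype`. The helper names are hypothetical.

```python
# Values copied from config.json.
CONFIG = {
    "hidden_size": 3584,
    "num_attention_heads": 28,
    "num_key_value_heads": 4,
    "num_hidden_layers": 28,
    "torch_dtype": "bfloat16",
}

BYTES_PER_ELEMENT = {"bfloat16": 2, "float16": 2, "float32": 4}


def head_dim(cfg: dict) -> int:
    """Per-head dimension, assuming hidden_size is split evenly across heads."""
    return cfg["hidden_size"] // cfg["num_attention_heads"]


def queries_per_kv_head(cfg: dict) -> int:
    """Number of query heads that share one key/value head (grouped-query attention)."""
    return cfg["num_attention_heads"] // cfg["num_key_value_heads"]


def kv_cache_bytes_per_token(cfg: dict) -> int:
    """Bytes of keys plus values stored per token across all layers (assumed dtype)."""
    element_size = BYTES_PER_ELEMENT[cfg["torch_dtype"]]
    return 2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"] * head_dim(cfg) * element_size


if __name__ == "__main__":
    print("head_dim:", head_dim(CONFIG))
    print("queries per KV head:", queries_per_kv_head(CONFIG))
    print("KV cache bytes per token:", kv_cache_bytes_per_token(CONFIG))
```

Note that `use_cache` is `false` in this config, so the cache estimate only applies when caching is enabled at inference time.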
1
configuration.json
Normal file
@@ -0,0 +1 @@
{"framework":"Pytorch","task":"text-generation"}
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "eos_token_id": 151645,
  "pad_token_id": 151643,
  "transformers_version": "4.52.4",
  "use_cache": false
}
4
hash.txt
Normal file
@@ -0,0 +1,4 @@
269698c23b85a97ef5f79c8c32f55161b27961738a0efe62a1726a42b5439a8a model-00001-of-00004.safetensors
2f31e3df72990d31332c189e51504859d6d79f18dd9ff7c81b2eccceed104ee2 model-00002-of-00004.safetensors
03bfd8b164f06a610a362590fb544949e5e38e14511a29a57d60ec716078d9a6 model-00003-of-00004.safetensors
a78b3615b7c39486e96c075ad2b8a81f0e7554054d4751b86feb781193cf885d model-00004-of-00004.safetensors
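The digests in `hash.txt` are identical to the `sha256` object IDs in the Git LFS pointers for the same shards, so they can be used to check a download. A minimal sketch, assuming the shards sit next to `hash.txt` in a local directory; `sha256_of_file`, `parse_hash_file`, and `verify` are hypothetical helper names:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-gigabyte shards fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_hash_file(text: str) -> dict[str, str]:
    """Parse '<hex digest> <filename>' lines into {filename: digest}."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            expected[name] = digest
    return expected


def verify(model_dir, hash_text: str) -> dict[str, bool]:
    """Return {filename: True/False} for every entry listed in hash_text."""
    model_dir = Path(model_dir)
    return {
        name: sha256_of_file(model_dir / name) == digest
        for name, digest in parse_hash_file(hash_text).items()
    }


if __name__ == "__main__":
    model_dir = Path(".")  # assumed location of the downloaded repository
    print(verify(model_dir, (model_dir / "hash.txt").read_text()))
```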
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model-00001-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:269698c23b85a97ef5f79c8c32f55161b27961738a0efe62a1726a42b5439a8a
size 4958454624
3
model-00002-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2f31e3df72990d31332c189e51504859d6d79f18dd9ff7c81b2eccceed104ee2
size 4811655328
3
model-00003-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:03bfd8b164f06a610a362590fb544949e5e38e14511a29a57d60ec716078d9a6
size 4995067944
3
model-00004-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a78b3615b7c39486e96c075ad2b8a81f0e7554054d4751b86feb781193cf885d
size 466094000
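Each shard is committed as a three-line Git LFS pointer (`version`, `oid`, `size`) rather than the weights themselves. The sketch below parses one pointer and compares the summed shard sizes with the `total_size` recorded in `model.safetensors.index.json`. `parse_lfs_pointer` is a hypothetical helper, and the explanation of the size gap is an assumption: `total_size` presumably counts tensor bytes only, while each file also carries a safetensors header.

```python
# Pointer text for the fourth shard, copied from the diff.
POINTER_4 = """version https://git-lfs.github.com/spec/v1
oid sha256:a78b3615b7c39486e96c075ad2b8a81f0e7554054d4751b86feb781193cf885d
size 466094000
"""

# "size" fields of the four shard pointers, in shard order.
SHARD_SIZES = [4958454624, 4811655328, 4995067944, 466094000]

# metadata.total_size from model.safetensors.index.json.
INDEX_TOTAL_SIZE = 15231233024


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER_4))
    on_disk = sum(SHARD_SIZES)
    print("sum of shard files:", on_disk)
    print("bytes beyond total_size:", on_disk - INDEX_TOTAL_SIZE)
```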
346
model.safetensors.index.json
Normal file
@@ -0,0 +1,346 @@
{
  "metadata": {
    "total_size": 15231233024
  },
  "weight_map": {
    "lm_head.weight": "model-00001-of-00004.safetensors",
    "model.embed_tokens.weight": "model-00003-of-00004.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.0.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.1.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.1.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.10.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.12.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.15.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.15.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.15.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.16.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.16.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.17.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.17.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.17.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.18.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.19.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.19.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.bias": "model-00004-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||
"model.norm.weight": "model-00001-of-00004.safetensors"
|
||||
}
|
||||
}
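The weight map above assigns each tensor name to one of four shard files. As a minimal sketch, assuming the index has already been parsed into a Python dict, a loader could invert the map so that each shard file is opened only once. The `weight_map` excerpt below copies four entries from the listing above, and `tensors_by_shard` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# Excerpt of the weight map shown above (four entries only, for illustration).
weight_map = {
    "model.layers.19.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.norm.weight": "model-00001-of-00004.safetensors",
}


def tensors_by_shard(weight_map):
    """Invert a tensor-name -> shard-file mapping into shard-file -> sorted tensor names."""
    grouped = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        grouped[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in grouped.items()}


shards = tensors_by_shard(weight_map)
for shard_file in sorted(shards):
    # Print each shard file with the number of tensors it holds in this excerpt.
    print(shard_file, len(shards[shard_file]))
```

Grouping by shard first keeps file handling simple: a loader walks the shard files one at a time instead of reopening a file for every tensor.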
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
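In this file `eos_token` and `pad_token` are objects with a `content` field, whereas tokenizer_config.json further down stores the same two tokens as plain strings. The sketch below, which assumes only the standard `json` module, reads the token text from either shape. `SPECIAL_TOKENS_MAP` is a trimmed copy of the file above (two of the thirteen additional tokens kept) and `token_text` is a hypothetical helper name.

```python
import json

# Trimmed copy of the file above: only two of the thirteen additional tokens are kept.
SPECIAL_TOKENS_MAP = """
{
  "additional_special_tokens": ["<|im_start|>", "<|im_end|>"],
  "eos_token": {"content": "<|im_end|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "pad_token": {"content": "<|endoftext|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false}
}
"""


def token_text(entry):
    """Return the literal token string whether the entry is a plain string or an object."""
    return entry["content"] if isinstance(entry, dict) else entry


special = json.loads(SPECIAL_TOKENS_MAP)
eos = token_text(special["eos_token"])
pad = token_text(special["pad_token"])
print(eos, pad, eos != pad)
```

The end-of-sequence and padding tokens are distinct here, so padded positions can be masked without also masking the end-of-turn marker.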
757444
tokenizer.json
Normal file
File diff suppressed because it is too large
209
tokenizer_config.json
Normal file
@@ -0,0 +1,209 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'Please reason step by step, and put your final answer within \\\\boxed{}.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nPlease reason step by step, and put your final answer within \\\\boxed{}.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 26000,
  "pad_token": "<|endoftext|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
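The `chat_template` above is a Jinja template. To illustrate what it produces in the simplest case, here is a hand-written Python approximation of the branch taken when no tools are passed and no message carries tool calls. It is a sketch, not the template itself: `render_chat` and `DEFAULT_SYSTEM` are hypothetical names, and tool definitions, tool calls, and `tool` messages are deliberately left out.

```python
# Default system prompt taken from the template above (the rendered text has a single backslash).
DEFAULT_SYSTEM = "Please reason step by step, and put your final answer within \\boxed{}."


def render_chat(messages, add_generation_prompt=True):
    """Approximate the template's no-tools branch for user/system/assistant messages."""
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        system_text = messages[0]["content"]
        rest = messages[1:]
    else:
        system_text = DEFAULT_SYSTEM
        rest = messages
    parts = ["<|im_start|>system\n" + system_text + "<|im_end|>\n"]
    for message in rest:
        if message["role"] not in ("user", "system", "assistant"):
            # The real template handles "tool" messages separately; this sketch does not.
            raise ValueError("unsupported role in this sketch: " + message["role"])
        parts.append(
            "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat([{"role": "user", "content": "What is 2+2?"}])
print(prompt)
```

In practice the template would normally be rendered by the tokenizer library rather than by hand, for example through `apply_chat_template` in Hugging Face Transformers; the sketch only makes the resulting prompt layout visible.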
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long