Initialize project; model provided by the ModelHub XC community
Model: FreedomIntelligence/AceGPT-v1.5-13B-Chat Source: Original Platform
47
.gitattributes
vendored
Normal file
@@ -0,0 +1,47 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Second_Language_(Arabic)_Acquisition_of_LLMs_via_Progressive_Vocabulary_Expansion.pdf filter=lfs diff=lfs merge=lfs -text
model-00001-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00005-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00006-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00007-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00008-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00009-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00010-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
model-00011-of-00011.safetensors filter=lfs diff=lfs merge=lfs -text
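To see which files these rules would route through Git LFS, the basename globs can be approximated with Python's `fnmatch`. This is only a sketch: real gitattributes matching has additional rules (for example path patterns such as `saved_model/**/*`) that the hypothetical helper `is_lfs_tracked` does not handle, and only a handful of the patterns above are listed.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the basename patterns from the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.pt", "*.safetensors", "*.tar.*", "*.zip", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's basename match any LFS glob?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


print(is_lfs_tracked("model-00001-of-00011.safetensors"))
print(is_lfs_tracked("config.json"))
```

Small text files such as `config.json` match none of the patterns, so they are stored as ordinary Git blobs.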
72
README.md
Normal file
@@ -0,0 +1,72 @@
---
license: apache-2.0
language:
- ar
- zh
- en
---

# <b>AceGPT</b>

AceGPT is a collection of fully fine-tuned generative text models based on Llama 2, targeting the
Arabic language domain. This is the repository for version 1.5 of the 13B-chat pre-trained model.

---
## Model Details
We have released the AceGPT family of large language models, a collection of fully fine-tuned generative text models based on Llama 2, ranging from 7B to 13B parameters. The models fall into two main categories: AceGPT and AceGPT-chat. AceGPT-chat is an optimized version designed specifically for dialogue applications. Our models outperform all currently available open-source Arabic dialogue models on multiple benchmarks. Furthermore, in our human evaluations, they reach satisfaction levels comparable to some closed-source models, such as ChatGPT, in Arabic.
## Model Developers
We are from the King Abdullah University of Science and Technology (KAUST), the Chinese University of Hong Kong, Shenzhen (CUHKSZ), the Shenzhen Research Institute of Big Data (SRIBD), and King AbdulAziz University (KAU).
## Variations
The AceGPT family comes in two parameter sizes, 7B and 13B; each size has a base variant and a -chat variant.
## Paper
The paper can be accessed at [link](https://huggingface.co/FreedomIntelligence/AceGPT-v1.5-13B-Chat/blob/main/Second_Language_(Arabic)_Acquisition_of_LLMs_via_Progressive_Vocabulary_Expansion.pdf).
## Input
The models take text input only.
## Output
The models generate text only.
## Model Evaluation Results

Benchmark evaluations are conducted using accuracy or F1 scores as metrics, following the evaluation framework available at https://github.com/FreedomIntelligence/AceGPT/tree/main.
([**ArabicMMLU**](https://github.com/mbzuai-nlp/ArabicMMLU) is assessed based on its source settings.)
| | [**MMLU** (Huang et al. (2023))](https://github.com/FreedomIntelligence/AceGPT) | [ArabicMMLU](https://github.com/mbzuai-nlp/ArabicMMLU) | EXAMS | ACVA (clean) | ACVA (all) | BoolQ (trans) | ARC-C (trans) | Average |
|------------------|------|------|------|------|------|------|------|------|
| LLaMA2-7B-chat | 13.78 | 33.40 | 13.05 | 20.99 | 21.80 | 34.92 | 23.72 | 21.09 |
| Phoenix-7b | 29.72 | 44.74 | 31.93 | 43.80 | 41.86 | 66.70 | 33.53 | 41.75 |
| AceGPT-7B-chat | 30.69 | 36.31 | 33.73 | 53.87 | 53.07 | 60.70 | 38.05 | 43.77 |
| Mistral-7B-Instruct-v0.2 | 27.93 | 41.44 | 21.56 | 64.56 | 63.47 | 60.18 | 35.67 | 44.97 |
| **AceGPT-v1.5-7B-chat** | 45.77 | 56.62 | 43.69 | 69.46 | 70.86 | 72.45 | <u>60.49</u> | 59.90 |
| Jais-13B-chat | 19.52 | 54.83 | 19.71 | 66.75 | 61.41 | 41.25 | 11.95 | 39.34 |
| Llama2-13B-chat | 8.92 | 36.12 | 16.11 | 35.12 | 35.71 | 54.13 | 27.47 | 30.51 |
| AceGPT-13B-chat | 35.59 | 52.61 | 38.72 | 70.82 | 70.21 | 66.85 | 44.20 | 54.14 |
| **AceGPT-v1.5-13B-chat** | **47.33** | <u>61.70</u> | **48.37** | **76.90** | <u>76.37</u> | 69.33 | **63.99** | **63.42** |
| Jais-30B-chat-v1 | 38.12 | 59.33 | 40.45 | <u>74.46</u> | 72.41 | 73.76 | 50.94 | 58.49 |
| Jais-30B-chat-v3 | 35.68 | **62.36** | 32.24 | 73.63 | 73.66 | **76.30** | 51.02 | 57.84 |
| ChatGPT 3.5 Turbo | <u>46.07</u> | 57.72 | <u>45.63</u> | 74.45 | **76.88** | <u>76.12</u> | 60.24 | <u>62.44</u> |

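The Average column appears to be the unweighted mean of the seven benchmark columns; that is an inference from the numbers, not something the README states. A quick check on the AceGPT-v1.5-13B-chat row:

```python
# Scores for AceGPT-v1.5-13B-chat, copied from the table above.
scores = {
    "MMLU": 47.33,
    "ArabicMMLU": 61.70,
    "EXAMS": 48.37,
    "ACVA (clean)": 76.90,
    "ACVA (all)": 76.37,
    "BoolQ (trans)": 69.33,
    "ARC-C (trans)": 63.99,
}
reported_average = 63.42

mean = sum(scores.values()) / len(scores)
# Allow a 0.01 gap for rounding or truncation in the published table.
matches = abs(mean - reported_average) <= 0.01
print(f"mean={mean:.3f}, reported={reported_average}, matches={matches}")
```

Running the same check on the other rows is a quick way to spot transcription slips. For instance, the seven LLaMA2-7B-chat scores average to about 23.09 while the table lists 21.09, which may indicate a typo in one of that row's cells.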
## Samples
#### Sample 1 (abstract_algebra)
* <b>input:</b>
"<User>: فيما يلي أسئلة الاختيار من متعدد حول جبر تجريدي\n\nسؤال: ما هو الدرجة للامتداد الميداني الناتج من Q(sqrt(2), sqrt(3), sqrt(18)) على Q؟\nA. 0\nB. 4\nC. 2\nD. 6\nمن فضلك اختر إجابة واحدة من بين 'A، B، C، D' دون شرح. <Assistant>: "

* <b>output:</b>
"B\n\nالشرح:\n\nالامت"

#### Sample 2 (business_ethics)
* <b>input:</b>
"<User>: فيما يلي أسئلة الاختيار من متعدد حول أخلاقيات الأعمال\n\nسؤال: تُصبح _______ مثل البيتكوين أكثر انتشارًا وتحمل مجموعة كبيرة من الآثار الأخلاقية المرتبطة بها، على سبيل المثال، إنها _______ وأكثر _______. ومع ذلك، تم استخدامها أيضًا للمشاركة في _______.\nA. العملات الرقمية، مكلفة، آمنة، جرائم مالية\nB. العملات التقليدية، رخيصة، غير آمنة، العطاء الخيري\nC. العملات الرقمية، رخيصة، آمنة، جرائم مالية\nD. العملات التقليدية، مكلفة، غير آمنة، العطاء الخيري\nمن فضلك اختر إجابة واحدة من بين 'A، B، C، D' دون شرح. <Assistant>: "

* <b>output:</b>
"C\n\nالشرح:\n\nالإ"

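Both samples wrap the question in a `<User>: ... <Assistant>: ` template. A minimal sketch of building such a prompt, assuming the single-turn template is exactly what the samples show; `build_prompt` is a hypothetical helper, not part of the AceGPT codebase:

```python
def build_prompt(question: str) -> str:
    """Wrap a user message in the single-turn template seen in the samples."""
    return f"<User>: {question} <Assistant>: "


# Any user text can be passed in; this placeholder question is illustrative.
prompt = build_prompt("What is the degree of Q(sqrt(2)) over Q?")
print(prompt)
```

The trailing space after `<Assistant>:` mirrors the sample inputs, where the model's answer begins immediately after it.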
# Reference
```
@article{zhu2025second,
  title={Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion},
  author={Zhu, Jianqing and Huang, Huang and Lin, Zhihang and Liang, Juhao and Tang, Zhengyang and Almubarak, Khalid and Alharthi, Mosen and An, Bang and He, Juncai and Wu, Xiangbo and Yu, Fei and Chen, Junying and Ma, Zhuoheng and Du, Yuhao and Hu, Yan and Zhang, He and Alghamdi, Emad A. and Zhang, Lian and Sun, Ruoyu and Li, Haizhou and Wang, Benyou and Xu, Jinchao},
  journal={ACL 2025},
  year={2025}
}
```
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e41fbb8db18cb92486dc6e13edc6873fb73cd1dbb24382d756e7397b6530ae35
size 3652174
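The three-line hunk above is a Git LFS pointer: the repository stores only the spec version, the SHA-256 of the real object, and its size in bytes. A sketch of reading those fields, with `parse_lfs_pointer` as a hypothetical helper name:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e41fbb8db18cb92486dc6e13edc6873fb73cd1dbb24382d756e7397b6530ae35
size 3652174
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size_bytes"])
```

The same helper applies to the eleven shard pointers further down, which differ only in `oid` and `size`.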
30
config.json
Normal file
@@ -0,0 +1,30 @@
{
  "_name_or_path": "AceGPT-v1.5-13B-Chat_1",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_length": 4096,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 40,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.38.1",
  "use_cache": true,
  "vocab_size": 44800
}
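The fields in config.json are enough to estimate the parameter count of a Llama-style decoder. The sketch below assumes the standard `LlamaForCausalLM` layout (untied embeddings, bias-free q/k/v/o projections, three MLP projections, and two RMSNorm weights per layer plus a final norm); the function name is hypothetical.

```python
def llama_param_count(hidden, intermediate, layers, vocab, heads, kv_heads):
    """Count weights in a Llama-style decoder with untied embeddings."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v
    mlp = 3 * hidden * intermediate  # gate, up, down projections
    norms = 2 * hidden  # input and post-attention RMSNorm
    embeddings = 2 * vocab * hidden  # embed_tokens and lm_head
    final_norm = hidden
    return layers * (attention + mlp + norms) + embeddings + final_norm


params = llama_param_count(
    hidden=5120, intermediate=13824, layers=40,
    vocab=44800, heads=40, kv_heads=40,
)
bytes_fp32 = params * 4  # torch_dtype is float32, 4 bytes per weight
print(params, bytes_fp32)
```

This yields 13,146,936,320 parameters, or 52,587,745,280 bytes in float32, which equals the `total_size` recorded in model.safetensors.index.json later in this commit.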
1
configuration.json
Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
8
generation_config.json
Normal file
@@ -0,0 +1,8 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "transformers_version": "4.38.1"
}
3
model-00001-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51be9cdadceb83943841887d9b935e20c363b6b4cc9d5082ab7c1e37e62eb85b
size 4933676424
3
model-00002-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9b224430706b1d7f704096c86f21c2446eb0f0e1a0739a593521e8f564a7dbc6
size 4970418112
3
model-00003-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:128fca1031c83ba815fa3d4f976d92f134f23bf284522685ead2b264cf305fc2
size 4970418120
3
model-00004-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a5abc64e128d88cb993af1942b52b700453c0155098da00a431ec231a6efcbfe
size 4792119040
3
model-00005-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9fed522b2bb7cbea30034b375206a64a2e007d73309d17feff29bd126c818c4f
size 4792160232
3
model-00006-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8345fe71ccbc9cb64205dcd1e6a316034c87a6cb5014bef010876f7ca390a2bf
size 4792160224
3
model-00007-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b9c20982235cb1379033c7afaf1485a68d1f4a82e9bfe2e1accf939d60f97395
size 4970418144
3
model-00008-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:01ba4a41720b8f86de300dddf5e48d9094e144095a370b7ff13374858c4f69d4
size 4970418144
3
model-00009-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a385aa89ad90eb3d09a0bc3c8583c3b0e36a2c91b465533e6bfcca926341657f
size 4970418144
3
model-00010-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a61ae63f551761b2fdec57bb12c17b11ef99fef94304d4b1726bb8ee1ad6b0f8
size 4970418144
3
model-00011-of-00011.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ba2339fd638d838bbeecd85f334864597db079223c9cce4355b74c37a40bb07a
size 3455162616
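Adding up the `size` fields of the eleven shard pointers gives the on-disk footprint of the checkpoint. The gap between that sum and the `total_size` value in model.safetensors.index.json is small; attributing it to per-file safetensors headers is an assumption, not something this commit states.

```python
# Sizes in bytes, copied from the eleven LFS pointers above.
shard_sizes = [
    4933676424, 4970418112, 4970418120, 4792119040,
    4792160232, 4792160224, 4970418144, 4970418144,
    4970418144, 4970418144, 3455162616,
]
index_total_size = 52587745280  # metadata.total_size from the index file

on_disk = sum(shard_sizes)
overhead = on_disk - index_total_size
print(f"{on_disk / 2**30:.2f} GiB on disk, {overhead} bytes beyond tensor data")
```

A download of the full checkpoint therefore needs roughly 49 GiB of free space for the weights alone.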
370
model.safetensors.index.json
Normal file
370
model.safetensors.index.json
Normal file
@@ -0,0 +1,370 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_size": 52587745280
|
||||
},
|
||||
"weight_map": {
|
||||
"lm_head.weight": "model-00011-of-00011.safetensors",
|
||||
"model.embed_tokens.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00003-of-00011.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.input_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.input_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.input_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.34.input_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.input_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.input_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.input_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.38.input_layernorm.weight": "model-00011-of-00011.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00011-of-00011.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.input_layernorm.weight": "model-00011-of-00011.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00011-of-00011.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.7.input_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
"model.norm.weight": "model-00011-of-00011.safetensors"
}
}
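The mapping above assigns each tensor name to one of 11 shard files, and a single layer can straddle two shards (layer 3, for example). A minimal Python sketch of how such an index might be queried follows. It assumes the entries sit under the usual top-level `weight_map` key, and the helper names `tensors_by_shard` and `shards_for_layer` are hypothetical, introduced only for illustration.

```python
import json
from collections import defaultdict

# Excerpt copied from the index above. The enclosing "weight_map" key is an
# assumption based on the usual layout of sharded safetensors indexes.
index_text = """
{
  "weight_map": {
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
    "model.norm.weight": "model-00011-of-00011.safetensors"
  }
}
"""


def tensors_by_shard(weight_map):
    """Invert the map: shard file name -> sorted list of tensor names."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


def shards_for_layer(weight_map, layer):
    """Return the sorted shard files that hold any tensor of one layer."""
    # The trailing dot keeps layer 3 from also matching layers 30 to 39.
    prefix = f"model.layers.{layer}."
    return sorted({s for n, s in weight_map.items() if n.startswith(prefix)})


weight_map = json.loads(index_text)["weight_map"]
grouped = tensors_by_shard(weight_map)
layer3_shards = shards_for_layer(weight_map, 3)
print(layer3_shards)
```

Matching on the prefix with its trailing dot matters because the keys are sorted lexicographically, which places `model.layers.3.` between layers 29 and 30.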
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
130489
tokenizer.json
Normal file
File diff suppressed because it is too large
3
tokenizer.model
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f0a4a2876dcbbbeaa71fd99d9cf32fe5f353b4e8976acbd44f0d0f674ba2275b
size 760271
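The three lines above are a Git LFS pointer rather than the tokenizer model itself: they record the spec version, the SHA-256 object ID, and the byte size of the real file. A short Python sketch of reading such a pointer is shown below. The function name `parse_lfs_pointer` is hypothetical, and the parsing assumes the simple `key value` line layout seen here.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the tokenizer.model entry above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:f0a4a2876dcbbbeaa71fd99d9cf32fe5f353b4e8976acbd44f0d0f674ba2275b
size 760271
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the actual file, comparing its size and SHA-256 digest against these fields is one way to confirm that the LFS object was fetched rather than the pointer.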
42
tokenizer_config.json
Normal file
@@ -0,0 +1,42 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}
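Two values in this config are easy to misread: `pad_token` is `null`, and `model_max_length` equals `int(1e30)`, which appears to be the placeholder written when no real length limit is configured rather than a usable context size. The Python sketch below checks both using only the standard library. The name `summarize` and the fallback of padding with the EOS token are assumptions for illustration, not something this commit specifies.

```python
import json

# Subset of the tokenizer_config.json fields shown above.
config_text = """{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": "<s>",
  "eos_token": "</s>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "unk_token": "<unk>"
}"""

# Assumed sentinel meaning "no maximum length configured".
UNSET_MAX_LENGTH = int(1e30)


def summarize(config):
    """Report whether padding and a real length limit are configured."""
    return {
        "has_pad_token": config.get("pad_token") is not None,
        "max_length_is_set": config["model_max_length"] < UNSET_MAX_LENGTH,
        # Hypothetical fallback: reuse EOS for padding when none is defined.
        "pad_fallback": config.get("pad_token") or config["eos_token"],
    }


config = json.loads(config_text)
summary = summarize(config)
print(summary)
```

With this config, batched inputs would need a padding token chosen by the caller, and any truncation length would have to come from the model's own configuration.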