Initialize project; model provided by the ModelHub XC community
Model: YanLabs/Seed-OSS-36B-Instruct-MPOA Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
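As a rough illustration of what these rules select, the sketch below checks file names against a few of the patterns from this `.gitattributes` using Python's `fnmatch`. This only approximates gitattributes matching (Git's own handling of `**` and directory patterns differs), and the `is_lfs_tracked` helper and the chosen subset of patterns are hypothetical, introduced only for this example.

```python
from fnmatch import fnmatch

# A subset of the patterns listed in the .gitattributes in this commit.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.tar.*", "*tfevents*", "tokenizer.json"]


def is_lfs_tracked(filename: str) -> bool:
    """Approximate check: does the bare file name match any LFS pattern?"""
    return any(fnmatch(filename, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for name in ["model-00001-of-00015.safetensors", "config.json", "tokenizer.json"]:
        print(name, is_lfs_tracked(name))
```

Under these rules the weight shards and `tokenizer.json` are stored as LFS pointers, while small text files such as `config.json` stay as regular Git blobs.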
66
README.md
Normal file
@@ -0,0 +1,66 @@
---
license: apache-2.0
base_model:
- ByteDance-Seed/Seed-OSS-36B-Instruct
pipeline_tag: text-generation
---

# YanLabs/Seed-OSS-36B-Instruct-MPOA

This is an abliterated version of [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct) using the norm-preserving biprojected abliteration technique.

**⚠️ Warning**: Safety guardrails and refusal mechanisms have been removed through abliteration. This model may generate harmful content and is intended for mechanistic interpretability research only.

## Model Details

### Model Description

This model applies **norm-preserving biprojected abliteration** to remove refusal behaviors while preserving the model's original capabilities. The technique surgically removes "refusal directions" from the model's activation space without traditional fine-tuning.

- **Developed by**: YanLabs
- **Model type**: Causal Language Model (Transformer)
- **License**: apache-2.0
- **Base model**: [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct)

### Model Sources

- **Base Model**: [ByteDance-Seed/Seed-OSS-36B-Instruct](https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct)
- **Abliteration Tool**: [jim-plus/llm-abliteration](https://github.com/jim-plus/llm-abliteration)
- **Paper**: [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)

## Uses

### Intended Use

- **Research**: Mechanistic interpretability studies
- **Analysis**: Understanding LLM safety mechanisms
- **Development**: Testing abliteration techniques

### Out-of-Scope Use

- ❌ Production deployments
- ❌ User-facing applications
- ❌ Generating harmful content for malicious purposes

## Limitations

- Abliteration does not guarantee complete removal of all refusals
- May generate unsafe or harmful content
- Model behavior may be unpredictable in edge cases
- No explicit harm prevention mechanisms remain

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{Seed-OSS-36B-Instruct-MPOA,
  author = {YanLabs},
  title = {Seed-OSS-36B-Instruct-MPOA},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YanLabs/Seed-OSS-36B-Instruct-MPOA}},
  note = {Abliterated using norm-preserving biprojected technique}
}
```
88
Seed-OSS-36B-Instruct-MPOA.yml
Normal file
@@ -0,0 +1,88 @@
model: /workspace/.cache/huggingface/hub/models--ByteDance-Seed--Seed-OSS-36B-Instruct/snapshots/497f1dca95ebdec98e41d517b9f060ee753c902f
measurements: /workspace/seed.measure
output: /workspace/seed-oss-36b-ab0
ablate:
  - layer: 1
    measurement: 4
    scale: 1.0
    sparsity: 0.00
  - layer: 2
    measurement: 4
    scale: 1.0
    sparsity: 0.00
  - layer: 3
    measurement: 4
    scale: 1.0
    sparsity: 0.00
  - layer: 4
    measurement: 4
    scale: 1.0
    sparsity: 0.00
  - layer: 19
    measurement: 24
    scale: 1.0
    sparsity: 0.00
  - layer: 20
    measurement: 24
    scale: 1.0
    sparsity: 0.00
  - layer: 21
    measurement: 30
    scale: 1.0
    sparsity: 0.00
  - layer: 22
    measurement: 24
    scale: 1.0
    sparsity: 0.00
  - layer: 23
    measurement: 24
    scale: 1.0
    sparsity: 0.00
  - layer: 24
    measurement: 24
    scale: 1.0
    sparsity: 0.00
  - layer: 25
    measurement: 30
    scale: 1.0
    sparsity: 0.00
  - layer: 26
    measurement: 30
    scale: 1.0
    sparsity: 0.00
  - layer: 27
    measurement: 30
    scale: 1.0
    sparsity: 0.00
  - layer: 28
    measurement: 30
    scale: 1.0
    sparsity: 0.00
  - layer: 29
    measurement: 30
    scale: 1.0
    sparsity: 0.00
  - layer: 30
    measurement: 30
    scale: 1.0
    sparsity: 0.00
  - layer: 31
    measurement: 35
    scale: 1.0
    sparsity: 0.00
  - layer: 32
    measurement: 35
    scale: 1.0
    sparsity: 0.00
  - layer: 33
    measurement: 35
    scale: 1.0
    sparsity: 0.00
  - layer: 34
    measurement: 35
    scale: 1.0
    sparsity: 0.00
  - layer: 35
    measurement: 35
    scale: 1.0
    sparsity: 0.000
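The `ablate` list in this YAML file is easier to read when grouped by its `measurement` value. The sketch below does that with the (layer, measurement) pairs copied by hand from the file; every entry there uses `scale: 1.0` and zero sparsity, so those fields are omitted. What `measurement` means to the tool is not stated in this commit; reading it as the index of the measured layer applied to each target layer is an assumption. The names `ABLATE` and `group_by_measurement` are hypothetical.

```python
from collections import defaultdict

# (layer, measurement) pairs copied from the `ablate` list in the YAML file.
ABLATE = (
    [(layer, 4) for layer in (1, 2, 3, 4)]
    + [(19, 24), (20, 24), (21, 30), (22, 24), (23, 24), (24, 24)]
    + [(layer, 30) for layer in range(25, 31)]
    + [(layer, 35) for layer in range(31, 36)]
)


def group_by_measurement(pairs):
    """Map each measurement index to the list of layers that reference it."""
    groups = defaultdict(list)
    for layer, measurement in pairs:
        groups[measurement].append(layer)
    return dict(groups)


if __name__ == "__main__":
    for measurement, layers in sorted(group_by_measurement(ABLATE).items()):
        print(f"measurement {measurement}: layers {layers}")
```

Only 21 of the model's 64 layers (per `num_hidden_layers` in `config.json`) appear in the list, and they reference four distinct measurement indices: 4, 24, 30 and 35.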
33
config.json
Normal file
@@ -0,0 +1,33 @@
{
  "architectures": [
    "SeedOssForCausalLM"
  ],
  "attention_bias": true,
  "attention_dropout": 0.1,
  "attention_out_bias": false,
  "bos_token_id": 0,
  "pad_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 524288,
  "mlp_bias": false,
  "model_type": "seed_oss",
  "num_attention_heads": 80,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "residual_dropout": 0.1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "rope_type": "default"
  },
  "rope_theta": 10000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.55.0",
  "use_cache": true,
  "vocab_size": 155136
}
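The values in `config.json` can be cross-checked against the `total_parameters` figure recorded in `model.safetensors.index.json` in this same commit. The sketch below counts parameters assuming a standard Llama-style layout: biased q/k/v projections (`attention_bias` is true), an unbiased output projection and MLP, two RMSNorm weights per layer, untied input and output embeddings, and one final norm. The final norm and the helper name `count_parameters` are assumptions, not something this commit spells out.

```python
# Values copied from config.json in this commit.
CONFIG = {
    "hidden_size": 5120,
    "intermediate_size": 27648,
    "num_hidden_layers": 64,
    "num_attention_heads": 80,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 155136,
}


def count_parameters(cfg):
    """Count parameters for the assumed Llama-style decoder layout."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    # q, k, v have weights and biases; o_proj has a weight only.
    attn = (h * q_dim + q_dim) + 2 * (h * kv_dim + kv_dim) + q_dim * h
    # gate_proj, up_proj, down_proj, all without bias.
    mlp = 3 * h * cfg["intermediate_size"]
    # input_layernorm and post_attention_layernorm weights.
    norms = 2 * h
    per_layer = attn + mlp + norms
    # embed_tokens and lm_head are separate (tie_word_embeddings is false).
    embeddings = 2 * cfg["vocab_size"] * h
    final_norm = h  # assumption: a single final RMSNorm weight
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


if __name__ == "__main__":
    total = count_parameters(CONFIG)
    print(total)      # parameter count
    print(2 * total)  # bytes at 2 bytes per bfloat16 parameter
```

Under these assumptions the count reproduces `total_parameters` (36151104512), and doubling it for bfloat16 storage reproduces `total_size` (72302209024).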
10
generation_config.json
Normal file
@@ -0,0 +1,10 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "pad_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.55.0",
  "temperature": 1.1,
  "top_p": 0.95
}
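To make the two sampling settings concrete, here is a generic sketch of temperature scaling followed by nucleus (top-p) filtering, using the `temperature` and `top_p` values from `generation_config.json` as defaults. It is not the Transformers implementation, and `sample_top_p` is a hypothetical helper. The file does not set `do_sample`, so whether these values take effect likely depends on how the caller invokes generation.

```python
import math
import random


def sample_top_p(logits, temperature=1.1, top_p=0.95, rng=random):
    """Sample one token index using temperature scaling and top-p filtering."""
    # Temperature above 1 flattens the distribution; below 1 sharpens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices renormalizes the kept weights implicitly.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


if __name__ == "__main__":
    print(sample_top_p([2.0, 1.0, 0.5, -1.0]))
```

When one token dominates, the nucleus shrinks to that token alone and the draw becomes deterministic; with flatter logits more candidates survive the cutoff.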
3
model-00001-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4bddc750af8e391cbb278955e0582c5cbdee67575fa0c90c7b8895cf49d43e0
size 4954686264
3
model-00002-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:33367b069bf2108205fb013d752b9feaaaabd6dc8ae7316400e6d34557bd5f37
size 4991407808
3
model-00003-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:724dfc23348ea55f35135c582ed4b59b57a021d1448cedb479884a3b3fe89ed5
size 4834167328
3
model-00004-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19752faebe38e21395701abacdaf0f9b06432be692efb60020b28752ef444d15
size 4886550176
3
model-00005-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:65c2f70bbf644922cb0709d607a897b63cc6fbd4d9695ecc77dc8b9f0ffcdd3c
size 4834167328
3
model-00006-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:97de97d35add4ee94b0f5918c01626979ffb69987261393c33a27ae4c098cf8f
size 4886550144
3
model-00007-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:de9a89e29205fbe1f05471e725e9ff5914b3c37e655fc693460d29c4f25fd4af
size 4834167328
3
model-00008-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f5938ba5b272b992bacb89607a018f74ec988f359f0662530f19b40e08e1e147
size 4886550144
3
model-00009-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:338e6d4a3aa32ec739b9f6b2cf87115fd0a9d12950015e603c3c376a6e4dfe4f
size 4834167328
3
model-00010-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2aab66b2b8ea7246faae1170444920f487ad45c7e1081a5dc97ec0b80ec2897c
size 4886550176
3
model-00011-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:74340de3038f2469c7d7bcfa127f32baaa0f708a00698b6583d013f012feab07
size 4834167360
3
model-00012-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7ec20d327302f1a82685a660ee792f6319f861f8099882d3c53b59b5f18e487d
size 4886550176
3
model-00013-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b4efea0bf1a26fc0da3eb1fc4c963956fec43dc276270dfa8dd556477f758ce6
size 4834167360
3
model-00014-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a2b5aeb2cc06fb4b5f06fc7bd88144ceae69f1d754944575e2a0fe177cd9ae45
size 4886550176
3
model-00015-of-00015.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e6b61535f8efaf570381bcb2d389cf660c1bce9f6b8976c3a92a251eac8d0285
size 4031898896
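Each shard in this commit is a three-line Git LFS pointer (`version`, `oid`, `size`) rather than the weights themselves. The sketch below parses one pointer and sums the fifteen `size` values listed in this commit, then compares the sum with the `total_size` recorded in `model.safetensors.index.json`. The helper names are hypothetical, and attributing the small difference to per-shard safetensors headers is an assumption.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields


# Pointer for model-00015-of-00015.safetensors, copied from this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e6b61535f8efaf570381bcb2d389cf660c1bce9f6b8976c3a92a251eac8d0285
size 4031898896
"""

# `size` fields of shards 1 through 15, in order, copied from this commit.
SHARD_SIZES = [
    4954686264, 4991407808, 4834167328, 4886550176, 4834167328,
    4886550144, 4834167328, 4886550144, 4834167328, 4886550176,
    4834167360, 4886550176, 4834167360, 4886550176, 4031898896,
]

# `total_size` from the metadata block of model.safetensors.index.json.
TOTAL_SIZE = 72302209024


def header_overhead(shard_sizes, total_size):
    """Bytes on disk beyond the tensor data counted in total_size."""
    return sum(shard_sizes) - total_size


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
    print(header_overhead(SHARD_SIZES, TOTAL_SIZE))
```

The shard files together are slightly larger than `total_size`, a gap of a few kilobytes per shard, which is consistent with each file carrying its own metadata header in addition to the raw tensor bytes.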
779
model.safetensors.index.json
Normal file
@@ -0,0 +1,779 @@
{
  "metadata": {
    "total_parameters": 36151104512,
    "total_size": 72302209024
  },
  "weight_map": {
    "lm_head.weight": "model-00015-of-00015.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.24.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.24.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.24.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00007-of-00015.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
    "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00007-of-00015.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
    "model.layers.26.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
    "model.layers.26.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
    "model.layers.26.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.30.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.30.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.30.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.30.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.30.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||
"model.layers.31.input_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.input_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.input_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.34.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.34.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.34.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.34.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.34.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.34.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
|
||||
"model.layers.35.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.36.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.37.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.38.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.39.input_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.39.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.39.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.39.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.39.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.39.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.39.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.39.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.39.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.39.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.39.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||
"model.layers.39.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.40.input_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.40.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.input_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.41.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.input_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.42.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.43.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.43.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.43.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.43.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.43.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.43.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.43.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.43.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.43.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.43.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.43.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
|
||||
"model.layers.43.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
|
||||
"model.layers.44.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.44.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.45.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.46.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.47.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.48.input_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.48.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.48.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.48.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.48.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.48.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.48.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.48.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.48.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.48.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.48.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||
"model.layers.48.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||
"model.layers.49.input_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.49.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
|
||||
"model.layers.50.input_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.50.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.input_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.51.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.52.input_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.52.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.52.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.52.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.52.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.52.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.52.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.52.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.52.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.52.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.52.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
|
||||
"model.layers.52.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
|
||||
"model.layers.53.input_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
|
||||
"model.layers.53.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
|
||||
    "model.layers.54.input_layernorm.weight": "model-00013-of-00015.safetensors",
    "model.layers.54.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.54.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.54.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.54.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
    "model.layers.54.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.54.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.54.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.54.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.54.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.54.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.54.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.55.input_layernorm.weight": "model-00013-of-00015.safetensors",
    "model.layers.55.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.55.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.55.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.55.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
    "model.layers.55.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.55.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.55.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.55.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.55.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.55.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.55.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.56.input_layernorm.weight": "model-00013-of-00015.safetensors",
    "model.layers.56.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.56.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.56.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.56.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
    "model.layers.56.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.56.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.56.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.56.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.56.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.56.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.56.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.57.input_layernorm.weight": "model-00014-of-00015.safetensors",
    "model.layers.57.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.57.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.57.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.57.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
    "model.layers.57.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.57.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.57.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.57.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.57.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.57.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.57.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.58.input_layernorm.weight": "model-00014-of-00015.safetensors",
    "model.layers.58.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.58.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.58.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.58.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
    "model.layers.58.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.58.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.58.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.58.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.58.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.58.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.58.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.59.input_layernorm.weight": "model-00014-of-00015.safetensors",
    "model.layers.59.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.59.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.59.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.59.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
    "model.layers.59.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.59.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.59.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.59.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.59.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.59.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.59.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00002-of-00015.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
    "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.60.input_layernorm.weight": "model-00014-of-00015.safetensors",
    "model.layers.60.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.60.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.60.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.60.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
    "model.layers.60.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.60.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.60.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.60.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.60.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.60.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.60.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.61.input_layernorm.weight": "model-00015-of-00015.safetensors",
    "model.layers.61.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.61.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.61.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.61.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
    "model.layers.61.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.61.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.61.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.61.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.61.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.61.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
    "model.layers.61.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.62.input_layernorm.weight": "model-00015-of-00015.safetensors",
    "model.layers.62.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.62.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.62.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.62.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
    "model.layers.62.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
    "model.layers.62.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.62.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.62.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
    "model.layers.62.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.62.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
    "model.layers.62.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.63.input_layernorm.weight": "model-00015-of-00015.safetensors",
    "model.layers.63.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.63.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.63.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.63.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
    "model.layers.63.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
    "model.layers.63.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.63.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.63.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
    "model.layers.63.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.63.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
    "model.layers.63.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.8.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.8.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.8.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
    "model.norm.weight": "model-00015-of-00015.safetensors"
  }
}
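The weight map above assigns each tensor name to one of 15 shard files, and a single layer can straddle two shards (layer 57, for example, has its attention tensors in shard 13 and its MLP tensors in shard 14). The following is a minimal sketch, not part of the commit, of how such a map could be grouped by layer. It assumes the entries sit under a `weight_map` key, uses only a small excerpt of the entries shown above, and the helper name `shards_by_layer` is hypothetical.

```python
import json
import re
from collections import defaultdict

# Excerpt of the entries shown above; the real index has one entry per tensor.
# Placing them under a "weight_map" key is an assumption about the file layout.
INDEX_EXCERPT = json.loads("""
{
  "weight_map": {
    "model.layers.57.input_layernorm.weight": "model-00014-of-00015.safetensors",
    "model.layers.57.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
    "model.layers.57.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
    "model.layers.57.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
    "model.layers.63.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
    "model.norm.weight": "model-00015-of-00015.safetensors"
  }
}
""")

# Matches tensor names of the form "model.layers.<index>.<rest>".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_by_layer(weight_map):
    """Map each layer index to the sorted list of shard files holding its tensors."""
    shards = defaultdict(set)
    for tensor_name, shard_file in weight_map.items():
        match = LAYER_RE.match(tensor_name)
        if match:  # skip non-layer tensors such as model.norm.weight
            shards[int(match.group(1))].add(shard_file)
    return {layer: sorted(files) for layer, files in sorted(shards.items())}


layer_shards = shards_by_layer(INDEX_EXCERPT["weight_map"])
for layer, files in layer_shards.items():
    print(layer, files)
```

Sorting by the integer layer index also sidesteps the lexicographic key order of the file itself, where layer 6 appears between layers 59 and 60.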
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<seed:bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<seed:eos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<seed:pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
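As an illustration of how the special-token entries above are structured, here is a minimal sketch that reads the same JSON and extracts the literal token strings. It is not part of the commit; the JSON is inlined rather than read from disk, and the helper name `token_strings` is hypothetical.

```python
import json

# Inlined copy of the special_tokens_map.json content shown above.
SPECIAL_TOKENS_JSON = """
{
  "bos_token": {"content": "<seed:bos>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "eos_token": {"content": "<seed:eos>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "pad_token": {"content": "<seed:pad>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false}
}
"""


def token_strings(special_tokens):
    """Return {role: literal token string}, dropping the per-token flags."""
    return {role: spec["content"] for role, spec in special_tokens.items()}


tokens = token_strings(json.loads(SPECIAL_TOKENS_JSON))
print(tokens)
```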
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f6bd848f52451824a3033a9f1e67eea5b399a13c90f845a332d3a29537e05827
size 11883696
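The three lines above are a Git LFS pointer, not the tokenizer itself: the repository stores only the spec version, a SHA-256 object ID, and the byte size, while the real `tokenizer.json` lives in LFS storage. A minimal sketch of reading those fields follows. It is not part of the commit, and the helper name `parse_lfs_pointer` is hypothetical.

```python
# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f6bd848f52451824a3033a9f1e67eea5b399a13c90f845a332d3a29537e05827
size 11883696
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file and decode the oid and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

The recorded size, 11883696 bytes, is the size of the actual tokenizer file that a checkout with LFS enabled would download in place of the pointer.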
1035
tokenizer_config.json
Normal file
File diff suppressed because it is too large