Initialize project; model provided by the ModelHub XC community

Model: Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.3
Source: Original Platform
ModelHub XC
2026-05-22 00:32:16 +08:00
commit 3fdb1eb962
19 changed files with 413485 additions and 0 deletions

.gitattributes vendored Normal file

@@ -0,0 +1,43 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
pytorch_model-00001-of-00004.bin filter=lfs diff=lfs merge=lfs -text
pytorch_model-00002-of-00004.bin filter=lfs diff=lfs merge=lfs -text
pytorch_model-00003-of-00004.bin filter=lfs diff=lfs merge=lfs -text
pytorch_model-00004-of-00004.bin filter=lfs diff=lfs merge=lfs -text

README.md Normal file

@@ -0,0 +1,190 @@
---
license: llama3
base_model: meta-llama/Meta-Llama-3-8B
tags:
- axolotl
- generated_from_trainer
model-index:
- name: Llama-3-8B-Magpie-Align-SFT-v0.3
results: []
datasets:
- Magpie-Align/Magpie-Reasoning-150K
- Magpie-Align/Magpie-Pro-MT-300K-v0.1
- Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
language:
- en
- zh
---
![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)
# 🐦 Llama-3-8B-Magpie-Align-SFT-v0.3
Project Web: [https://magpie-align.github.io/](https://magpie-align.github.io/)
arXiv Technical Report: [https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464)
Code: [https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie)
## 🧐 About This Model
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on
- [Magpie-Align/Magpie-Pro-MT-300K-v0.1](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-MT-300K-v0.1),
- [Magpie-Align/Magpie-Reasoning-150K](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-150K), and
- [Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese)
Compared to [v0.2](https://huggingface.co/Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.2), this version improves multilingual ability by adding a new dataset of 200K Chinese instructions. It achieves performance comparable to the official Llama-3-8B-Instruct model **with SFT only**! Detailed benchmark results:
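- **MT-Bench: 8.050 (First Turn), 7.350 (Second Turn), 7.700 (Average)**
- **AlpacaEval 2 (GPT-4-Turbo-1106): 26.37 (LC), 26.42 (WR)**
- **AlpacaEval 2 (Llama-3-8B-Instruct): 54.53 (LC), 55.26 (WR)**
- **Arena Hard: 20.6**

LC = length-controlled win rate, WR = raw win rate; the model in parentheses is the AlpacaEval 2 reference (baseline) model.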
## 👀 Other Information
**License**: Please follow [Meta Llama 3 Community License](https://llama.meta.com/llama3/license).
**Conversation Template**: Please use the **official Llama 3 chat template** for the best performance.
**How to use it?** Please check the official [Llama 3 repository](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct#how-to-use) for detailed instructions. Simply replace the original `model_id` with `Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.3`.
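The snippet below is a minimal sketch of chat inference following the pattern in the official Llama 3 instructions, with `model_id` swapped as described above; it is not an official recipe. It assumes `transformers` (the 4.4x API listed under Framework versions), `torch`, and `accelerate` (needed for `device_map="auto"`) are installed, and that the tokenizer in this repository carries the Llama 3 chat template. `format_llama3_chat` and `generate_reply` are hypothetical helper names; the formatter spells out the Llama 3 prompt layout by hand so the expected token structure is visible.

```python
# Hypothetical helpers for illustration; only MODEL_ID comes from this model card.
MODEL_ID = "Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.3"


def format_llama3_chat(messages, add_generation_prompt=True):
    """Render {"role", "content"} messages in the Llama 3 chat layout."""
    text = "<|begin_of_text|>"
    for message in messages:
        # Each turn: role header, blank line, stripped content, end-of-turn token.
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


def generate_reply(messages, max_new_tokens=256):
    """Generate one assistant reply (downloads ~16 GB of bf16 weights on first use)."""
    # Imported lazily so format_llama3_chat works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # generation_config.json lists only <|end_of_text|> as EOS; also stop at <|eot_id|>.
    terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
    output = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

Usage: `generate_reply([{"role": "user", "content": "..."}])`. If you tokenize the string from `format_llama3_chat` yourself instead of calling `apply_chat_template`, pass `add_special_tokens=False` so `<|begin_of_text|>` is not added twice.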
## 📚 Citation
If you find the model, data, or code useful, please cite our paper:
```bibtex
@article{xu2024magpie,
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
**Questions?** Please contact [Zhangchen](https://zhangchenxu.com/) by email.
## Paper Abstract
<details><summary>Click Here</summary>
High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.
</details><br>
## 🏃‍♂️‍➡️ Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 98
- num_epochs: 2
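The total batch size above follows from the other settings: per-device batch × devices × gradient-accumulation steps = 1 × 4 × 32 = 128 (with `sample_packing: true`, each sample is a packed sequence of up to `sequence_len` tokens). The sketch below reproduces that arithmetic and a cosine schedule with linear warmup, written to match the shape of Hugging Face's `get_cosine_schedule_with_warmup` (half-cosine decay to zero). The total number of optimizer steps is not reported; `TOTAL_STEPS = 1060` is an assumption extrapolated from the results table (step 954 ≈ epoch 1.80), and the helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 1   # per-device micro batch
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 32
PEAK_LR = 2e-5
WARMUP_STEPS = 98
# Assumption: not reported; roughly extrapolated from the training-results table.
TOTAL_STEPS = 1060


def total_batch_size(per_device, devices, accum_steps):
    """Samples that contribute gradients to a single optimizer step."""
    return per_device * devices * accum_steps


def cosine_lr(step, total_steps=TOTAL_STEPS, peak=PEAK_LR, warmup=WARMUP_STEPS):
    """Linear warmup from 0 to `peak`, then half-cosine decay to 0."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return max(0.0, peak * 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS))  # 128
for step in (0, 49, 98, 579, 1060):
    print(step, f"{cosine_lr(step):.2e}")
```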
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8616 | 0.0019 | 1 | 0.8870 |
| 0.5554 | 0.2013 | 106 | 0.5568 |
| 0.5067 | 0.4027 | 212 | 0.5065 |
| 0.4728 | 0.6040 | 318 | 0.4865 |
| 0.4681 | 0.8054 | 424 | 0.4740 |
| 0.4563 | 1.0067 | 530 | 0.4662 |
| 0.4115 | 1.1944 | 636 | 0.4642 |
| 0.3993 | 1.3957 | 742 | 0.4620 |
| 0.4048 | 1.5971 | 848 | 0.4613 |
| 0.4167 | 1.7984 | 954 | 0.4611 |
### Framework versions
- Transformers 4.42.3
- Pytorch 2.3.1+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
*Internal name used for identification: Llama-3-8B-Magpie-Mix-RC*. If you reuse the Axolotl config below, change the model name accordingly.
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>
axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
- path: Magpie-Align/Magpie-Reasoning-150K
type: sharegpt
conversation: llama3
- path: Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
type: sharegpt
conversation: llama3
- path: Magpie-Align/Magpie-Pro-MT-300K-v0.1
type: sharegpt
conversation: llama3
dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/Llama-3-8B-Magpie-Mix-RC
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: Llama-3-8B-Magpie-Mix-RC
wandb_log_model:
hub_model_id: Magpie-Align/Llama-3-8B-Magpie-Mix-RC
gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
pad_token: <|end_of_text|>
```
</details><br>
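The config sets `train_on_inputs: false`, so the loss is computed only on assistant turns. Below is a minimal sketch of that idea, not Axolotl's implementation: every token is fed to the model, but non-assistant tokens get the label `-100`, which PyTorch's `CrossEntropyLoss` skips by default (`ignore_index=-100`). The `(token_ids, is_assistant)` segment format and the `build_labels` name are hypothetical.

```python
# Label value skipped by torch.nn.CrossEntropyLoss (its default ignore_index).
IGNORE_INDEX = -100


def build_labels(segments):
    """Build (input_ids, labels) from (token_ids, is_assistant) segments.

    All tokens become model inputs, but only assistant tokens keep their ids
    as training targets; every other token is masked with IGNORE_INDEX.
    """
    input_ids, labels = [], []
    for token_ids, is_assistant in segments:
        input_ids.extend(token_ids)
        if is_assistant:
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


ids, labels = build_labels([([1, 2, 3], False), ([4, 5], True)])
print(ids, labels)
```

The one-token shift between inputs and targets happens inside `LlamaForCausalLM`'s loss computation, so labels stay position-aligned with `input_ids` here.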

config.json Normal file

@@ -0,0 +1,29 @@
{
"_name_or_path": "meta-llama/Meta-Llama-3-8B",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.42.3",
"use_cache": false,
"vocab_size": 128256
}
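As a sanity check on the architecture above, the parameter count can be derived from `config.json`: grouped-query attention with 8 KV heads gives K/V projections of width 8 × (4096 / 32) = 1024, there are no biases (`attention_bias` and `mlp_bias` are false), and `tie_word_embeddings: false` means `lm_head` is a separate 128256 × 4096 matrix. The sketch assumes the standard Llama layer layout (q/k/v/o projections, gate/up/down MLP, two RMSNorms per layer, one final norm); the function name is illustrative.

```python
# Values copied from config.json above.
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "vocab_size": 128256,
    "tie_word_embeddings": False,
}


def count_llama_params(cfg):
    """Count weights of a bias-free Llama decoder with RMSNorm and grouped-query attention."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim

    attn = 2 * h * h + 2 * h * kv_dim       # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]  # gate_proj, up_proj, down_proj
    norms = 2 * h                           # input + post-attention RMSNorm
    per_layer = attn + mlp + norms

    embed = cfg["vocab_size"] * h           # embed_tokens
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h                          # final RMSNorm
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


n_params = count_llama_params(CONFIG)
n_bytes = n_params * 2  # bfloat16 stores 2 bytes per weight
print(n_params, n_bytes)
```

This prints `8030261248 16060522496`: about 8.03B parameters, and the byte count matches the `total_size` recorded in the safetensors index metadata.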

configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

generation_config.json Normal file

@@ -0,0 +1,9 @@
{
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001,
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.42.3"
}
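`generation_config.json` enables sampling with `temperature: 0.6` and `top_p: 0.9`. The pure-Python sketch below illustrates what those two settings do: logits are divided by the temperature, converted to probabilities, and only the smallest highest-probability set whose cumulative mass reaches `top_p` is kept and renormalized (nucleus sampling). It is a conceptual illustration; `transformers` applies these settings through its own logits processors, which may differ in edge cases such as ties. The function names are hypothetical.

```python
import math
import random


def top_p_candidates(logits, temperature=0.6, top_p=0.9):
    """Temperature-scale logits, softmax them, and keep the nucleus.

    Returns [(token_id, prob), ...] sorted by probability, renormalized to sum to 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    ranked = sorted(
        ((i, e / total) for i, e in enumerate(exps)), key=lambda item: item[1], reverse=True
    )
    kept, cumulative = [], 0.0
    for token_id, prob in ranked:
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break
    norm = sum(prob for _, prob in kept)
    return [(token_id, prob / norm) for token_id, prob in kept]


def sample_token(logits, temperature=0.6, top_p=0.9, rng=random):
    """Draw one token id from the renormalized nucleus."""
    candidates = top_p_candidates(logits, temperature, top_p)
    ids = [token_id for token_id, _ in candidates]
    weights = [prob for _, prob in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.0, -1.0]
print([t for t, _ in top_p_candidates(toy_logits, temperature=1.0)])
print([t for t, _ in top_p_candidates(toy_logits, temperature=0.6)])
```

With these toy logits, temperature 1.0 keeps tokens `[0, 1, 2]`, while temperature 0.6 sharpens the distribution so that only `[0, 1]` survive the 0.9 cutoff.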


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5f2a39301a321d2a4b4d2452145484b6153e56b9401524d4d7a473eba2bc123b
size 4976698672


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07377cc920443939a57440bbf72a66f6fe299ce937b7e17115ac531161b7465b
size 4999802720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:04b22840255be085264d23284cba5669ecc77952a87e85c54ae70187a50ef233
size 4915916176


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:56a27ac1098d947f10f6ddf86a8c44de995e75082b61c30f0fcc5b4cb682784a
size 1168138808
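Each pointer above stands in for a large file stored in Git LFS: `oid` is the SHA-256 of the real file and `size` is its length in bytes. A minimal standard-library sketch for checking a downloaded file against its pointer follows; the helper names are hypothetical, and the example pointer is copied from the last one above.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse Git LFS pointer text ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def verify_download(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; hashing a multi-GB shard takes a while
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so large shards never sit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:56a27ac1098d947f10f6ddf86a8c44de995e75082b61c30f0fcc5b4cb682784a
size 1168138808"""
pointer = parse_lfs_pointer(POINTER)
print(pointer["size"])
```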


@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 16060522496
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00004-of-00004.safetensors"
}
}
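The "weight_map" above records, for every tensor name, the shard file that stores it. The following is a minimal Python sketch that uses only the standard library. It inverts the map so you can see which tensors each shard holds. The function name tensors_by_shard is hypothetical, and the sample is a three-entry excerpt of the index rather than the full file.

```python
import json
from collections import defaultdict


def tensors_by_shard(index_json: str) -> dict[str, list[str]]:
    """Invert an index's "weight_map" (tensor name -> shard file) into shard file -> tensor names."""
    index = json.loads(index_json)
    groups: dict[str, list[str]] = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        groups[shard_file].append(tensor_name)
    # Sort shards and names so the result is deterministic.
    return {shard: sorted(names) for shard, names in sorted(groups.items())}


# Three entries excerpted from the index above (hypothetical minimal sample).
sample = json.dumps({
    "weight_map": {
        "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
        "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
        "model.norm.weight": "model-00004-of-00004.safetensors",
    }
})

groups = tensors_by_shard(sample)
for shard, names in groups.items():
    print(shard, names)
```

With the safetensors package, you can then open a single shard with safe_open(path, framework="pt") and read only the listed names through get_tensor. This avoids loading every shard.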

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:342eb884bf9cec068425b404b2cff0ffbaac52c25fa1a5dd2705848b062099a9
size 4976718466

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e33cd7b2a6fe22ccc42d1586bdcfda3feec178f94aaa9c6e8756f486de794d6f
size 4999827718

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ba1e149b39229c702b84c51b72c399060c23315f1b3ed4dad6c38b10ac6291e2
size 4915940170

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07608f6504dc4ce3d6f6b23832080c4d907d739c3fcfc8e47ee40b3acd03749d
size 1168140873
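Each of the four files above is a Git LFS pointer, not the weights themselves. A pointer is a short text file of "key value" lines: the spec version, the object's SHA-256 oid, and its size in bytes. The following minimal sketch parses these pointers and sums the sizes of the four objects. The helper name parse_lfs_pointer is hypothetical. The pointer texts are copied verbatim from above, and their filenames are not shown in this view.

```python
POINTERS = [
    """version https://git-lfs.github.com/spec/v1
oid sha256:342eb884bf9cec068425b404b2cff0ffbaac52c25fa1a5dd2705848b062099a9
size 4976718466""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:e33cd7b2a6fe22ccc42d1586bdcfda3feec178f94aaa9c6e8756f486de794d6f
size 4999827718""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:ba1e149b39229c702b84c51b72c399060c23315f1b3ed4dad6c38b10ac6291e2
size 4915940170""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:07608f6504dc4ce3d6f6b23832080c4d907d739c3fcfc8e47ee40b3acd03749d
size 1168140873""",
]


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of an LFS pointer and decode the oid and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


parsed = [parse_lfs_pointer(p) for p in POINTERS]
total = sum(p["size"] for p in parsed)
print(f"{total} bytes (~{total / 1e9:.2f} GB)")  # prints "16060627227 bytes (~16.06 GB)"
```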

@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 16060522496
},
"weight_map": {
"lm_head.weight": "pytorch_model-00004-of-00004.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00004-of-00004.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00004-of-00004.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00004-of-00004.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
"model.norm.weight": "pytorch_model-00004-of-00004.bin"
}
}
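Sharding in this index is by byte budget, not by layer. As a result, a single decoder layer can have tensors in two shard files. In the map above, layer 20 is split between shards 2 and 3, and layer 31 is split between shards 3 and 4. The following minimal sketch finds such layers. It assumes tensor names follow the "model.layers.<N>." pattern used above. The function name shards_per_layer is hypothetical, and the sample is a short excerpt of the layer 20 entries.

```python
import re
from collections import defaultdict

# Matches the layer index in names such as "model.layers.20.mlp.up_proj.weight".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_per_layer(weight_map: dict[str, str]) -> dict[int, list[str]]:
    """Map each decoder layer index to the sorted list of shard files holding its tensors."""
    spans: dict[int, set[str]] = defaultdict(set)
    for name, shard in weight_map.items():
        match = LAYER_RE.match(name)
        if match:  # skip non-layer tensors such as lm_head, embed_tokens and norm
            spans[int(match.group(1))].add(shard)
    return {layer: sorted(files) for layer, files in sorted(spans.items())}


# Excerpt of entries copied from the index above.
weight_map = {
    "model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
    "model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
    "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
    "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
    "model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
    "model.norm.weight": "pytorch_model-00004-of-00004.bin",
}

split_layers = {
    layer: files for layer, files in shards_per_layer(weight_map).items() if len(files) > 1
}
print(split_layers)
```

A full loader reads every shard anyway, so a split layer is harmless there. The check matters mainly if you stream or partially load the checkpoint layer by layer.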

23
special_tokens_map.json Normal file
@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
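In this map, pad_token and eos_token share the content "<|end_of_text|>". A consequence is that any step that derives a mask by comparing token ids with the pad id will also mask genuine end-of-text tokens. The following minimal sketch builds the attention mask from sequence lengths instead. The EOS id 128001 is an assumption (the usual Llama 3 id for "<|end_of_text|>"; check tokenizer.eos_token_id), and pad_batch is a hypothetical helper.

```python
# Subset of the special_tokens_map.json shown above.
special_tokens_map = {
    "bos_token": {"content": "<|begin_of_text|>"},
    "eos_token": {"content": "<|end_of_text|>"},
    "pad_token": {"content": "<|end_of_text|>"},
}
PAD_IS_EOS = special_tokens_map["pad_token"]["content"] == special_tokens_map["eos_token"]["content"]

EOS_ID = 128001  # assumption: id of "<|end_of_text|>" in the Llama 3 vocabulary


def pad_batch(seqs: list[list[int]], pad_id: int) -> tuple[list[list[int]], list[list[int]]]:
    """Right-pad id sequences; the mask comes from each sequence's length, never from pad_id."""
    width = max(len(s) for s in seqs)
    input_ids = [s + [pad_id] * (width - len(s)) for s in seqs]
    attention_mask = [[1] * len(s) + [0] * (width - len(s)) for s in seqs]
    return input_ids, attention_mask


seqs = [[10, 11, EOS_ID], [12]]  # the first sequence ends with a real EOS token
ids, mask = pad_batch(seqs, EOS_ID)
# Naive mask from id comparison: wrongly zeroes out the real EOS in row 0.
naive = [[int(t != EOS_ID) for t in row] for row in ids]
print(mask, naive)
```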

410504
tokenizer.json Normal file

File diff suppressed because it is too large

2063
tokenizer_config.json Normal file

File diff suppressed because it is too large

3
training_args.bin Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2a015d48c47573ef07587622d7a89db9bc8496e338de1cd51e3cb7c410d5ec0c
size 6200
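An LFS pointer pins the real object by SHA-256 and byte size. A downloaded file can therefore be checked against it before use. This is especially worth doing for training_args.bin, which is usually a pickled TrainingArguments object written by the Hugging Face Trainer, and unpickling runs code. The following minimal sketch streams the file through hashlib. The helper name verify_lfs_object is hypothetical, and the local path is an assumption.

```python
import hashlib
import os


def verify_lfs_object(path: str, expected_oid: str, expected_size: int,
                      chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at path matches the pointer's byte size and SHA-256 oid."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap size check first; avoids hashing an obviously wrong file
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large shards do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Values from the training_args.bin pointer above; the local path is an assumption.
if os.path.exists("training_args.bin"):
    ok = verify_lfs_object(
        "training_args.bin",
        "2a015d48c47573ef07587622d7a89db9bc8496e338de1cd51e3cb7c410d5ec0c",
        6200,
    )
    print("training_args.bin verified:", ok)
```

A file that is still an un-smudged pointer (for example, a clone made without git-lfs installed) fails the size check immediately, because the pointer text is far smaller than the recorded size.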