初始化项目,由ModelHub XC社区提供模型
Model: Magpie-Align/MagpieLM-4B-Chat-v0.1 Source: Original Platform
This commit is contained in:
37
.gitattributes
vendored
Normal file
37
.gitattributes
vendored
Normal file
@@ -0,0 +1,37 @@
|
||||
*.7z filter=lfs diff=lfs merge=lfs -text
|
||||
*.arrow filter=lfs diff=lfs merge=lfs -text
|
||||
*.bin filter=lfs diff=lfs merge=lfs -text
|
||||
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
||||
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
||||
*.ftz filter=lfs diff=lfs merge=lfs -text
|
||||
*.gz filter=lfs diff=lfs merge=lfs -text
|
||||
*.h5 filter=lfs diff=lfs merge=lfs -text
|
||||
*.joblib filter=lfs diff=lfs merge=lfs -text
|
||||
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
||||
*.model filter=lfs diff=lfs merge=lfs -text
|
||||
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
||||
*.npy filter=lfs diff=lfs merge=lfs -text
|
||||
*.npz filter=lfs diff=lfs merge=lfs -text
|
||||
*.onnx filter=lfs diff=lfs merge=lfs -text
|
||||
*.ot filter=lfs diff=lfs merge=lfs -text
|
||||
*.parquet filter=lfs diff=lfs merge=lfs -text
|
||||
*.pb filter=lfs diff=lfs merge=lfs -text
|
||||
*.pickle filter=lfs diff=lfs merge=lfs -text
|
||||
*.pkl filter=lfs diff=lfs merge=lfs -text
|
||||
*.pt filter=lfs diff=lfs merge=lfs -text
|
||||
*.pth filter=lfs diff=lfs merge=lfs -text
|
||||
*.rar filter=lfs diff=lfs merge=lfs -text
|
||||
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar filter=lfs diff=lfs merge=lfs -text
|
||||
*.tflite filter=lfs diff=lfs merge=lfs -text
|
||||
*.tgz filter=lfs diff=lfs merge=lfs -text
|
||||
*.wasm filter=lfs diff=lfs merge=lfs -text
|
||||
*.xz filter=lfs diff=lfs merge=lfs -text
|
||||
*.zip filter=lfs diff=lfs merge=lfs -text
|
||||
*.zst filter=lfs diff=lfs merge=lfs -text
|
||||
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
||||
model-00001-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||
model-00002-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||
295
README.md
Normal file
295
README.md
Normal file
@@ -0,0 +1,295 @@
|
||||
---
|
||||
library_name: transformers
|
||||
license: other
|
||||
base_model: Magpie-Align/MagpieLM-4B-SFT-v0.1
|
||||
tags:
|
||||
- alignment-handbook
|
||||
- trl
|
||||
- dpo
|
||||
- generated_from_trainer
|
||||
datasets:
|
||||
- Magpie-Align/MagpieLM-SFT-Data-v0.1
|
||||
- Magpie-Align/MagpieLM-DPO-Data-v0.1
|
||||
model-index:
|
||||
- name: MagpieLM-4B-Chat-v0.1
|
||||
results: []
|
||||
---
|
||||
|
||||

|
||||
|
||||
# 🐦 MagpieLM-4B-Chat-v0.1
|
||||
|
||||
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/ilv83ciw)
|
||||
|
||||
## 🧐 About This Model
|
||||
|
||||
*Model full name: Llama3.1-MagpieLM-4B-Chat-v0.1*
|
||||
|
||||
This model is an aligned version of [Llama-3.1-Minitron-4B-Width](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base), which achieves state-of-the-art performance among open-aligned SLMs. It even outperforms larger open-weight models including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct and Qwen-2-7B-Instruct.
|
||||
|
||||
We apply the following standard alignment pipeline with two carefully crafted synthetic datasets. Feel free to use these datasets and reproduce our model, or make your own friendly chatbots :)
|
||||
|
||||
We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
|
||||
* **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-4B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-4B-SFT-v0.1)
|
||||
|
||||
We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.
|
||||
|
||||
[*See more powerful 8B version here!*](https://huggingface.co/Magpie-Align/MagpieLM-8B-Chat-v0.1)
|
||||
|
||||
## 🔥 Benchmark Performance
|
||||
|
||||
Greedy Decoding
|
||||
|
||||
- **Alpaca Eval 2: 40.99 (LC), 45.19 (WR)**
|
||||
- **Arena Hard: 24.6**
|
||||
- **WildBench WB Score (v2.0625): 32.37**
|
||||
|
||||
**Benchmark Performance Compare to Other SOTA SLMs**
|
||||
|
||||

|
||||
|
||||
## 👀 Other Information
|
||||
|
||||
**License**: Please follow [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
|
||||
|
||||
**Conversation Template**: Please use the **Llama 3 chat template** for the best performance.
|
||||
|
||||
**Limitations**: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or reflect biases present in the training data. While the model aims to improve instruction-following and helpfulness, it isn't specifically designed for complex reasoning tasks, potentially leading to suboptimal performance in these areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training were implemented during the alignment process.
|
||||
|
||||
## 🧐 How to use it?
|
||||
|
||||
[](https://huggingface.co/spaces/flydust/MagpieLM-4B)
|
||||
|
||||
Please update transformers to the latest version by `pip install git+https://github.com/huggingface/transformers`.
|
||||
|
||||
You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
|
||||
|
||||
```python
|
||||
import transformers
|
||||
import torch
|
||||
|
||||
model_id = "MagpieLM-4B-Chat-v0.1"
|
||||
|
||||
pipeline = transformers.pipeline(
|
||||
"text-generation",
|
||||
model=model_id,
|
||||
model_kwargs={"torch_dtype": torch.bfloat16},
|
||||
device_map="auto",
|
||||
)
|
||||
|
||||
messages = [
|
||||
{"role": "system", "content": "You are Magpie, a friendly AI assistant."},
|
||||
{"role": "user", "content": "Who are you?"},
|
||||
]
|
||||
|
||||
outputs = pipeline(
|
||||
messages,
|
||||
max_new_tokens=256,
|
||||
)
|
||||
print(outputs[0]["generated_text"][-1])
|
||||
```
|
||||
|
||||
---
|
||||
# Alignment Pipeline
|
||||
|
||||
The detailed alignment pipeline is as follows.
|
||||
|
||||
## Stage 1: Supervised Fine-tuning
|
||||
|
||||
We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-4B-SFT-v0.1) and below for detailed configurations.
|
||||
|
||||
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
|
||||
<details><summary>See axolotl config</summary>
|
||||
|
||||
axolotl version: `0.4.1`
|
||||
```yaml
|
||||
base_model: nvidia/Llama-3.1-Minitron-4B-Width-Base
|
||||
model_type: AutoModelForCausalLM
|
||||
tokenizer_type: AutoTokenizer
|
||||
chat_template: llama3
|
||||
|
||||
load_in_8bit: false
|
||||
load_in_4bit: false
|
||||
strict: false
|
||||
|
||||
datasets:
|
||||
- path: Magpie-Align/MagpieLM-SFT-Data-v0.1
|
||||
type: sharegpt
|
||||
conversation: llama3
|
||||
dataset_prepared_path: last_run_prepared
|
||||
val_set_size: 0.001
|
||||
output_dir: axolotl_out/MagpieLM-4B-SFT-v0.1
|
||||
|
||||
sequence_len: 8192
|
||||
sample_packing: true
|
||||
eval_sample_packing: false
|
||||
pad_to_sequence_len: true
|
||||
|
||||
wandb_project: SynDa
|
||||
wandb_entity:
|
||||
wandb_watch:
|
||||
wandb_name: Llama3.1-MagpieLM-4B-SFT-v0.1
|
||||
wandb_log_model:
|
||||
hub_model_id: Magpie-Align/MagpieLM-4B-SFT-v0.1
|
||||
|
||||
gradient_accumulation_steps: 32
|
||||
micro_batch_size: 1
|
||||
num_epochs: 2
|
||||
optimizer: paged_adamw_8bit
|
||||
lr_scheduler: cosine
|
||||
learning_rate: 2e-5
|
||||
|
||||
train_on_inputs: false
|
||||
group_by_length: false
|
||||
bf16: true
|
||||
fp16:
|
||||
tf32: false
|
||||
|
||||
gradient_checkpointing: true
|
||||
gradient_checkpointing_kwargs:
|
||||
use_reentrant: false
|
||||
early_stopping_patience:
|
||||
resume_from_checkpoint:
|
||||
logging_steps: 1
|
||||
xformers_attention:
|
||||
flash_attention: true
|
||||
|
||||
warmup_ratio: 0.1
|
||||
evals_per_epoch: 5
|
||||
eval_table_size:
|
||||
saves_per_epoch: 1
|
||||
debug:
|
||||
deepspeed:
|
||||
weight_decay: 0.0
|
||||
fsdp:
|
||||
fsdp_config:
|
||||
special_tokens:
|
||||
pad_token: <|end_of_text|>
|
||||
|
||||
```
|
||||
</details><br>
|
||||
|
||||
## Stage 2: Direct Preference Optimization
|
||||
|
||||
We use [alignment handbook](https://github.com/huggingface/alignment-handbook) for DPO.
|
||||
|
||||
### Training hyperparameters
|
||||
|
||||
The following hyperparameters were used during training:
|
||||
- learning_rate: 1.5e-07
|
||||
- train_batch_size: 2
|
||||
- eval_batch_size: 4
|
||||
- seed: 42
|
||||
- distributed_type: multi-GPU
|
||||
- num_devices: 4
|
||||
- gradient_accumulation_steps: 16
|
||||
- total_train_batch_size: 128
|
||||
- total_eval_batch_size: 16
|
||||
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
||||
- lr_scheduler_type: cosine
|
||||
- lr_scheduler_warmup_ratio: 0.1
|
||||
- num_epochs: 1
|
||||
|
||||
### Training results
|
||||
|
||||
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|
||||
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
|
||||
| 0.6911 | 0.0653 | 100 | 0.6912 | -0.0026 | -0.0066 | 0.5640 | 0.0041 | -502.9037 | -510.6042 | -1.7834 | -1.7781 |
|
||||
| 0.6703 | 0.1306 | 200 | 0.6713 | -0.1429 | -0.1981 | 0.6380 | 0.0552 | -522.0521 | -524.6394 | -1.7686 | -1.7593 |
|
||||
| 0.6306 | 0.1959 | 300 | 0.6347 | -0.6439 | -0.8210 | 0.6840 | 0.1770 | -584.3356 | -574.7375 | -1.7536 | -1.7436 |
|
||||
| 0.5831 | 0.2612 | 400 | 0.5932 | -1.5155 | -1.8774 | 0.7070 | 0.3619 | -689.9788 | -661.8920 | -1.6963 | -1.6877 |
|
||||
| 0.5447 | 0.3266 | 500 | 0.5645 | -2.1858 | -2.7052 | 0.7110 | 0.5195 | -772.7636 | -728.9221 | -1.6249 | -1.6207 |
|
||||
| 0.5896 | 0.3919 | 600 | 0.5453 | -2.3771 | -2.9747 | 0.7180 | 0.5976 | -799.7122 | -748.0584 | -1.5836 | -1.5847 |
|
||||
| 0.5342 | 0.4572 | 700 | 0.5305 | -2.6231 | -3.3063 | 0.7350 | 0.6832 | -832.8744 | -772.6592 | -1.5454 | -1.5524 |
|
||||
| 0.511 | 0.5225 | 800 | 0.5177 | -3.0517 | -3.8393 | 0.7400 | 0.7876 | -886.1714 | -815.5145 | -1.5160 | -1.5273 |
|
||||
| 0.5007 | 0.5878 | 900 | 0.5088 | -3.0925 | -3.9197 | 0.7540 | 0.8273 | -894.2120 | -819.5908 | -1.5007 | -1.5144 |
|
||||
| 0.485 | 0.6531 | 1000 | 0.5033 | -3.1305 | -3.9863 | 0.7630 | 0.8558 | -900.8680 | -823.3940 | -1.4834 | -1.4997 |
|
||||
| 0.4307 | 0.7184 | 1100 | 0.4989 | -3.1387 | -4.0097 | 0.7610 | 0.8710 | -903.2113 | -824.2159 | -1.4728 | -1.4911 |
|
||||
| 0.5403 | 0.7837 | 1200 | 0.4964 | -3.3418 | -4.2574 | 0.7620 | 0.9156 | -927.9747 | -844.5242 | -1.4641 | -1.4822 |
|
||||
| 0.5182 | 0.8490 | 1300 | 0.4952 | -3.3255 | -4.2430 | 0.7600 | 0.9175 | -926.5396 | -842.8945 | -1.4601 | -1.4788 |
|
||||
| 0.5165 | 0.9144 | 1400 | 0.4943 | -3.3308 | -4.2525 | 0.7600 | 0.9217 | -927.4913 | -843.4282 | -1.4610 | -1.4799 |
|
||||
| 0.5192 | 0.9797 | 1500 | 0.4942 | -3.3377 | -4.2603 | 0.7620 | 0.9226 | -928.2655 | -844.1144 | -1.4591 | -1.4783 |
|
||||
|
||||
|
||||
### Framework versions
|
||||
|
||||
- Transformers 4.45.0.dev0
|
||||
- Pytorch 2.3.1+cu121
|
||||
- Datasets 2.20.0
|
||||
- Tokenizers 0.19.1
|
||||
|
||||
<details><summary>See alignment handbook configs</summary>
|
||||
|
||||
```yaml
|
||||
# Customized Configs
|
||||
model_name_or_path: Magpie-Align/MagpieLM-4B-SFT-v0.1
|
||||
hub_model_id: Magpie-Align/MagpieLM-4B-Chat-v0.1
|
||||
output_dir: alignment_handbook_out/MagpieLM-4B-Chat-v0.1
|
||||
run_name: MagpieLM-4B-Chat-v0.1
|
||||
|
||||
dataset_mixer:
|
||||
Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
|
||||
dataset_splits:
|
||||
- train
|
||||
- test
|
||||
preprocessing_num_workers: 24
|
||||
|
||||
# DPOTrainer arguments
|
||||
bf16: true
|
||||
beta: 0.01
|
||||
learning_rate: 1.5e-7
|
||||
gradient_accumulation_steps: 16
|
||||
per_device_train_batch_size: 2
|
||||
per_device_eval_batch_size: 4
|
||||
num_train_epochs: 1
|
||||
max_length: 2048
|
||||
max_prompt_length: 1800
|
||||
warmup_ratio: 0.1
|
||||
logging_steps: 1
|
||||
lr_scheduler_type: cosine
|
||||
optim: adamw_torch
|
||||
|
||||
torch_dtype: null
|
||||
# use_flash_attention_2: true
|
||||
do_eval: true
|
||||
evaluation_strategy: steps
|
||||
eval_steps: 100
|
||||
gradient_checkpointing: true
|
||||
gradient_checkpointing_kwargs:
|
||||
use_reentrant: False
|
||||
log_level: info
|
||||
push_to_hub: true
|
||||
save_total_limit: 0
|
||||
seed: 42
|
||||
report_to:
|
||||
- wandb
|
||||
```
|
||||
</details><be>
|
||||
|
||||
## 📚 Citation
|
||||
|
||||
If you find the model, data, or code useful, please cite:
|
||||
```
|
||||
@article{xu2024magpie,
|
||||
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
|
||||
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
|
||||
year={2024},
|
||||
eprint={2406.08464},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.CL}
|
||||
}
|
||||
|
||||
@article{xu2024stronger,
|
||||
title={Stronger Models are NOT Stronger Teachers for Instruction Tuning},
|
||||
author={Xu, Zhangchen and Jiang, Fengqing and Niu, Luyao and Lin, Bill Yuchen and Poovendran, Radha},
|
||||
journal={arXiv preprint arXiv:2411.07133},
|
||||
year={2024}
|
||||
}
|
||||
```
|
||||
|
||||
**Contact**
|
||||
|
||||
Questions? Contact:
|
||||
- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
|
||||
- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]
|
||||
|
||||
22
all_results.json
Normal file
22
all_results.json
Normal file
@@ -0,0 +1,22 @@
|
||||
{
|
||||
"epoch": 0.9999183606825047,
|
||||
"eval_logits/chosen": -1.4790972471237183,
|
||||
"eval_logits/rejected": -1.4601374864578247,
|
||||
"eval_logps/chosen": -844.0220947265625,
|
||||
"eval_logps/rejected": -928.1929321289062,
|
||||
"eval_loss": 0.4941699206829071,
|
||||
"eval_rewards/accuracies": 0.7620000243186951,
|
||||
"eval_rewards/chosen": -3.3367679119110107,
|
||||
"eval_rewards/margins": 0.9227678179740906,
|
||||
"eval_rewards/rejected": -4.259535789489746,
|
||||
"eval_runtime": 300.0346,
|
||||
"eval_samples": 4000,
|
||||
"eval_samples_per_second": 13.332,
|
||||
"eval_steps_per_second": 0.833,
|
||||
"total_flos": 0.0,
|
||||
"train_loss": 0.5524635882372952,
|
||||
"train_runtime": 40329.8241,
|
||||
"train_samples": 195977,
|
||||
"train_samples_per_second": 4.859,
|
||||
"train_steps_per_second": 0.038
|
||||
}
|
||||
36
config.json
Normal file
36
config.json
Normal file
@@ -0,0 +1,36 @@
|
||||
{
|
||||
"_name_or_path": "flydust/Llama-3.1-Minitron-4B-Magpie-Gemma2-9B-550K",
|
||||
"architectures": [
|
||||
"LlamaForCausalLM"
|
||||
],
|
||||
"attention_bias": false,
|
||||
"attention_dropout": 0.0,
|
||||
"bos_token_id": 128000,
|
||||
"eos_token_id": 128001,
|
||||
"head_dim": 128,
|
||||
"hidden_act": "silu",
|
||||
"hidden_size": 3072,
|
||||
"initializer_range": 0.02,
|
||||
"intermediate_size": 9216,
|
||||
"max_position_embeddings": 131072,
|
||||
"mlp_bias": false,
|
||||
"model_type": "llama",
|
||||
"num_attention_heads": 32,
|
||||
"num_hidden_layers": 32,
|
||||
"num_key_value_heads": 8,
|
||||
"pretraining_tp": 1,
|
||||
"rms_norm_eps": 1e-05,
|
||||
"rope_scaling": {
|
||||
"factor": 8.0,
|
||||
"high_freq_factor": 4.0,
|
||||
"low_freq_factor": 1.0,
|
||||
"original_max_position_embeddings": 8192,
|
||||
"rope_type": "llama3"
|
||||
},
|
||||
"rope_theta": 500000.0,
|
||||
"tie_word_embeddings": false,
|
||||
"torch_dtype": "bfloat16",
|
||||
"transformers_version": "4.45.0.dev0",
|
||||
"use_cache": true,
|
||||
"vocab_size": 128256
|
||||
}
|
||||
1
configuration.json
Normal file
1
configuration.json
Normal file
@@ -0,0 +1 @@
|
||||
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
|
||||
16
eval_results.json
Normal file
16
eval_results.json
Normal file
@@ -0,0 +1,16 @@
|
||||
{
|
||||
"epoch": 0.9999183606825047,
|
||||
"eval_logits/chosen": -1.4790972471237183,
|
||||
"eval_logits/rejected": -1.4601374864578247,
|
||||
"eval_logps/chosen": -844.0220947265625,
|
||||
"eval_logps/rejected": -928.1929321289062,
|
||||
"eval_loss": 0.4941699206829071,
|
||||
"eval_rewards/accuracies": 0.7620000243186951,
|
||||
"eval_rewards/chosen": -3.3367679119110107,
|
||||
"eval_rewards/margins": 0.9227678179740906,
|
||||
"eval_rewards/rejected": -4.259535789489746,
|
||||
"eval_runtime": 300.0346,
|
||||
"eval_samples": 4000,
|
||||
"eval_samples_per_second": 13.332,
|
||||
"eval_steps_per_second": 0.833
|
||||
}
|
||||
7
generation_config.json
Normal file
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
|
||||
{
|
||||
"_from_model_config": true,
|
||||
"bos_token_id": 128000,
|
||||
"do_sample": true,
|
||||
"eos_token_id": 128001,
|
||||
"transformers_version": "4.45.0.dev0"
|
||||
}
|
||||
3
model-00001-of-00002.safetensors
Normal file
3
model-00001-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:a9036ae1e92c8923de7547625784d89928f1655ce96b82f0a45acac4a4232c0c
|
||||
size 4978354640
|
||||
3
model-00002-of-00002.safetensors
Normal file
3
model-00002-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:526abcd98b9a370e32fb6e503215266a166ba0fe74f6087dbc613996e78d7d95
|
||||
size 4047172128
|
||||
298
model.safetensors.index.json
Normal file
298
model.safetensors.index.json
Normal file
@@ -0,0 +1,298 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_size": 9025492992
|
||||
},
|
||||
"weight_map": {
|
||||
"lm_head.weight": "model-00002-of-00002.safetensors",
|
||||
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.norm.weight": "model-00002-of-00002.safetensors"
|
||||
}
|
||||
}
|
||||
23
special_tokens_map.json
Normal file
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
|
||||
{
|
||||
"bos_token": {
|
||||
"content": "<|begin_of_text|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false
|
||||
},
|
||||
"eos_token": {
|
||||
"content": "<|end_of_text|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false
|
||||
},
|
||||
"pad_token": {
|
||||
"content": "<|end_of_text|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false
|
||||
}
|
||||
}
|
||||
410563
tokenizer.json
Normal file
410563
tokenizer.json
Normal file
File diff suppressed because it is too large
Load Diff
2063
tokenizer_config.json
Normal file
2063
tokenizer_config.json
Normal file
File diff suppressed because it is too large
Load Diff
9
train_results.json
Normal file
9
train_results.json
Normal file
@@ -0,0 +1,9 @@
|
||||
{
|
||||
"epoch": 0.9999183606825047,
|
||||
"total_flos": 0.0,
|
||||
"train_loss": 0.5524635882372952,
|
||||
"train_runtime": 40329.8241,
|
||||
"train_samples": 195977,
|
||||
"train_samples_per_second": 4.859,
|
||||
"train_steps_per_second": 0.038
|
||||
}
|
||||
23247
trainer_state.json
Normal file
23247
trainer_state.json
Normal file
File diff suppressed because it is too large
Load Diff
3
training_args.bin
Normal file
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:9bf1a49601ef38c1a50cf398fbcdec4ed2e87b7a063f54c2c9ef3ab4cba62345
|
||||
size 6712
|
||||
Reference in New Issue
Block a user