初始化项目,由ModelHub XC社区提供模型

Model: CHIH-HUNG/llama-2-13b-FINETUNE2_3w-gate_up_down_proj
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-11 03:07:44 +08:00
commit dfaf0184c3
12 changed files with 94010 additions and 0 deletions

35
.gitattributes vendored Normal file
View File

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

73
README.md Normal file
View File

@@ -0,0 +1,73 @@
---
license: llama2
datasets:
- huangyt/FINETUNE2
---
# Model Card for Model ID
<!-- Provide a quick summary of what the model is/does. -->
在llama-2-13b上使用huangyt/FINETUNE2資料集進行訓練總資料筆數約3w
# Fine-Tuning Information
- **GPU:** RTX4090 (single core / 24564MiB)
- **model:** meta-llama/Llama-2-13b-hf
- **dataset:** huangyt/FINETUNE2 (共約3w筆訓練集)
- **peft_type:** LoRA
- **lora_rank:** 8
- **lora_target:** gate_proj, up_proj, down_proj
- **per_device_train_batch_size:** 8
- **gradient_accumulation_steps:** 8
- **learning_rate :** 5e-5
- **epoch:** 1
- **precision:** bf16
- **quantization:** load_in_4bit
# Fine-Tuning Detail
- **train_loss:** 0.614
- **train_runtime:** 3:42:14 (use deepspeed)
# Evaluation
- 評估結果來自**HuggingFaceH4/open_llm_leaderboard**
- 與Llama-2-13b比較4種Benchmark包含**ARC**、**HellaSwag**、**MMLU**、**TruthfulQA**
| Model |Average| ARC |HellaSwag| MMLU | TruthfulQA |
|-----------------------------------------------------|-------|-------|---------|-------|------------|
|meta-llama/Llama-2-13b-hf | 56.9 | 58.11 | 80.97 | 54.34 | 34.17 |
|meta-llama/Llama-2-13b-chat-hf | 59.93 | 59.04 | 81.94 | 54.64 | 44.12 |
|CHIH-HUNG/llama-2-13b-FINETUNE2_3w | 58.34 | 58.62 | 82.32 | 54.25 | 38.17 |
|CHIH-HUNG/llama-2-13b-FINETUNE2_3w-q_k_v_o_proj | 58.21 | 58.53 | 82.47 | 53.9 | 37.92 |
|CHIH-HUNG/llama-2-13b-FINETUNE2_3w-gate_up_down_proj | 58.65 | 57.42 | 82.42 | 55.57 | 39.19 |
# How to convert dataset to json
- 在**load_dataset**中輸入資料集名稱,並且在**take**中輸入要取前幾筆資料
- 觀察該資料集的欄位名稱,填入**example**欄位中(例如system_prompt、question、response)
- 最後指定json檔儲存位置 (**json_filename**)
```py
import json
from datasets import load_dataset
# 讀取數據集take可以取得該數據集前n筆資料
dataset = load_dataset("huangyt/FINETUNE2", split="train", streaming=True)
# 提取所需欄位並建立新的字典列表
extracted_data = []
for example in dataset:
extracted_example = {
"instruction": example["instruction"],
"input": example["input"],
"output": example["output"]
}
extracted_data.append(extracted_example)
# 指定 JSON 文件名稱
json_filename = "huangyt_FINETUNE2.json"
# 寫入 JSON 文件
with open(json_filename, "w") as json_file:
json.dump(extracted_data, json_file, indent=4)
print(f"數據已提取並保存為 {json_filename}")
```

26
config.json Normal file
View File

@@ -0,0 +1,26 @@
{
"_name_or_path": "meta-llama/Llama-2-13b-hf",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 40,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.31.0",
"use_cache": true,
"vocab_size": 32000
}

7
generation_config.json Normal file
View File

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.31.0"
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c6ab8049949ddf3d97fa45acc84e73e0c470ff2ba93288ba0c8c7cceedc0aa85
size 9948728430

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4ccab684ced3d96874b6408f2d53281601c9fff21f6c6fee92280b964458f338
size 9904165024

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f24169a1d119eb25500541c5a3181d1debc87b7b0d3ddc06efa3125c2f42e6b5
size 6178983625

View File

@@ -0,0 +1,410 @@
{
"metadata": {
"total_size": 26031738880
},
"weight_map": {
"lm_head.weight": "pytorch_model-00003-of-00003.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.norm.weight": "pytorch_model-00003-of-00003.bin"
}
}

23
special_tokens_map.json Normal file
View File

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

93391
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

3
tokenizer.model Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723

33
tokenizer_config.json Normal file
View File

@@ -0,0 +1,33 @@
{
"bos_token": {
"__type": "AddedToken",
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": false,
"eos_token": {
"__type": "AddedToken",
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"legacy": false,
"model_max_length": 1000000000000000019884624838656,
"pad_token": null,
"padding_side": "right",
"sp_model_kwargs": {},
"tokenizer_class": "LlamaTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}