Initialize project; model provided by the ModelHub XC community

Model: ibivibiv/athene-noctua-13b
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-11 18:02:38 +08:00
commit e5ffb40d96
21 changed files with 94210 additions and 0 deletions

35
.gitattributes vendored Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

278
README.md Normal file
@@ -0,0 +1,278 @@
---
language:
- en
license: llama2
tags:
- logic
- reasoning
model-index:
- name: athene-noctua-13b
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 57.17
name: normalized accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ibivibiv/athene-noctua-13b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 81.52
name: normalized accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ibivibiv/athene-noctua-13b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 55.91
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ibivibiv/athene-noctua-13b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 47.49
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ibivibiv/athene-noctua-13b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 73.4
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ibivibiv/athene-noctua-13b
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 15.31
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ibivibiv/athene-noctua-13b
name: Open LLM Leaderboard
---
# Athene Noctua 13B
![img](./athene_noctua.png)
# Model Details
* **Trained by**: [ibivibiv](https://huggingface.co/ibivibiv)
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **Model type:** **athene-noctua-13b** is an auto-regressive language model fine-tuned on the Llama 2 transformer architecture.
* **Language(s)**: English
* **Purpose**: Trained specifically for logical reasoning; it should do well on ARC and other logic benchmarks, as well as on critical-thinking tasks. The model is targeted at planning exercises.
* **Comments**: This little guy does pretty well in my logic puzzle testing for a 13B model. I've been using it for test runs to prime for larger models, but it is worth uploading now because it is doing very well on the tests. Again, this is a 13B model, so tricky logic still trips it up, but for its size it does well.
# Prompting
## Prompt Template (Alpaca style)
```
### Instruction:
<prompt> (without the <>)
### Response:
```
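
As a minimal sketch of the template above, a small helper can wrap a request in the Alpaca-style markers. The function name `build_prompt` is hypothetical, and the exact newline placement is an assumption read off the template as written.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style template shown above."""
    # Assumption: each marker sits on its own line, as in the template.
    return f"### Instruction:\n{instruction.strip()}\n### Response:\n"


prompt = build_prompt("Create a plan for developing the game of snake in python using pygame.")
print(prompt)
```

The returned string can be passed to the tokenizer in place of a hand-written prompt.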
## Sample Code
```python
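import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Create new tensors (including the tokenized inputs) on the GPU by default.
torch.set_default_device("cuda")
# device_map="auto" places the weights on the available devices (requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained("ibivibiv/athene-noctua-13b", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("ibivibiv/athene-noctua-13b")
inputs = tokenizer("### Instruction: Create a plan for developing the game of snake in python using pygame.\n### Response:\n", return_tensors="pt", return_attention_mask=False)
# max_length counts the prompt tokens plus the generated tokens.
outputs = model.generate(**inputs, max_length=200)
# Decode the first (and only) sequence in the batch back to text.
text = tokenizer.batch_decode(outputs)[0]
print(text)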
```
## Citations
```
@misc{open-llm-leaderboard,
author = {Edward Beeching and Clémentine Fourrier and Nathan Habib and Sheon Han and Nathan Lambert and Nazneen Rajani and Omar Sanseviero and Lewis Tunstall and Thomas Wolf},
title = {Open LLM Leaderboard},
year = {2023},
publisher = {Hugging Face},
howpublished = "\url{https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard}"
}
```
```
@software{eval-harness,
author = {Gao, Leo and
Tow, Jonathan and
Biderman, Stella and
Black, Sid and
DiPofi, Anthony and
Foster, Charles and
Golding, Laurence and
Hsu, Jeffrey and
McDonell, Kyle and
Muennighoff, Niklas and
Phang, Jason and
Reynolds, Laria and
Tang, Eric and
Thite, Anish and
Wang, Ben and
Wang, Kevin and
Zou, Andy},
title = {A framework for few-shot language model evaluation},
month = sep,
year = 2021,
publisher = {Zenodo},
version = {v0.0.1},
doi = {10.5281/zenodo.5371628},
url = {https://doi.org/10.5281/zenodo.5371628}
}
```
```
@misc{clark2018think,
title={Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
author={Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
year={2018},
eprint={1803.05457},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```
```
@misc{zellers2019hellaswag,
title={HellaSwag: Can a Machine Really Finish Your Sentence?},
author={Rowan Zellers and Ari Holtzman and Yonatan Bisk and Ali Farhadi and Yejin Choi},
year={2019},
eprint={1905.07830},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
```
@misc{hendrycks2021measuring,
title={Measuring Massive Multitask Language Understanding},
author={Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt},
year={2021},
eprint={2009.03300},
archivePrefix={arXiv},
primaryClass={cs.CY}
}
```
```
@misc{lin2022truthfulqa,
title={TruthfulQA: Measuring How Models Mimic Human Falsehoods},
author={Stephanie Lin and Jacob Hilton and Owain Evans},
year={2022},
eprint={2109.07958},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
```
@misc{DBLP:journals/corr/abs-1907-10641,
title={{WINOGRANDE:} An Adversarial Winograd Schema Challenge at Scale},
author={Keisuke Sakaguchi and Ronan Le Bras and Chandra Bhagavatula and Yejin Choi},
year={2019},
eprint={1907.10641},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
```
@misc{DBLP:journals/corr/abs-2110-14168,
title={Training Verifiers to Solve Math Word Problems},
author={Karl Cobbe and
Vineet Kosaraju and
Mohammad Bavarian and
Mark Chen and
Heewoo Jun and
Lukasz Kaiser and
Matthias Plappert and
Jerry Tworek and
Jacob Hilton and
Reiichiro Nakano and
Christopher Hesse and
John Schulman},
year={2021},
eprint={2110.14168},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ibivibiv__athene-noctua-13b).
| Metric |Value|
|---------------------------------|----:|
|Avg. |55.13|
|AI2 Reasoning Challenge (25-Shot)|57.17|
|HellaSwag (10-Shot) |81.52|
|MMLU (5-Shot) |55.91|
|TruthfulQA (0-shot) |47.49|
|Winogrande (5-shot) |73.40|
|GSM8k (5-shot) |15.31|
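
The `Avg.` row appears to be the plain arithmetic mean of the six benchmark scores. A quick check, assuming equal weighting (the dictionary keys are just labels for this sketch):

```python
# Scores copied from the table above.
scores = {
    "ARC (25-shot)": 57.17,
    "HellaSwag (10-shot)": 81.52,
    "MMLU (5-shot)": 55.91,
    "TruthfulQA (0-shot)": 47.49,
    "Winogrande (5-shot)": 73.40,
    "GSM8k (5-shot)": 15.31,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 55.13
```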

BIN
athene_noctua.png Normal file

Binary file not shown.

Size: 875 KiB

29
config.json Normal file
@@ -0,0 +1,29 @@
{
"_name_or_path": "./little_owl",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 40,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.36.2",
"use_cache": true,
"vocab_size": 32000
}
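
The configuration above is enough to estimate the parameter count. The sketch below assumes the standard Llama layout (bias-free q/k/v/o projections, a gate/up/down MLP, two RMSNorm weights per layer, one final norm, and untied input and output embeddings, consistent with `tie_word_embeddings: false`). The function name `count_llama_params` is hypothetical.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 5120,
    "intermediate_size": 13824,
    "num_hidden_layers": 40,
    "vocab_size": 32000,
}


def count_llama_params(cfg: dict) -> int:
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    n = cfg["num_hidden_layers"]
    v = cfg["vocab_size"]
    attention = 4 * h * h   # q, k, v, o projections (40 KV heads, so no grouped-query saving)
    mlp = 3 * h * i         # gate, up and down projections
    norms = 2 * h           # input and post-attention RMSNorm weights
    embeddings = 2 * v * h  # embed_tokens plus a separate lm_head
    final_norm = h
    return n * (attention + mlp + norms) + embeddings + final_norm


params = count_llama_params(config)
print(params)      # 13015864320
print(params * 4)  # 52063457280 bytes at 4 bytes per float32 weight
```

Under these assumptions the byte count equals the `total_size` of 52063457280 recorded in the safetensors index later in this commit, which suggests the weights are stored in float32 as `torch_dtype` indicates.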

7
generation_config.json Normal file
@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.36.2"
}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:20d0a6132cc9e08d0218dc14ea41cc895784bc9beaf417e6327dee6f64b3074d
size 4881247856

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c10a7d95fd3b8d28e48842a37d97ac3b08926923c9c2710d6bdc859be9e829d
size 4970418112

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0cef54a3b593bea095fe1447d899a354f94f1b86c15cc81963db1a825edd7388
size 4970418120

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e9d9a6d12ed8f850af6b52cf0329bcf45eabdcaf9c1ec5e203fb45557ce2818
size 4970418144

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:df86076629cc0a7c0d698020abfc1499d96c77aa9fabd79046d99cce6d7a9d6b
size 4970418144

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:825507117aa1ee47fd0016098526993d3da27d7f11ee1610aede90aac1c3ebf1
size 4792119040

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aea3603316a9839fb2c3609d7db978d0af85e742d12d21316951d35a15b1324e
size 4792160232

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:95e08c93091f9087bed2ee3e219e30d59c3f7d90975589f7719ea99fcaa2b33d
size 4792160224

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a986c001d7c638ae4be48c2381206b4daab3d5b8401b9d488b4fd95c2809da2
size 4970418144

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5f86825b7ba19a29a77bfc6009dafbbbfda6ebabb3fc82e77a8b7db2aac66d2
size 4970418144

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8d9b4394dc46add2daed2396b32b646ceb3d239f99fa4dd5d0b69cf3b2a1cde
size 2983303184
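
Each three-line entry above is a Git LFS pointer (`version`, `oid`, `size`) standing in for a large weight file. A minimal sketch of reading one is shown here; the helper name `parse_lfs_pointer` and the returned dictionary layout are hypothetical, and the sample text is the last pointer listed above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d8d9b4394dc46add2daed2396b32b646ceb3d239f99fa4dd5d0b69cf3b2a1cde
size 2983303184
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field gives the download size of the real file, so summing it across pointers estimates the disk space a full checkout needs.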

@@ -0,0 +1,370 @@
{
"metadata": {
"total_size": 52063457280
},
"weight_map": {
"lm_head.weight": "model-00011-of-00011.safetensors",
"model.embed_tokens.weight": "model-00001-of-00011.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00011.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00011.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00011.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00011.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.11.input_layernorm.weight": "model-00004-of-00011.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.12.input_layernorm.weight": "model-00004-of-00011.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.13.input_layernorm.weight": "model-00004-of-00011.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00011.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00011.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.15.input_layernorm.weight": "model-00005-of-00011.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00011.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.16.input_layernorm.weight": "model-00005-of-00011.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.17.input_layernorm.weight": "model-00005-of-00011.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.18.input_layernorm.weight": "model-00005-of-00011.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00005-of-00011.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00005-of-00011.safetensors",
"model.layers.19.input_layernorm.weight": "model-00006-of-00011.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00011.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00011.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.20.input_layernorm.weight": "model-00006-of-00011.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.21.input_layernorm.weight": "model-00006-of-00011.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00006-of-00011.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.22.input_layernorm.weight": "model-00007-of-00011.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00006-of-00011.safetensors",
"model.layers.23.input_layernorm.weight": "model-00007-of-00011.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.24.input_layernorm.weight": "model-00007-of-00011.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.25.input_layernorm.weight": "model-00007-of-00011.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00007-of-00011.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.26.input_layernorm.weight": "model-00008-of-00011.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00007-of-00011.safetensors",
"model.layers.27.input_layernorm.weight": "model-00008-of-00011.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.28.input_layernorm.weight": "model-00008-of-00011.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.29.input_layernorm.weight": "model-00008-of-00011.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00008-of-00011.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00011.safetensors",
"model.layers.30.input_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00008-of-00011.safetensors",
"model.layers.31.input_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.input_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.input_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00009-of-00011.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.34.input_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00009-of-00011.safetensors",
"model.layers.35.input_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.input_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.input_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00010-of-00011.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.38.input_layernorm.weight": "model-00011-of-00011.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00011-of-00011.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00010-of-00011.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.input_layernorm.weight": "model-00011-of-00011.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00011-of-00011.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00011-of-00011.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00011.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.7.input_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00011.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00011.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00011.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00011.safetensors",
"model.norm.weight": "model-00011-of-00011.safetensors"
}
}

23
special_tokens_map.json Normal file

@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}

93391
tokenizer.json Normal file

File diff suppressed because it is too large

BIN
tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.

41
tokenizer_config.json Normal file

@@ -0,0 +1,41 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "padding_side": "right",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": true
}