Initialize project; model provided by the ModelHub XC community

Model: sambanovasystems/SambaLingo-Bulgarian-Base
Source: Original Platform
ModelHub XC
2026-06-10 22:47:39 +08:00
commit 85c4c94558
13 changed files with 158843 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
SambaLingo_Logo.png filter=lfs diff=lfs merge=lfs -text

110
README.md Normal file

@@ -0,0 +1,110 @@
---
license: llama2
datasets:
- uonlp/CulturaX
language:
- bg
- en
metrics:
- chrf
- accuracy
- bleu
---
# SambaLingo-Bulgarian-Base
<img src="SambaLingo_Logo.png" width="340" style="margin-left:auto; margin-right:auto; display:block"/>
<!-- Provide a quick summary of what the model is/does. -->
SambaLingo-Bulgarian-Base is a pretrained bilingual Bulgarian and English model that adapts [Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b-hf) to Bulgarian by training on 38 billion tokens from the Bulgarian split of the [Cultura-X](https://huggingface.co/datasets/uonlp/CulturaX) dataset. The model reports state-of-the-art evaluation results in perplexity and FLORES-200 translation. For the chat version of this model, please see [sambanovasystems/SambaLingo-Bulgarian-Chat](https://huggingface.co/sambanovasystems/SambaLingo-Bulgarian-Chat), or try it out at [SambaLingo-chat-space](https://huggingface.co/spaces/sambanovasystems/SambaLingo-chat-space).
## Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [SambaNova Systems](https://sambanova.ai/)
- **Model type:** Language Model
- **Language(s):** Bulgarian, English
- **Finetuned from model:** [Llama 2](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- **Try the chat version of this model**: [SambaLingo-chat-space](https://huggingface.co/spaces/sambanovasystems/SambaLingo-chat-space).
- **Paper:** [SambaLingo: Teaching Large Language Models New Languages](https://arxiv.org/abs/2404.05829)
- **Blog Post**: [sambalingo-open-source-language-experts](https://sambanova.ai/blog/sambalingo-open-source-language-experts)
## Getting Started
### Loading Model With Hugging Face
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/SambaLingo-Bulgarian-Base")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SambaLingo-Bulgarian-Base", device_map="auto", torch_dtype="auto")
```
### Suggested Inference Parameters
We suggest setting `do_sample=False`, as this is a pretrained checkpoint.
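A minimal sketch of greedy decoding with the model and tokenizer loaded in the previous snippet. The helper name `generate_greedy` and the `max_new_tokens=64` default are illustrative assumptions, not values recommended by the model authors.

```python
# Illustrative defaults: do_sample=False follows the suggestion above;
# max_new_tokens=64 is an arbitrary assumption, not a recommended value.
GENERATION_KWARGS = {"do_sample": False, "max_new_tokens": 64}


def generate_greedy(model, tokenizer, prompt, **overrides):
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    kwargs = {**GENERATION_KWARGS, **overrides}
    # Tokenize to PyTorch tensors and move them to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **kwargs)
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_greedy(model, tokenizer, prompt)` returns the prompt plus its continuation; keyword overrides such as `max_new_tokens=128` are passed through to `model.generate`.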
### Prompting Guidelines
This model is a pretrained checkpoint, so to use it effectively, use few-shot prompting with exemplars. The only other prompt templating required is the standard \<s\> (BOS) token from the Llama tokenizer. To interact with this model through direct questions or queries, use the chat version, which has been aligned with human preferences: [sambanovasystems/SambaLingo-Bulgarian-Chat](https://huggingface.co/sambanovasystems/SambaLingo-Bulgarian-Chat).
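One possible way to assemble a few-shot prompt is sketched below. The `English:` / `Bulgarian:` labels, the blank-line separator, and the helper name `build_few_shot_prompt` are assumptions made for illustration; the model card does not prescribe a layout. The \<s\> token is not written into the string, on the assumption that the tokenizer adds it when encoding.

```python
def build_few_shot_prompt(exemplars, query,
                          source_label="English", target_label="Bulgarian"):
    """Format (source, target) exemplar pairs followed by an open-ended query."""
    blocks = [f"{source_label}: {src}\n{target_label}: {tgt}"
              for src, tgt in exemplars]
    # Leave the final target empty so the model continues from this point.
    blocks.append(f"{source_label}: {query}\n{target_label}:")
    return "\n\n".join(blocks)


exemplars = [("Good morning.", "Добро утро."), ("Thank you.", "Благодаря.")]
prompt = build_few_shot_prompt(exemplars, "How are you?")
print(prompt)
```

The resulting string can be passed to the tokenizer as-is; the exemplars show the pretrained model the pattern to continue.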
## Training Details
All pre-training is done on the [Cultura-X](https://huggingface.co/datasets/uonlp/CulturaX) dataset. We mix the data to be 75% from the language we are adapting to and 25% English, as suggested by [Csaki et al.](https://arxiv.org/abs/2311.05741). We pack the data into sequences of length 4096 and ensure that, when learning a token, we only attend to previous tokens in the context of the corresponding text document. We train with a global batch size of 1024, a sequence length of 4096, a maximum learning rate of 1e-4 with cosine decay, a warmup ratio of 0.01, and a weight decay of 0.1.
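The sketch below illustrates two of the ideas in this paragraph: a warmup-then-cosine learning-rate schedule and an attention mask that keeps packed documents from attending to each other. It is not the training code. The linear warmup shape, the decay floor `min_lr=0.0`, and the function names are assumptions; only the constants come from the text.

```python
import math

MAX_LR = 1e-4        # maximum learning rate, from the text
WARMUP_RATIO = 0.01  # fraction of steps spent warming up, from the text
GLOBAL_BATCH = 1024  # sequences per step, from the text
SEQ_LEN = 4096       # tokens per packed sequence, from the text


def learning_rate(step, total_steps, max_lr=MAX_LR,
                  warmup_ratio=WARMUP_RATIO, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay to min_lr (shapes assumed)."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def packed_attention_mask(doc_lengths):
    """Return mask[i][j], True when token i may attend to token j.

    Documents are packed back to back; a token sees only positions at or
    before its own that belong to the same document.
    """
    doc_ids = [d for d, n in enumerate(doc_lengths) for _ in range(n)]
    size = len(doc_ids)
    return [[j <= i and doc_ids[i] == doc_ids[j] for j in range(size)]
            for i in range(size)]


tokens_per_step = GLOBAL_BATCH * SEQ_LEN  # 1024 * 4096 tokens per step
mask = packed_attention_mask([2, 3])      # two documents packed together
```

With these settings each optimizer step covers 1024 × 4096 = 4,194,304 tokens. If the `step_9000` in the checkpoint path in `config.json` counts optimizer steps at this batch size (an assumption), that is about 37.7 billion tokens, in line with the 38 billion stated in the model summary.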
## Tokenizer Details
We extended the vocabulary of the base Llama model from 32,000 tokens to 57,000 tokens by adding up to 25,000 non-overlapping tokens from the new language.
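As a rough illustration of what "non-overlapping" means here, the sketch below appends candidate tokens to a vocabulary only when they are not already present. The function name `extend_vocab`, the toy vocabulary, and the assumption that ids are contiguous from 0 are hypothetical; how candidate tokens were actually selected is not described in this model card.

```python
def extend_vocab(base_vocab, candidate_tokens, max_new_tokens=25_000):
    """Append candidates not already in base_vocab, up to max_new_tokens."""
    vocab = dict(base_vocab)       # token -> id; copy so the input is untouched
    added = 0
    for token in candidate_tokens:
        if added >= max_new_tokens:
            break
        if token in vocab:         # skip overlaps with the existing vocabulary
            continue
        vocab[token] = len(vocab)  # new ids continue after the existing ones
        added += 1
    return vocab


base = {"<unk>": 0, "<s>": 1, "</s>": 2, "the": 3}
extended = extend_vocab(base, ["the", "и", "на", "и"])
print(len(extended))  # 6
```

In the toy run, `the` and the second `и` are skipped as duplicates, so only two tokens are added. At full scale the same arithmetic gives 32,000 + 25,000 = 57,000 tokens; `config.json` lists `vocab_size` as 57344, and the model card does not explain the difference.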
## Evaluation
For evaluation results see our paper: [SambaLingo: Teaching Large Language Models New Languages](https://arxiv.org/abs/2404.05829)
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
Use of this model is governed by Meta's [Llama 2 Community License Agreement](https://ai.meta.com/llama/license/). Please review and accept the license before downloading the model weights.
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
SambaLingo should NOT be used for:
- Mission-critical applications
- Applications that involve the safety of others
- Making highly important decisions
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Like all LLMs, SambaLingo has certain limitations:
- Hallucination: The model may sometimes generate responses that contain plausible-sounding but factually incorrect or irrelevant information.
- Code Switching: The model might unintentionally switch between languages or dialects within a single response, affecting the coherence and understandability of the output.
- Repetition: The model may produce repetitive phrases or sentences, leading to less engaging and informative responses.
- Coding and Math: The model's performance in generating accurate code or solving complex mathematical problems may be limited.
- Toxicity: The model could inadvertently generate responses containing inappropriate or harmful content.
## Acknowledgments
We extend our heartfelt gratitude to the open-source AI community; this endeavor would not have been possible without open source. SambaNova embraces the open-source community and aspires to actively contribute to this initiative.
We would like to give a special thanks to the following groups:
- Meta for open sourcing Llama 2 and the FLORES-200 dataset
- Nguyen et al. for open sourcing the CulturaX dataset
- CohereAI for releasing AYA-101 and open sourcing a multilingual instruction tuning dataset
- EleutherAI for their open source evaluation framework
- Hugging Face-H4 team for open sourcing the Zephyr training recipe and alignment handbook repo
## Cite SambaLingo
```bibtex
@misc{csaki2024sambalingo,
title={SambaLingo: Teaching Large Language Models New Languages},
author={Zoltan Csaki and Bo Li and Jonathan Li and Qiantong Xu and Pian Pawakapan and Leon Zhang and Yun Du and Hengyu Zhao and Changran Hu and Urmish Thakker},
year={2024},
eprint={2404.05829},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

3
SambaLingo_Logo.png Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:12134a10d8250af8f27c6e541744cf2c2b563a286abb462def466bf4c1691a7f
size 1456206

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"_name_or_path": "/import/ml-sc-nlpcheckpoints-scratch3/zoltanc/international_llamas/bg_take_3/out/step_9000",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"model_name": "",
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pad_token_id": 0,
"pretraining_tp": 1,
"return_dict": false,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.29.0",
"use_cache": true,
"vocab_size": 57344
}

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.29.0"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f3361303046dd125bddd106c6c25f4fff7ebd1410d0e33b3b1f4a9527c8d723d
size 9978651705


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b8a94808fd538bf2a6d9481d7476967cddd78a866618ec3ddb53ee9440465b16
size 9848665103


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aa7deb6530a98d5761d83e8f7617b2a3cd8aee48981ddb36994f025ea21adabc
size 7956938160


@@ -0,0 +1,330 @@
{
"metadata": {
"total_size": 27784142848
},
"weight_map": {
"lm_head.weight": "pytorch_model-00003-of-00003.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.norm.weight": "pytorch_model-00003-of-00003.bin"
}
}

5
special_tokens_map.json Normal file

@@ -0,0 +1,5 @@
{
"bos_token": "<s>",
"eos_token": "</s>",
"unk_token": "<unk>"
}

158268
tokenizer.json Normal file

File diff suppressed because it is too large

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb8c053915820a76258c993fdf089f367c4e6ff46286fd0e26b1ae896bbcab73
size 1008953

42
tokenizer_config.json Normal file

@@ -0,0 +1,42 @@
{
"add_bos_token": true,
"add_eos_token": false,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [],
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": true,
"model_max_length": 1000000000000000019884624838656,
"pad_token": null,
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": true
}
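
Two values in the tokenizer_config.json above are easy to overlook: `pad_token` is `null`, and `model_max_length` is the integer value of `1e30`, which in practice reads as "no limit configured" rather than a real context length. The following is a minimal sketch, using only the Python standard library, of how a script might detect both cases before batching inputs. The `summarize` helper, the `NO_LIMIT_THRESHOLD` constant, and the choice of falling back to `eos_token` for padding are illustrative assumptions, not something this repository prescribes.

```python
import json

# Excerpt of the tokenizer_config.json fields shown above (not the full file).
CONFIG_TEXT = """
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "pad_token": null,
  "model_max_length": 1000000000000000019884624838656
}
"""

# Assumption: values at or above this threshold mean "no length limit was set".
NO_LIMIT_THRESHOLD = int(1e30)


def summarize(config: dict) -> dict:
    """Report whether padding and a length limit must be chosen by the caller."""
    pad = config.get("pad_token")
    max_len = config.get("model_max_length")
    return {
        "needs_pad_token": pad is None,
        # Hypothetical fallback: reuse the EOS token when no pad token exists.
        "pad_fallback": config.get("eos_token") if pad is None else pad,
        "has_length_limit": max_len is not None and max_len < NO_LIMIT_THRESHOLD,
    }


summary = summarize(json.loads(CONFIG_TEXT))
print(summary)
```

If `needs_pad_token` is true, padded batches will need a pad token assigned explicitly, and if `has_length_limit` is false, truncation needs an explicit maximum length taken from the model configuration rather than from the tokenizer.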