Initialize project; model provided by the ModelHub XC community
Model: websystemspl/Bielik-11B-v3.0-Instruct-128k
Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
210
README.md
Normal file
@@ -0,0 +1,210 @@
---
license: apache-2.0
base_model:
- speakleash/Bielik-11B-v3-Base-20250730
language:
- multilingual
- pl
- en
- sq
- bel
- bs
- bg
- hr
- cs
- da
- et
- fi
- fr
- el
- es
- is
- lt
- nl
- de
- no
- pt
- ru
- ro
- sr
- hbs
- sv
- sk
- sl
- tr
- uk
- hu
- it
- lv
library_name: transformers
inference:
  parameters:
    temperature: 0.2
widget:
- messages:
  - role: user
    content: Co przedstawia polskie godło?
extra_gated_fields:
  I agree to be contacted for feedback about Bielik models: checkbox
---

<p align="center">
<img src="https://huggingface.co/speakleash/Bielik-11B-v2/raw/main/speakleash_cyfronet.png">
</p>

# Bielik-11B-v3.0-Instruct

Bielik-11B-v3.0-Instruct is a generative text model with 11 billion parameters. It is an instruction-tuned version of [Bielik-11B-v3-Base-20250730](https://huggingface.co/speakleash/Bielik-11B-v3-Base-20250730). The model stands as a testament to the unique collaboration between the open-science/open-source project SpeakLeash and the High Performance Computing (HPC) center ACK Cyfronet AGH. It was developed and trained on multilingual text corpora covering 32 European languages, with an emphasis on Polish, curated and processed by the SpeakLeash team. This endeavor leveraged Polish large-scale computing infrastructure within the PLGrid environment, specifically the HPC center ACK Cyfronet AGH. The creation and training of Bielik-11B-v3.0-Instruct were supported by computational grant no. PLG/2024/016951, carried out on the Athena and Helios supercomputers, enabling the use of cutting-edge technology and computational resources essential for large-scale machine learning. As a result, the model exhibits an exceptional ability to understand and process Polish and other European languages, providing accurate responses and performing a variety of linguistic tasks with high precision.

📚 Technical report: [Bielik_11B_v3.pdf](https://github.com/speakleash/bielik-papers/blob/main/v3/Bielik_11B_v3.pdf)

🗣️ Chat: https://chat.bielik.ai/

## Model

The model is a successor to the [Bielik v2](https://arxiv.org/abs/2505.02410) series, and its development also leveraged the knowledge and experience gained while working on the [Bielik v3 Small](https://arxiv.org/abs/2505.02550) models.

The [SpeakLeash](https://speakleash.org/) team is working on its own set of instructions in Polish, which is continuously being expanded and refined by annotators. A portion of these instructions, manually verified and corrected, has been used for training. Moreover, due to the limited availability of high-quality instructions in Polish, synthetic instructions were generated and used in training. The dataset used for training comprised over 20 million instructions, consisting of more than 17 billion tokens.

To align the model with user preferences, we employed the [DPO-Positive](https://arxiv.org/abs/2402.13228) method, utilizing both generated and manually corrected examples scored by a metamodel. The dataset comprised over 114,000 examples of varying lengths, addressing different aspects of response style. It was filtered and evaluated by a reward model to select instructions with the right level of difference between the chosen and rejected responses. A novelty introduced in DPO-P was the inclusion of multi-turn conversations.
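
As a concrete illustration of the preference objective mentioned above, here is a minimal scalar sketch of the DPO-Positive loss as defined in the linked paper, computed from summed log-probabilities of the chosen and rejected responses. The function name and the `beta`/`lam` defaults are illustrative only and are not taken from the Bielik training setup:

```python
import math

def dpop_loss(pi_w, pi_l, ref_w, ref_l, beta=0.1, lam=50.0):
    """DPO-Positive loss for one example.

    pi_w, pi_l:  summed log-probs of the chosen (w) / rejected (l) response
                 under the policy being trained.
    ref_w, ref_l: the same quantities under the frozen reference model.
    """
    # Standard DPO margin: how much the policy prefers chosen over rejected,
    # relative to the reference model.
    margin = (pi_w - ref_w) - (pi_l - ref_l)
    # DPO-P penalty: active only when the policy assigns the chosen response
    # a lower log-likelihood than the reference model does.
    penalty = max(0.0, ref_w - pi_w)
    logits = beta * (margin - lam * penalty)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))
```

Relative to plain DPO, the `lam` term keeps the loss high whenever the policy's likelihood of the chosen response drops below the reference model's, which is the "Positive" part of DPO-P.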

In the final stage of the alignment pipeline, Reinforcement Learning (RL) was used to further enhance the model's analytical capabilities. Training employed [Group Relative Policy Optimization (GRPO)](https://arxiv.org/abs/2402.03300) and its variant, [Dr. GRPO](https://arxiv.org/abs/2503.20783), which was chosen to improve token efficiency by reducing the tendency of models to artificially increase response length to maximize rewards. The RL training was conducted using the [Volcano Engine Reinforcement Learning (VERL)](https://arxiv.org/abs/2409.19256) framework, which provides a scalable and modular training environment. The training corpus comprised 143k curated problems spanning logic, STEM, mathematics, and tool-use domains; all samples were selected based on the availability of Reinforcement Learning from Verifiable Rewards (RLVR), ensuring that each problem had a definitive, verifiable solution.
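
The group-relative idea at the heart of GRPO can be sketched in a few lines: several responses are sampled per prompt and their verifiable rewards are normalized within the group. This is a simplified illustration, not the VERL implementation; Dr. GRPO additionally drops the per-group standard-deviation division (among other debiasing changes described in the paper):

```python
def grpo_advantages(rewards, dr_grpo=False):
    """Group-relative advantages for one prompt's sampled responses.

    Plain GRPO normalizes rewards within the group: (r - mean) / std.
    Dr. GRPO keeps only the mean-centering: r - mean.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    centered = [r - mean for r in rewards]
    if dr_grpo:
        return centered
    std = (sum(c * c for c in centered) / n) ** 0.5
    std = std if std > 0 else 1.0  # guard degenerate all-equal groups
    return [c / std for c in centered]
```

With binary verifiable rewards, e.g. `[1.0, 0.0, 1.0, 0.0]`, plain GRPO yields `[1.0, -1.0, 1.0, -1.0]`, while the Dr. GRPO variant yields the unscaled `[0.5, -0.5, 0.5, -0.5]`.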

Bielik instruct models have been trained using an original open-source framework called [ALLaMo](https://github.com/chrisociepa/allamo), implemented by [Krzysztof Ociepa](https://www.linkedin.com/in/krzysztof-ociepa-44886550/). This framework allows users to train language models with architectures similar to LLaMA and Mistral in a fast and efficient way.

### Model description:

* **Developed by:** [SpeakLeash](https://speakleash.org/) & [ACK Cyfronet AGH](https://www.cyfronet.pl/)
* **Language:** Multilingual (32 European languages, optimized for Polish)
* **Model type:** causal decoder-only
* **Finetuned from:** [Bielik-11B-v3-Base-20250730](https://huggingface.co/speakleash/Bielik-11B-v3-Base-20250730)
* **License:** Apache 2.0

### Chat template

Bielik-11B-v3.0-Instruct uses [ChatML](https://github.com/cognitivecomputations/OpenChatML) as the prompt format.

For example:
```
prompt = "<s><|im_start|> user\nJakie mamy pory roku?<|im_end|> \n<|im_start|> assistant\n"
completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> \n"
```

This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model_name = "speakleash/Bielik-11B-v3.0-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
    {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
    {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
    {"role": "user", "content": "Która jest najcieplejsza?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = input_ids.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

The fully formatted input conversation produced by `apply_chat_template` in the previous example:

```
<s><|im_start|> system
Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim.<|im_end|> 
<|im_start|> user
Jakie mamy pory roku w Polsce?<|im_end|> 
<|im_start|> assistant
W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> 
<|im_start|> user
Która jest najcieplejsza?<|im_end|> 
```
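
For quick inspection without loading the tokenizer, the layout above can be reproduced with plain string formatting. This is a minimal sketch that hardcodes the spacing shown in the examples (`to_chatml` is a hypothetical helper; `apply_chat_template` remains the authoritative implementation):

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts into the ChatML layout above."""
    out = "<s>"
    for m in messages:
        # Note the space after <|im_start|> and after <|im_end|>, matching
        # the template shown in the examples above.
        out += f"<|im_start|> {m['role']}\n{m['content']}<|im_end|> \n"
    if add_generation_prompt:
        out += "<|im_start|> assistant\n"
    return out

prompt = to_chatml([{"role": "user", "content": "Jakie mamy pory roku?"}])
print(prompt)
```

For the single-turn message above, this reproduces the exact `prompt` string from the first chat-template example.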

## Limitations and Biases

Bielik-11B-v3.0-Instruct is a quick demonstration that the base model can be easily fine-tuned to achieve compelling and promising performance. It does not have any moderation mechanisms. We look forward to engaging with the community on ways to make the model respect guardrails, allowing for deployment in environments requiring moderated outputs.

Bielik-11B-v3.0-Instruct can produce factually incorrect output and should not be relied on to produce factually accurate data. Bielik-11B-v3.0-Instruct was trained on various public datasets. While great efforts have been taken to clean the training data, it is possible that this model can generate lewd, false, biased, or otherwise offensive outputs.

## Responsible for training the model

* [Krzysztof Ociepa](https://www.linkedin.com/in/krzysztof-ociepa-44886550/)<sup>SpeakLeash</sup> - team leadership, conceptualizing, data preparation, process optimization, and oversight of training
* [Łukasz Flis](https://www.linkedin.com/in/lukasz-flis-0a39631/)<sup>Cyfronet AGH</sup> - coordinating and supervising the training
* [Remigiusz Kinas](https://www.linkedin.com/in/remigiusz-kinas/)<sup>SpeakLeash</sup> - conceptualizing, coordinating RL training, data preparation, benchmarking, and quantizations
* [Adrian Gwoździej](https://www.linkedin.com/in/adrgwo/)<sup>SpeakLeash</sup> - data preparation and ensuring data quality
* [Krzysztof Wróbel](https://www.linkedin.com/in/wrobelkrzysztof/)<sup>SpeakLeash</sup> - benchmarks

The model could not have been created without the commitment and work of the entire SpeakLeash team, whose contribution is invaluable. Thanks to the hard work of many individuals, it was possible to gather a large amount of content in Polish and establish collaboration between the open-science SpeakLeash project and the HPC center ACK Cyfronet AGH. Individuals who contributed to the creation of the model:
[Sebastian Kondracki](https://www.linkedin.com/in/sebastian-kondracki/),
[Marek Magryś](https://www.linkedin.com/in/magrys/),
[Igor Ciuciura](https://www.linkedin.com/in/igor-ciuciura-1763b52a6/),
[Szymon Baczyński](https://www.linkedin.com/in/szymon-baczynski/),
[Dominika Basaj](https://www.linkedin.com/in/dominika-basaj/),
[Kuba Sołtys](https://www.linkedin.com/in/qooba/),
[Karol Jezierski](https://www.linkedin.com/in/karol-jezierski/),
[Jan Sowa](https://www.linkedin.com/in/janpiotrsowa/),
[Anna Przybył](https://www.linkedin.com/in/annaprzybyl/),
[Agnieszka Ratajska](https://www.linkedin.com/in/agnieszka-ratajska/),
[Witold Wydmański](https://www.linkedin.com/in/witold-wydmanski/),
[Katarzyna Starosławska](https://www.linkedin.com/in/kstaroslawska/),
[Izabela Babis](https://www.linkedin.com/in/izabela-babis-2274b8105/),
[Nina Babis](https://www.linkedin.com/in/nina-babis-00055a140/).

We gratefully acknowledge the Polish high-performance computing infrastructure PLGrid (HPC center: ACK Cyfronet AGH) for providing computing facilities and support within computational grant no. PLG/2024/016951.

## Legal Aspects

EU AI Act Transparency Documentation: [Bielik 11B v3 EU Public Summary.pdf](https://bit.ly/4qDcE81)

## Data Protection and Copyright Requests

For removal requests of personally identifiable information (PII) or of copyrighted content, please contact the respective dataset owners or us directly: [biuro@speakleash.org.pl](mailto:biuro@speakleash.org.pl).

## Citation

Please cite this model using the following format:

```
@misc{ociepa2025bielik11bv3multilingual,
    title = {Bielik 11B v3: Multilingual Large Language Model for European Languages},
    author = {Krzysztof Ociepa and Łukasz Flis and Remigiusz Kinas and Krzysztof Wróbel and Adrian Gwoździej},
    year = {2025},
    eprint = {2601.11579},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url = {https://arxiv.org/abs/2601.11579},
}

@misc{Bielik11Bv3i,
    title = {Bielik-11B-v3.0-Instruct model card},
    author = {Ociepa, Krzysztof and Flis, Łukasz and Kinas, Remigiusz and Gwoździej, Adrian and Wróbel, Krzysztof and {SpeakLeash Team} and {Cyfronet Team}},
    year = {2025},
    url = {https://huggingface.co/speakleash/Bielik-11B-v3.0-Instruct},
    note = {Accessed: 2025-12-31}, % change this date
    urldate = {2025-12-31} % change this date
}
```

## Contact Us

If you have any questions or suggestions, please use the discussion tab. If you want to contact us directly, join our [Discord SpeakLeash](https://discord.gg/pv4brQMDTy).
130
added_tokens.json
Normal file
@@ -0,0 +1,130 @@
{
  "<|im_start|>": 32000,
  "<|im_end|>": 32001,
  "<|function_list|>": 32002,
  "<|function_output|>": 32003,
  "<|function_call|>": 32004,
  "<tool_call>": 32005,
  "</tool_call>": 32006,
  "<think>": 32007,
  "</think>": 32008,
  "<|control_100|>": 32099,
  "<|control_101|>": 32100,
  "<|control_102|>": 32101,
  "<|control_103|>": 32102,
  "<|control_104|>": 32103,
  "<|control_105|>": 32104,
  "<|control_106|>": 32105,
  "<|control_107|>": 32106,
  "<|control_108|>": 32107,
  "<|control_109|>": 32108,
  "<|control_10|>": 32009,
  "<|control_110|>": 32109,
  "<|control_111|>": 32110,
  "<|control_112|>": 32111,
  "<|control_113|>": 32112,
  "<|control_114|>": 32113,
  "<|control_115|>": 32114,
  "<|control_116|>": 32115,
  "<|control_117|>": 32116,
  "<|control_118|>": 32117,
  "<|control_119|>": 32118,
  "<|control_11|>": 32010,
  "<|control_120|>": 32119,
  "<|control_121|>": 32120,
  "<|control_122|>": 32121,
  "<|control_123|>": 32122,
  "<|control_124|>": 32123,
  "<|control_125|>": 32124,
  "<|control_126|>": 32125,
  "<|control_127|>": 32126,
  "<|control_128|>": 32127,
  "<|control_12|>": 32011,
  "<|control_13|>": 32012,
  "<|control_14|>": 32013,
  "<|control_15|>": 32014,
  "<|control_16|>": 32015,
  "<|control_17|>": 32016,
  "<|control_18|>": 32017,
  "<|control_19|>": 32018,
  "<|control_20|>": 32019,
  "<|control_21|>": 32020,
  "<|control_22|>": 32021,
  "<|control_23|>": 32022,
  "<|control_24|>": 32023,
  "<|control_25|>": 32024,
  "<|control_26|>": 32025,
  "<|control_27|>": 32026,
  "<|control_28|>": 32027,
  "<|control_29|>": 32028,
  "<|control_30|>": 32029,
  "<|control_31|>": 32030,
  "<|control_32|>": 32031,
  "<|control_33|>": 32032,
  "<|control_34|>": 32033,
  "<|control_35|>": 32034,
  "<|control_36|>": 32035,
  "<|control_37|>": 32036,
  "<|control_38|>": 32037,
  "<|control_39|>": 32038,
  "<|control_40|>": 32039,
  "<|control_41|>": 32040,
  "<|control_42|>": 32041,
  "<|control_43|>": 32042,
  "<|control_44|>": 32043,
  "<|control_45|>": 32044,
  "<|control_46|>": 32045,
  "<|control_47|>": 32046,
  "<|control_48|>": 32047,
  "<|control_49|>": 32048,
  "<|control_50|>": 32049,
  "<|control_51|>": 32050,
  "<|control_52|>": 32051,
  "<|control_53|>": 32052,
  "<|control_54|>": 32053,
  "<|control_55|>": 32054,
  "<|control_56|>": 32055,
  "<|control_57|>": 32056,
  "<|control_58|>": 32057,
  "<|control_59|>": 32058,
  "<|control_60|>": 32059,
  "<|control_61|>": 32060,
  "<|control_62|>": 32061,
  "<|control_63|>": 32062,
  "<|control_64|>": 32063,
  "<|control_65|>": 32064,
  "<|control_66|>": 32065,
  "<|control_67|>": 32066,
  "<|control_68|>": 32067,
  "<|control_69|>": 32068,
  "<|control_70|>": 32069,
  "<|control_71|>": 32070,
  "<|control_72|>": 32071,
  "<|control_73|>": 32072,
  "<|control_74|>": 32073,
  "<|control_75|>": 32074,
  "<|control_76|>": 32075,
  "<|control_77|>": 32076,
  "<|control_78|>": 32077,
  "<|control_79|>": 32078,
  "<|control_80|>": 32079,
  "<|control_81|>": 32080,
  "<|control_82|>": 32081,
  "<|control_83|>": 32082,
  "<|control_84|>": 32083,
  "<|control_85|>": 32084,
  "<|control_86|>": 32085,
  "<|control_87|>": 32086,
  "<|control_88|>": 32087,
  "<|control_89|>": 32088,
  "<|control_90|>": 32089,
  "<|control_91|>": 32090,
  "<|control_92|>": 32091,
  "<|control_93|>": 32092,
  "<|control_94|>": 32093,
  "<|control_95|>": 32094,
  "<|control_96|>": 32095,
  "<|control_97|>": 32096,
  "<|control_98|>": 32097,
  "<|control_99|>": 32098
}
37
config.json
Normal file
@@ -0,0 +1,37 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": [
    32001,
    2
  ],
  "pad_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 50,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  },
  "rope_theta": 1000000,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.3",
  "use_cache": true,
  "vocab_size": 32128
}
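
One detail worth noting in the config above: the `rope_scaling` block applies YaRN with `factor: 4.0` on top of `original_max_position_embeddings: 32768`, which is presumably where the 128k context in the model's name comes from:

```python
# YaRN-extended context length implied by the config values above
original_max_position_embeddings = 32768
factor = 4.0
extended_context = int(original_max_position_embeddings * factor)
print(extended_context)  # 131072, i.e. the advertised 128k context window
```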
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": [ 32001, 2 ],
  "pad_token_id": 2,
  "transformers_version": "4.51.3"
}
3
model-00001-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49a05548e43d8850f4c3178a5e557000cebfbcdac94da66f734aaa0c35eca031
size 4986129216
3
model-00002-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:60b25efb361c16c2a6ee3d5c1de6e249256e70968b95db43c706aa52ef76eb5d
size 4890723960
3
model-00003-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dce0c22cfe54e0d7fb1a6ff241fa2878c28c644f81d7c2ebc4e3581ee65bf361
size 4927466720
3
model-00004-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a122b2dd0a3da81ec26c968444e6d20a79d2e03a0e41b5ab693bf5f587daef8
size 4924339176
3
model-00005-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e5b04e4be4099c3938677be46c32e7abf75006646422c4fed6869ad17ea1325e
size 2608986960
460
model.safetensors.index.json
Normal file
@@ -0,0 +1,460 @@
{
  "metadata": {
    "total_size": 22337593344
  },
  "weight_map": {
    "lm_head.weight": "model-00003-of-00005.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00005.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00005-of-00005.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.30.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.31.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.31.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.32.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.32.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.32.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.33.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.33.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.33.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.33.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.33.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.34.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.34.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.34.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.34.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.35.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.35.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.35.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.35.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.35.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.35.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.36.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.36.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.36.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.36.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.36.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.36.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.36.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.36.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.36.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.37.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.37.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.37.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.37.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.37.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.37.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.37.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.37.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.37.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.38.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.38.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.38.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.38.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.38.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.38.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.38.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.38.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.38.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.39.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.39.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.39.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.39.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.39.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.39.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.39.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.39.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.39.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.40.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.40.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.40.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.40.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.40.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.40.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.40.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.40.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.40.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.41.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.41.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.41.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.41.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.41.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.41.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.41.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.41.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.41.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.42.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.42.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.42.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.42.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.42.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.42.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.42.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.42.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.42.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.43.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.43.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.43.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.43.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.43.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.43.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.43.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.43.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.43.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.44.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.44.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.44.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.44.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.44.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.44.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.44.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.44.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.44.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.45.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.45.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.45.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.45.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.45.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.45.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.45.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.45.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.45.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.46.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.46.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.46.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.46.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.46.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.46.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.46.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.46.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.46.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.47.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.47.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.47.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.47.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.47.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.47.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.47.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.47.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.47.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.48.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.48.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.48.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.48.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.48.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.48.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.48.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.48.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.48.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.49.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.49.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.49.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.49.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.49.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.49.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.49.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.49.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.49.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.6.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.6.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
|
||||
"model.norm.weight": "model-00003-of-00005.safetensors"
|
||||
}
|
||||
}
|
||||
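The `weight_map` above maps each tensor name to the one of five `model-0000N-of-00005.safetensors` shards that stores it. A loader typically inverts this mapping so it can open each shard once and read all of its tensors together. A minimal sketch, using a tiny excerpt of the index as sample data (the `shards_for` helper is illustrative, not part of any library):

```python
import json
from collections import defaultdict

def shards_for(weight_map: dict) -> dict:
    """Invert a safetensors index: shard file -> sorted list of tensor names."""
    by_shard = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        by_shard[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in by_shard.items()}

# Tiny excerpt of the weight_map above, used as sample input.
index = {
    "weight_map": {
        "model.layers.18.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
        "model.layers.19.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
        "model.norm.weight": "model-00003-of-00005.safetensors",
    }
}
plan = shards_for(index["weight_map"])
print(json.dumps(plan, indent=2))
```

With the full index, this yields one read plan per shard file, which is how sharded checkpoints avoid loading all five files for a single tensor lookup.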
11
params.json
Normal file
@@ -0,0 +1,11 @@
{
    "dim": 4096,
    "n_layers": 50,
    "head_dim": 128,
    "hidden_dim": 14336,
    "n_heads": 32,
    "n_kv_heads": 8,
    "norm_eps": 1e-05,
    "vocab_size": 32128,
    "rope_theta": 1000000.0
}
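The values in `params.json` are internally consistent: the 32 attention heads of width 128 exactly span the model dimension of 4096, and the 8 KV heads indicate grouped-query attention with 4 query heads per KV head. A quick sanity check (values copied verbatim from the file above):

```python
# Values copied verbatim from params.json above.
params = {
    "dim": 4096, "n_layers": 50, "head_dim": 128,
    "hidden_dim": 14336, "n_heads": 32, "n_kv_heads": 8,
    "norm_eps": 1e-05, "vocab_size": 32128, "rope_theta": 1000000.0,
}

# Attention width must match the model dimension: 32 heads * 128 = 4096.
assert params["n_heads"] * params["head_dim"] == params["dim"]

# Grouped-query attention: each KV head serves n_heads / n_kv_heads query heads.
gqa_group = params["n_heads"] // params["n_kv_heads"]
print(gqa_group)  # 4
```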
160
special_tokens_map.json
Normal file
@@ -0,0 +1,160 @@
{
    "additional_special_tokens": [
        "<|im_start|>",
        "<|im_end|>",
        "<|function_list|>",
        "<|function_output|>",
        "<|function_call|>",
        "<tool_call>",
        "</tool_call>",
        "<think>",
        "</think>",
        "<|control_10|>",
        "<|control_11|>",
        "<|control_12|>",
        "<|control_13|>",
        "<|control_14|>",
        "<|control_15|>",
        "<|control_16|>",
        "<|control_17|>",
        "<|control_18|>",
        "<|control_19|>",
        "<|control_20|>",
        "<|control_21|>",
        "<|control_22|>",
        "<|control_23|>",
        "<|control_24|>",
        "<|control_25|>",
        "<|control_26|>",
        "<|control_27|>",
        "<|control_28|>",
        "<|control_29|>",
        "<|control_30|>",
        "<|control_31|>",
        "<|control_32|>",
        "<|control_33|>",
        "<|control_34|>",
        "<|control_35|>",
        "<|control_36|>",
        "<|control_37|>",
        "<|control_38|>",
        "<|control_39|>",
        "<|control_40|>",
        "<|control_41|>",
        "<|control_42|>",
        "<|control_43|>",
        "<|control_44|>",
        "<|control_45|>",
        "<|control_46|>",
        "<|control_47|>",
        "<|control_48|>",
        "<|control_49|>",
        "<|control_50|>",
        "<|control_51|>",
        "<|control_52|>",
        "<|control_53|>",
        "<|control_54|>",
        "<|control_55|>",
        "<|control_56|>",
        "<|control_57|>",
        "<|control_58|>",
        "<|control_59|>",
        "<|control_60|>",
        "<|control_61|>",
        "<|control_62|>",
        "<|control_63|>",
        "<|control_64|>",
        "<|control_65|>",
        "<|control_66|>",
        "<|control_67|>",
        "<|control_68|>",
        "<|control_69|>",
        "<|control_70|>",
        "<|control_71|>",
        "<|control_72|>",
        "<|control_73|>",
        "<|control_74|>",
        "<|control_75|>",
        "<|control_76|>",
        "<|control_77|>",
        "<|control_78|>",
        "<|control_79|>",
        "<|control_80|>",
        "<|control_81|>",
        "<|control_82|>",
        "<|control_83|>",
        "<|control_84|>",
        "<|control_85|>",
        "<|control_86|>",
        "<|control_87|>",
        "<|control_88|>",
        "<|control_89|>",
        "<|control_90|>",
        "<|control_91|>",
        "<|control_92|>",
        "<|control_93|>",
        "<|control_94|>",
        "<|control_95|>",
        "<|control_96|>",
        "<|control_97|>",
        "<|control_98|>",
        "<|control_99|>",
        "<|control_100|>",
        "<|control_101|>",
        "<|control_102|>",
        "<|control_103|>",
        "<|control_104|>",
        "<|control_105|>",
        "<|control_106|>",
        "<|control_107|>",
        "<|control_108|>",
        "<|control_109|>",
        "<|control_110|>",
        "<|control_111|>",
        "<|control_112|>",
        "<|control_113|>",
        "<|control_114|>",
        "<|control_115|>",
        "<|control_116|>",
        "<|control_117|>",
        "<|control_118|>",
        "<|control_119|>",
        "<|control_120|>",
        "<|control_121|>",
        "<|control_122|>",
        "<|control_123|>",
        "<|control_124|>",
        "<|control_125|>",
        "<|control_126|>",
        "<|control_127|>",
        "<|control_128|>"
    ],
    "bos_token": {
        "content": "<s>",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "eos_token": {
        "content": "<|im_end|>",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "pad_token": {
        "content": "</s>",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    },
    "unk_token": {
        "content": "<unk>",
        "lstrip": false,
        "normalized": false,
        "rstrip": false,
        "single_word": false
    }
}
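This map declares `<s>` as the BOS token and `<|im_end|>` as the EOS token, and the `<|im_start|>`/`<|im_end|>` pair in `additional_special_tokens` suggests a ChatML-style message format. A hypothetical sketch of rendering one turn with those markers (the exact template is an assumption inferred from the token names, not confirmed by this repository):

```python
def render_turn(role: str, content: str) -> str:
    """Wrap one message in the <|im_start|>/<|im_end|> markers declared above.
    The newline placement is an assumed ChatML-style convention."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

prompt = (
    render_turn("system", "You are a helpful assistant.")
    + render_turn("user", "Hello!")
)
print(prompt)
```

Because `<|im_end|>` doubles as EOS, generation stops naturally at the end of the assistant's turn when the tokenizer treats it as a stop token.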
269187
tokenizer.json
Normal file
File diff suppressed because it is too large
BIN
tokenizer.model
(Stored with Git LFS)
Normal file
Binary file not shown.
1198
tokenizer_config.json
Normal file
File diff suppressed because it is too large