Initialize project; model provided by the ModelHub XC community

Model: rombodawg/LosslessMegaCoder-llama2-7b-mini Source: Original Platform
2026-05-12 08:19:28 +08:00
commit 794a851a5a
17 changed files with 94030 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,107 @@
 ---
 license: llama2
 datasets:
 - rombodawg/LosslessMegaCodeTrainingV2_1m_Evol_Uncensored
 ---
 ___________________________
 - Please note that this model was not trained on the rombodawg/LosslessMegaCodeTrainingV3_MINI dataset, despite the name similarity. You can find the training data at the bottom of the model card, labeled (megacode2-min100).
 ___________________________
 This is one of the first models trained on the LosslessMegaCodeTrainingV2_1m_Evol_Uncensored dataset. The version of the dataset used for this model was filtered by removing any data with fewer than 100 tokens; much more refined filtering is planned.
 - This model was made as a collaboration between me and andreaskoepf, who is an affiliate of Open Assistant.
 This model is extremely good at coding; it might be one of the best coding models for its size, and much better than any other 7b parameter model. Bigger models are planned for the future.
 ### Prompt template
 The [ChatML](https://github.com/openai/openai-python/blob/main/chatml.md) format is used:
 "<|im_start|>system\n{system message}<|im_end|>\n<|im_start|>user\n{user prompt}<|im_end|>\n<|im_start|>assistant\n{Assistant answer}<|im_end|>\n" 
 multi-line:
 ```
 <|im_start|>system
 {system message}<|im_end|>
 <|im_start|>user
 {user prompt}<|im_end|>
 <|im_start|>assistant
 {Assistant answer}<|im_end|>
 ```
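As a minimal sketch, the template above can be assembled in Python as follows. The function name `build_chatml_prompt` and the example messages are hypothetical and only illustrate the layout; the prompt ends with an open assistant turn so that the model writes the answer.

```python
from typing import Optional

# Special markers taken from the prompt template above.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(user_prompt: str, system_message: Optional[str] = None) -> str:
    """Build a single-turn ChatML prompt (hypothetical helper, not part of the model repo)."""
    parts = []
    if system_message is not None:
        parts.append(f"{IM_START}system\n{system_message}{IM_END}\n")
    parts.append(f"{IM_START}user\n{user_prompt}{IM_END}\n")
    # Leave the assistant turn open; generation continues from here.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_chatml_prompt(
        "Write a function that reverses a string.",
        system_message="You are a helpful coding assistant.",
    ))
```

The returned string can be passed to whatever generation backend is in use; generation would normally be stopped once the model emits `<|im_end|>`.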
 GPT4All template:
 - System prompt
 ```
 <|im_start|>system
 "Below is an instruction that describes a task. Write a response that appropriately completes the request."
 ```
 - Prompt template
 ```
 <|im_end|>
 <|im_start|>user
 "%1"<|im_end|>
 <|im_start|>assistant
 ```
 Oobabooga Text-Generation-WebUI template:
 - user:
 ```
  <|im_start|>user
  {User string}<|im_end|>
 ```
 - bot:
 ```
  <|im_start|>assistant
  {Bot string}<|im_end|>
 ```
 - turn_template:
 ```
 <|user|>\n<|user-message|>\n\n<|bot|>\n<|bot-message|>\n\n
 ```
 - context:
 ```
  <|im_start|>system
  Below is an instruction that describes a task. Write a response that appropriately completes the request.<|im_end|>
 ```
 Current quantizations available:
 - https://huggingface.co/TheBloke/LosslessMegaCoder-Llama2-7B-Mini-GPTQ
 Benchmarks for the model can be found at the link below, where the model is listed as (andreaskoepf/llama2-7b-megacode2_min100):
 - https://tju01.github.io/FastEval-OpenAssistant/
 Sampling report:
 https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-pretrained%2F2023-08-12_andreaskoepf_llama2-7b-megacode2_min100_sampling_noprefix2.json
 Training information:
 - https://wandb.ai/open-assistant/public-sft/runs/run17_megacode_min100
 The link for the full dataset is below:
 - https://huggingface.co/datasets/rombodawg/LosslessMegaCodeTrainingV2_1m_Evol_Uncensored
 The link for the filtered dataset used to make this model is below:
 - https://huggingface.co/datasets/andreaskoepf/megacode2-min100
 The original posting for this model was uploaded at the link below:
 - https://huggingface.co/andreaskoepf/llama2-7b-megacode2_min100
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_rombodawg__LosslessMegaCoder-llama2-7b-mini)
 | Metric              | Value |
 |---------------------|-------|
 | Avg.                | 45.33 |
 | ARC (25-shot)       | 53.5  |
 | HellaSwag (10-shot) | 77.38 |
 | MMLU (5-shot)       | 49.72 |
 | TruthfulQA (0-shot) | 45.77 |
 | Winogrande (5-shot) | 74.03 |
 | GSM8K (5-shot)      | 9.55  |
 | DROP (3-shot)       | 7.34  |
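The `Avg.` row is consistent with an unweighted mean of the seven benchmark scores in the table. That reading is an assumption based on the arithmetic, not something the leaderboard page is quoted as saying here; a short check:

```python
# Scores copied from the table above; the dictionary name is illustrative.
scores = {
    "ARC (25-shot)": 53.5,
    "HellaSwag (10-shot)": 77.38,
    "MMLU (5-shot)": 49.72,
    "TruthfulQA (0-shot)": 45.77,
    "Winogrande (5-shot)": 74.03,
    "GSM8K (5-shot)": 9.55,
    "DROP (3-shot)": 7.34,
}

# Unweighted mean over all seven benchmarks.
average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"{average:.2f}")
```

The low GSM8K and DROP scores pull the mean well below the scores on the other five benchmarks.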
--- a/config.json
+++ b/config.json
@@ -0,0 +1,25 @@
 {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 32006,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.31.0",
  "use_cache": true,
  "vocab_size": 32007
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
 {
  "bos_token_id": 1,
  "eos_token_id": 32006,
  "pad_token_id": 0,
  "transformers_version": "4.31.0"
 }
--- a/huggingface-metadata.txt
+++ b/huggingface-metadata.txt
@@ -0,0 +1,13 @@
 url: https://huggingface.co/andreaskoepf/llama2-7b-megacode2_min100
 branch: main
 download date: 2023-08-12 22:23:38
 sha256sum:
    fe688706d5cdbd6379124c171c7c6f3c84c3763e505182d4d5cb1efe3ed7a1b8 pytorch_model-00001-of-00008.bin
    4d2011ed5d037437d553c4e1560a0fc684d4414286a52eebbf5e4807e43016ca pytorch_model-00002-of-00008.bin
    f5799d91309b7392105e1f7afbdc84e079448fb4ec0dc2c07f60e6fed6ca5e54 pytorch_model-00003-of-00008.bin
    223d9ddf59b6974a22bbcf08a948a9b8a639604a141c220a1ace47eabb25df28 pytorch_model-00004-of-00008.bin
    5dddfdc359df7a3bdf6f5c8cbd2bd5e81b203ae62da2837a3283645a9c39f230 pytorch_model-00005-of-00008.bin
    57bff4043363eb7af7f5d4bd2b07d0951c4bedf1ade47290e23d35c27d15cd52 pytorch_model-00006-of-00008.bin
    a25d72a4d6068ff81c8d3bda8f8dbf5362aa76da38d803bfee7b7d85fb692aeb pytorch_model-00007-of-00008.bin
    721a078f52a1b5680372ef91f75c95490cad5596c3365607de69b2b91dd8f95b pytorch_model-00008-of-00008.bin
    9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347 tokenizer.model
--- a/pytorch_model-00001-of-00008.bin
+++ b/pytorch_model-00001-of-00008.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:fe688706d5cdbd6379124c171c7c6f3c84c3763e505182d4d5cb1efe3ed7a1b8
 size 1914838174
--- a/pytorch_model-00002-of-00008.bin
+++ b/pytorch_model-00002-of-00008.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4d2011ed5d037437d553c4e1560a0fc684d4414286a52eebbf5e4807e43016ca
 size 1900102374
--- a/pytorch_model-00003-of-00008.bin
+++ b/pytorch_model-00003-of-00008.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:f5799d91309b7392105e1f7afbdc84e079448fb4ec0dc2c07f60e6fed6ca5e54
 size 1843496394
--- a/pytorch_model-00004-of-00008.bin
+++ b/pytorch_model-00004-of-00008.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:223d9ddf59b6974a22bbcf08a948a9b8a639604a141c220a1ace47eabb25df28
 size 1923187238
--- a/pytorch_model-00005-of-00008.bin
+++ b/pytorch_model-00005-of-00008.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:5dddfdc359df7a3bdf6f5c8cbd2bd5e81b203ae62da2837a3283645a9c39f230
 size 1900102438
--- a/pytorch_model-00006-of-00008.bin
+++ b/pytorch_model-00006-of-00008.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:57bff4043363eb7af7f5d4bd2b07d0951c4bedf1ade47290e23d35c27d15cd52
 size 1843496394
--- a/pytorch_model-00007-of-00008.bin
+++ b/pytorch_model-00007-of-00008.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a25d72a4d6068ff81c8d3bda8f8dbf5362aa76da38d803bfee7b7d85fb692aeb
 size 1889640934
--- a/pytorch_model-00008-of-00008.bin
+++ b/pytorch_model-00008-of-00008.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:721a078f52a1b5680372ef91f75c95490cad5596c3365607de69b2b91dd8f95b
 size 262202757
--- a/pytorch_model.bin.index.json
+++ b/pytorch_model.bin.index.json
@@ -0,0 +1,330 @@
 {
  "metadata": {
    "total_size": 13476954112
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00008-of-00008.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.10.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.13.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.14.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.input_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00008.bin",
    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.18.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00004-of-00008.bin",
    "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
    "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.20.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.20.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.20.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
    "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.input_layernorm.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.mlp.down_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
    "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.22.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.22.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.22.mlp.up_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00008.bin",
    "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00005-of-00008.bin",
    "model.layers.23.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.23.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.23.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
    "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
    "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
    "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.input_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.mlp.down_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.mlp.up_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
    "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.27.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.27.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.27.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00008.bin",
    "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00006-of-00008.bin",
    "model.layers.28.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.28.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.28.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
    "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
    "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00008.bin",
    "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.30.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.30.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.30.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
    "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.input_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.mlp.down_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.mlp.up_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00008.bin",
    "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00007-of-00008.bin",
    "model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.4.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.4.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
    "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
    "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
    "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
    "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.8.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.8.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00008.bin",
    "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.9.input_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.9.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.9.mlp.up_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00008.bin",
    "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.norm.weight": "pytorch_model-00007-of-00008.bin"
  }
 }
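
Note on the index above: the weight map assigns each tensor name to one of eight shard files, and a single layer can straddle two shards (layer 4's `q_proj` sits in shard 1 while the rest of layer 4 sits in shard 2; layer 8 is split between shards 2 and 3). A minimal sketch of inverting the map to see which tensors each shard holds is shown below. The helper name `tensors_by_shard` and the `sample` dictionary are illustrative assumptions, with a handful of entries copied from the map rather than the full index.

```python
from collections import defaultdict


def tensors_by_shard(weight_map):
    """Invert a {tensor_name: shard_file} map into {shard_file: [tensor_names]}."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    # Sort names so the result does not depend on insertion order.
    return {shard: sorted(names) for shard, names in shards.items()}


# Illustrative subset of the weight map (not the full index).
sample = {
    "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00008.bin",
    "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00008.bin",
    "model.layers.8.mlp.down_proj.weight": "pytorch_model-00003-of-00008.bin",
    "model.norm.weight": "pytorch_model-00007-of-00008.bin",
}

grouped = tensors_by_shard(sample)
for shard, names in sorted(grouped.items()):
    print(shard, len(names))
```

In practice the input would be the `weight_map` object parsed from the index file with `json.load`; the sketch only covers the grouping step.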
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer.model
+++ b/tokenizer.model
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,33 @@
 {
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "padding_side": "right",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
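
Note on the tokenizer config above: two of its values tend to need handling by downstream code. `model_max_length` is set to `1000000000000000019884624838656`, which equals `int(1e30)` and acts as a "no limit configured" placeholder rather than a real context size. `pad_token` is `null`, so batched padding will likely require the caller to choose a pad token. The sketch below checks for both conditions using only the standard library. The helper names and the fallback length of 4096 are assumptions made for illustration and are not taken from this repository.

```python
import json

# Fragment of the config shown above, reduced to the two fields of interest.
CONFIG_TEXT = """
{
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "padding_side": "right"
}
"""

# Placeholder meaning "no maximum length was configured".
UNSET_MAX_LENGTH = int(1e30)


def effective_max_length(config, fallback):
    """Return the configured max length, or `fallback` if it is the placeholder."""
    value = config.get("model_max_length")
    if value is None or value >= UNSET_MAX_LENGTH:
        return fallback
    return value


def needs_pad_token(config):
    """True when the config defines no pad token."""
    return config.get("pad_token") is None


config = json.loads(CONFIG_TEXT)
# 4096 is an assumed fallback; choose the value that matches the model in use.
max_len = effective_max_length(config, fallback=4096)
missing_pad = needs_pad_token(config)
print(max_len, missing_pad)
```

One common workaround when no pad token is defined is to reuse the end-of-sequence token for padding. Whether that suits this model is a decision for the caller.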