初始化项目，由ModelHub XC社区提供模型

Model: airesearch/LLaMa3.1-8B-Legal-ThaiCCL-Combine Source: Original Platform
2026-06-21 08:09:19 +08:00
commit 2aeb457d5e
16 changed files with 413432 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,169 @@
+---
+library_name: transformers
+base_model: meta-llama/Llama-3.1-8B
+tags:
+- alignment-handbook
+- generated_from_trainer
+datasets:
+- airesearch/WangchanX-Legal-ThaiCCL-RAG
+model-index:
+- name: llama3.1-8b-legal-combine-ccl-16
+  results: []
+---
+
+# Llama-3.1-Legal-ThaiCCL-8B
+
+Llama-3.1-Legal-ThaiCCL-8B is a large language model built upon Llama-3.1-8B, designed to answer Thai legal questions. It is full finetuned on the [WangchanX Thai Legal dataset](https://huggingface.co/datasets/airesearch/WangchanX-Legal-ThaiCCL-RAG) using the [WangchanX Finetuning pipeline](https://github.com/vistec-AI/WangchanX). The model is intended to be used with a supporting Retrieval-Augmented Generation (RAG) system which queries relevant supporting legal documents for the model to reference when responding to the questions.
+
+## Model description
+- Base model: [Meta Llama 3.1 8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
+- Training Repository: [WangchanX Finetuning Pipeline](https://github.com/vistec-AI/WangchanX)
+- Training Dataset: [WangchanX Thai Legal dataset](https://huggingface.co/datasets/airesearch/WangchanX-Legal-ThaiCCL-RAG) 
+- License: [Meta's Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/)
+
+## Model Usage
+```python
+from transformers import pipeline
+import torch
+
+EN_QA_TEMPLATE = "Given the user's query in the context of Thai legal matters, the RAG system retrieves the top_n related documents. From these documents, it's crucial to identify and utilize only the most relevant ones to craft an accurate and informative response.Context information is below.\n\n---------------------\nContext: Thai legal domain\nQuery: {query_str}\nRetrieved Documents: {context_str}\n---------------------\n\n Using the provided context information and the list of retrieved documents, you will focus on selecting the documents that are most relevant to the user's query. This selection process involves evaluating the content of each document for its pertinency to the query, ensuring that the response is based on accurate and contextually appropriate information.Based on the selected documents, you will synthesize a response that addresses the user's query, drawing directly from the content of these documents to provide a precise, legally informed answer.You must answer in Thai.\nAnswer:"
+
+EN_SYSTEM_PROMPT_STR = """You are a legal assistant named Sommai (สมหมาย in Thai). You provide legal advice in a friendly, clear, and approachable manner. When answering questions, you reference the relevant law sections, including the name of the act or code they are from. You explain what these sections entail, including any associated punishments, fees, or obligations. Your tone is polite yet informal, making users feel comfortable, like consulting a trusted friend. If a question falls outside your knowledge, you must respond with the exact phrase: 'สมหมายไม่สามารถตอบคำถามนี้ได้ครับ'. You avoid making up information and guide users based on accurate legal references relevant to their situation. Where applicable, you provide practical advice, such as preparing documents, seeking medical attention, or contacting authorities. If asked about past Supreme Court judgments, you must state that you do not have information on those judgments at this time."""
+
+query = "การร้องขอให้ศาลสั่งให้บุคคลเป็นคนไร้ความสามารถมีหลักเกณฑ์การพิจารณาอย่างไร"
+
+context = """ประมวลกฎหมายแพ่งและพาณิชย์ มาตรา 33 ในคดีที่มีการร้องขอให้ศาลสั่งให้บุคคลใดเป็นคนไร้ความสามารถเพราะวิกลจริต ถ้าทางพิจารณาได้ความว่าบุคคลนั้นไม่วิกลจริต แต่มีจิตฟั่นเฟือนไม่สมประกอบ เมื่อศาลเห็นสมควรหรือเมื่อมีคำขอของคู่ความหรือของบุคคลตามที่ระบุไว้ในมาตรา 28 ศาลอาจสั่งให้บุคคลนั้นเป็นคนเสมือนไร้ความสามารถก็ได้ หรือในคดีที่มีการร้องขอให้ศาลสั่งให้บุคคลใดเป็นคนเสมือนไร้ความสามารถเพราะมีจิตฟั่นเฟือนไม่สมประกอบ ถ้าทางพิจารณาได้ความว่าบุคคลนั้นวิกลจริต เมื่อมีคำขอของคู่ความหรือของบุคคลตามที่ระบุไว้ในมาตรา 28 ศาลอาจสั่งให้บุคคลนั้นเป็นคนไร้ความสามารถก็ได้"""
+
+
+model_id = "airesearch/LLaMa3.1-8B-Legal-ThaiCCL-Combine"
+
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16},
+    device_map="auto",
+)
+
+sample = [
+    {"role": "system", "content": SYSTEM_PROMPT_STR},
+    {"role": "user", "content": QA_template.format(context_str=context, query_str=query)},
+]
+
+prompt = pipeline.tokenizer.apply_chat_template(sample, 
+                                                tokenize=False, 
+                                                add_generation_prompt=True)
+
+outputs = pipeline(
+    prompt,
+    max_new_tokens = 512,
+    eos_token_id = terminators,
+    do_sample = True,
+    temperature = 0.6,
+    top_p = 0.9
+)
+
+print(outputs[0]["generated_text"][-1])
+```
+
+## Training Data
+The model is trained on the [WangchanX Legal ThaiCCL RAG dataset](https://huggingface.co/datasets/airesearch/WangchanX-Legal-ThaiCCL-RAG), which is a Thai legal question-answering dataset created using a RAG system to query relevant supporting legal datasets based on a question for the LLM to reference in its answer. For more information on how the datasets was created please refer to this [blog](https://medium.com/airesearch-in-th/ชุดข้อมูลกฎหมายไทยสำหรับการพัฒนา-rag-0eb2eab283a1).
+
+To emulate a real world use case, during training we incorporated both the positive and negative context (if available) into the prompt. We found that this resulted in a model that is more robust towards cases that the RAG system also passes in irrelevant contexts mixed with the correct context to reference (refer to the [evaluation](#evaluation) section for results). 
+
+### Prompt Format
+We recommend using the same chat template (system prompt and question template of context, query, and retreived documents) when using the provided weights, since the model was trained with the specific system prompt and question template. Example input prompt:
+```
+<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+You are a legal assistant named Sommai (สมหมาย in Thai), you provide legal advice to users in a friendly and understandable manner. When answering questions, you specifically reference the law sections relevant to the query, including the name of the act or code they originated from, an explanation of what those sections entail, and any associated punishments or fees. Your tone is approachable and informal yet polite, making users feel as if they are seeking advice from a friend. If a question arises that does not match the information you possess, you must acknowledge your current limitations by stating this exactly sentence: 'สมหมายไม่สามารถตอบคำถามนี้ได้ครับ'. You will not fabricate information but rather guide users based on actual law sections relevant to their situation. Additionally, you offer practical advice on next steps, such as gathering required documents, seeking medical attention, or visiting a police station, as applicable. If inquired about past Supreme Court judgments, you must reply that you do not have information on those judgments yet.<|eot_id|>
+<|start_header_id|>user<|end_header_id|>
+
+Given the user's query in the context of Thai legal matters, the RAG system retrieves the top_n related documents. From these documents, it's crucial to identify and utilize only the most relevant ones to craft an accurate and informative response.
+
+Context information is below.
+---------------------
+Context: Thai legal domain
+Query: {question}
+Retreived Documents: {retreived legal documents}
+---------------------
+
+Using the provided context information and the list of retrieved documents, you will focus on selecting the documents that are most relevant to the user's query. This selection process involves evaluating the content of each document for its pertinency to the query, ensuring that the response is based on accurate and contextually appropriate information.
+Based on the selected documents, you will synthesize a response that addresses the user's query, drawing directly from the content of these documents to provide a precise, legally informed answer.
+You must answer in Thai.
+Answer:
+<|eot_id|>
+<|start_header_id|>assistant<|end_header_id|>
+```
+Here is a Python code snippet on how to apply the chat template with the provided system prompt and question template on the WangchanX Legal Thai CCL dataset:
+```python
+EN_QA_TEMPLATE = "Given the user's query in the context of Thai legal matters, the RAG system retrieves the top_n related documents. From these documents, it's crucial to identify and utilize only the most relevant ones to craft an accurate and informative response.Context information is below.\n\n---------------------\nContext: Thai legal domain\nQuery: {query_str}\nRetrieved Documents: {context_str}\n---------------------\n\n Using the provided context information and the list of retrieved documents, you will focus on selecting the documents that are most relevant to the user's query. This selection process involves evaluating the content of each document for its pertinency to the query, ensuring that the response is based on accurate and contextually appropriate information.Based on the selected documents, you will synthesize a response that addresses the user's query, drawing directly from the content of these documents to provide a precise, legally informed answer.You must answer in Thai.\nAnswer:"
+
+EN_SYSTEM_PROMPT_STR = """You are a legal assistant named Sommai (สมหมาย in Thai). You provide legal advice in a friendly, clear, and approachable manner. When answering questions, you reference the relevant law sections, including the name of the act or code they are from. You explain what these sections entail, including any associated punishments, fees, or obligations. Your tone is polite yet informal, making users feel comfortable, like consulting a trusted friend. If a question falls outside your knowledge, you must respond with the exact phrase: 'สมหมายไม่สามารถตอบคำถามนี้ได้ครับ'. You avoid making up information and guide users based on accurate legal references relevant to their situation. Where applicable, you provide practical advice, such as preparing documents, seeking medical attention, or contacting authorities. If asked about past Supreme Court judgments, you must state that you do not have information on those judgments at this time."""
+
+def format(example):
+    if "คำตอบ: " in example["positive_answer"]:
+        example["positive_answer"] = example["positive_answer"].replace("คำตอบ: ", "")
+    if example['positive_contexts']:
+        context = ''.join([v['text'] for v in example['positive_contexts'][:5]])
+        message = [
+            {"content": EN_SYSTEM_PROMPT_STR, "role": "system"}, 
+            {"content": EN_QA_TEMPLATE.format(query_str=example['question'], context_str=context), "role": "user"}, 
+        ]
+    else:
+        message = [
+            {"content": EN_SYSTEM_PROMPT_STR, "role": "system"}, 
+            {"content": EN_QA_TEMPLATE.format(query_str=example['question'], context_str=" "), "role": "user"}, 
+        ]
+    return dict(messages=message)
+dataset = dataset.map(format, batched=False)
+```
+
+### Training hyperparameters
+We full fine-tuned Llama-3.1-8B using the following hyperparameters:
+- learning_rate: 0.0002
+- train_batch_size: 4
+- eval_batch_size: 4
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 256
+- total_eval_batch_size: 16
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 4
+
+Total training time: 2:15:14.66
+
+## Evaluation
+We tested our model based on the test set of the WangchanX Legal Thai CCL dataset using both traditional (MRC) metrics and a LLM as judge technique based on the paper [CHIE: Generative MRC Evaluation for in-context QA with Correctness, Helpfulness, Irrelevancy, and Extraneousness Aspects](https://aclanthology.org/2024.genbench-1.10.pdf)
+
+Note: `LLaMa3.1-8B-Legal-ThaiCCL` is trained on only positive contexts while `LLaMa3.1-8B-Legal-ThaiCCL-Combine` is trained on both positive and negative contexts
+
+### Table 1: MRC Results
+| Model                                | Context Type      | Answer Type   | ROUGE-L | Character Error Rate (CER) | Word Error Rate (WER) | BERT Score | F1-score XQuAD | Exact Match XQuAD |
+|--------------------------------------|-------------------|---------------|----------|----------------------------|-----------------------|------------|----------------|-------------------|
+| Zero-shot LLaMa3.1-8B-Instruct       | Golden Passage    | Only Positive | 0.553    | 1.181                      | 1.301                 | 0.769      | 48.788         | 0.0               |
+| LLaMa3.1-8B-Legal-ThaiCCL           | Golden Passage    | Only Positive | 0.603    | 0.667                      | 0.736                 | 0.821      | 60.039         | 0.053             |
+| LLaMa3.1-8B-Legal-ThaiCCL-Combine   | Golden Passage    | Only Positive | 0.715    | 0.695                      | 0.758                 | 0.833      | 64.578         | 0.614             |
+| Zero-shot LLaMa3.1-70B-Instruct      | Golden Passage    | Only Positive | 0.830    | 0.768                      | 0.848                 | 0.830      | 61.497         | 0.0               |
+| Zero-shot LLaMa3.1-8B-Instruct       | Retrieval Passage | Only Positive | 0.422    | 1.631                      | 1.773                 | 0.757      | 39.639         | 0.0               |
+| LLaMa3.1-8B-Legal-ThaiCCL           | Retrieval Passage | Only Positive | 0.366    | 1.078                      | 1.220                 | 0.779      | 44.238         | 0.03              |
+| LLaMa3.1-8B-Legal-ThaiCCL-Combine   | Retrieval Passage | Only Positive | 0.516    | 0.884                      | 0.884                 | 0.816      | 54.948         | 0.668             |
+| Zero-shot LLaMa3.1-70B-Instruct      | Retrieval Passage | Only Positive | 0.616    | 0.934                      | 1.020                 | 0.816      | 54.930         | 0.0               |
+
+### Table 2: CHIE Results
+| Model                                | Context Type      | Answer Type   | Q1: Correctness [H] | Q2: Helpfulness [H] | Q3: Irrelevancy [L] | Q4: Out-of-Context [L] |
+|--------------------------------------|-------------------|---------------|----------------------|----------------------|---------------------|------------------------|
+| Zero-shot LLaMa3.1-8B-Instruct       | Golden Passage    | Only Positive | 0.740                | 0.808                | 0.480               | 0.410                  |
+| LLaMa3.1-8B-Legal-ThaiCCL           | Golden Passage    | Only Positive | 0.705                | 0.486                | 0.294               | 0.208                  |
+| LLaMa3.1-8B-Legal-ThaiCCL-Combine   | Golden Passage    | Only Positive | 0.565                | 0.468                | 0.405               | 0.325                  |
+| Zero-shot LLaMa3.1-70B-Instruct      | Golden Passage    | Only Positive | 0.870                | 0.658                | 0.316               | 0.247                  |
+| Zero-shot LLaMa3.1-8B-Instruct       | Retrieval Passage | Only Positive | 0.480                | 0.822                | 0.557               | 0.248                  |
+| LLaMa3.1-8B-Legal-ThaiCCL           | Retrieval Passage | Only Positive | 0.274                | 0.470                | 0.720               | 0.191                  |
+| LLaMa3.1-8B-Legal-ThaiCCL-Combine   | Retrieval Passage | Only Positive | 0.532                | 0.445                | 0.508               | 0.203                  |
+| Zero-shot LLaMa3.1-70B-Instruct      | Retrieval Passage | Only Positive | 0.748                | 0.594                | 0.364               | 0.202                  |
+
+
+## License and use
+The model is released under [Meta's Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. 
--- a/all_results.json
+++ b/all_results.json
@@ -0,0 +1,9 @@
+{
+    "epoch": 3.975951903807615,
+    "total_flos": 135489736671232.0,
+    "train_loss": 1.0211431388893435,
+    "train_runtime": 8462.237,
+    "train_samples": 7983,
+    "train_samples_per_second": 3.773,
+    "train_steps_per_second": 0.015
+}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,35 @@
+{
+  "_name_or_path": "meta-llama/Llama-3.1-8B",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "eos_token_id": 128001,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": {
+    "factor": 8.0,
+    "high_freq_factor": 4.0,
+    "low_freq_factor": 1.0,
+    "original_max_position_embeddings": 8192,
+    "rope_type": "llama3"
+  },
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.44.2",
+  "use_cache": true,
+  "vocab_size": 128256
+}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,9 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 128000,
+  "do_sample": true,
+  "eos_token_id": 128001,
+  "temperature": 0.6,
+  "top_p": 0.9,
+  "transformers_version": "4.44.2"
+}
--- a/model-00001-of-00004.safetensors
+++ b/model-00001-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ae7ac328692e737a34d7f02be1a29926096e08b8c5b3d6a0840d334359ff88bb
+size 4976698672
--- a/model-00002-of-00004.safetensors
+++ b/model-00002-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9e3d1172ca92ae7bd3888aaf97ec603b5e1d381181b47adf4a00c11b426705fb
+size 4999802720
--- a/model-00003-of-00004.safetensors
+++ b/model-00003-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3e152aeebd0bd1e5e3e02869ffaca78203ace7e50238494cd68397932721bcfd
+size 4915916176
--- a/model-00004-of-00004.safetensors
+++ b/model-00004-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b9e7181f1a1638ba91f17f071bb1f666bea828946e50a80b9ff385ca55d099cf
+size 1168138808
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,298 @@
+{
+  "metadata": {
+    "total_size": 16060522496
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00004-of-00004.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.norm.weight": "model-00004-of-00004.safetensors"
+  }
+}
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,17 @@
+{
+  "bos_token": {
+    "content": "<|begin_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|end_of_text|>"
+}
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
--- a/train_results.json
+++ b/train_results.json
@@ -0,0 +1,9 @@
+{
+    "epoch": 3.975951903807615,
+    "total_flos": 135489736671232.0,
+    "train_loss": 1.0211431388893435,
+    "train_runtime": 8462.237,
+    "train_samples": 7983,
+    "train_samples_per_second": 3.773,
+    "train_steps_per_second": 0.015
+}
--- a/trainer_state.json
+++ b/trainer_state.json
@@ -0,0 +1,210 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 3.975951903807615,
+  "eval_steps": 500,
+  "global_step": 124,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.16032064128256512,
+      "grad_norm": 96.1263392680429,
+      "learning_rate": 7.692307692307693e-05,
+      "loss": 1.184,
+      "step": 5
+    },
+    {
+      "epoch": 0.32064128256513025,
+      "grad_norm": 109.03932941785152,
+      "learning_rate": 0.00015384615384615385,
+      "loss": 5.0239,
+      "step": 10
+    },
+    {
+      "epoch": 0.48096192384769537,
+      "grad_norm": 57.030962020859526,
+      "learning_rate": 0.00019983983492623833,
+      "loss": 4.0648,
+      "step": 15
+    },
+    {
+      "epoch": 0.6412825651302605,
+      "grad_norm": 22.869725355526718,
+      "learning_rate": 0.0001980438647961327,
+      "loss": 2.8094,
+      "step": 20
+    },
+    {
+      "epoch": 0.8016032064128257,
+      "grad_norm": 6.985773379251052,
+      "learning_rate": 0.00019428774454610843,
+      "loss": 2.2094,
+      "step": 25
+    },
+    {
+      "epoch": 0.9619238476953907,
+      "grad_norm": 5.841783328387945,
+      "learning_rate": 0.00018864656872260985,
+      "loss": 1.3499,
+      "step": 30
+    },
+    {
+      "epoch": 1.122244488977956,
+      "grad_norm": 4.328203171108087,
+      "learning_rate": 0.0001812331190023886,
+      "loss": 1.2527,
+      "step": 35
+    },
+    {
+      "epoch": 1.282565130260521,
+      "grad_norm": 1.8271946123422904,
+      "learning_rate": 0.00017219560939545246,
+      "loss": 0.9903,
+      "step": 40
+    },
+    {
+      "epoch": 1.4428857715430863,
+      "grad_norm": 1.3376797882778457,
+      "learning_rate": 0.00016171472306414554,
+      "loss": 0.8082,
+      "step": 45
+    },
+    {
+      "epoch": 1.6032064128256514,
+      "grad_norm": 0.8625176200253305,
+      "learning_rate": 0.00015000000000000001,
+      "loss": 0.7515,
+      "step": 50
+    },
+    {
+      "epoch": 1.7635270541082164,
+      "grad_norm": 0.9297452179429324,
+      "learning_rate": 0.00013728564777803088,
+      "loss": 0.6321,
+      "step": 55
+    },
+    {
+      "epoch": 1.9238476953907817,
+      "grad_norm": 1.1609492766676697,
+      "learning_rate": 0.0001238258591423165,
+      "loss": 0.5802,
+      "step": 60
+    },
+    {
+      "epoch": 2.0841683366733466,
+      "grad_norm": 0.6088051647121085,
+      "learning_rate": 0.00010988973003642499,
+      "loss": 0.4983,
+      "step": 65
+    },
+    {
+      "epoch": 2.244488977955912,
+      "grad_norm": 0.45623167145983035,
+      "learning_rate": 9.57558796803852e-05,
+      "loss": 0.4119,
+      "step": 70
+    },
+    {
+      "epoch": 2.404809619238477,
+      "grad_norm": 0.43383089288281307,
+      "learning_rate": 8.170688025276134e-05,
+      "loss": 0.3915,
+      "step": 75
+    },
+    {
+      "epoch": 2.565130260521042,
+      "grad_norm": 0.44731640463982364,
+      "learning_rate": 6.802360754287547e-05,
+      "loss": 0.3917,
+      "step": 80
+    },
+    {
+      "epoch": 2.7254509018036073,
+      "grad_norm": 0.4205909891049952,
+      "learning_rate": 5.497962551823266e-05,
+      "loss": 0.3886,
+      "step": 85
+    },
+    {
+      "epoch": 2.8857715430861726,
+      "grad_norm": 0.4168469096572953,
+      "learning_rate": 4.283571707415214e-05,
+      "loss": 0.3689,
+      "step": 90
+    },
+    {
+      "epoch": 3.0460921843687374,
+      "grad_norm": 0.6798047385077177,
+      "learning_rate": 3.1834670310046734e-05,
+      "loss": 0.3212,
+      "step": 95
+    },
+    {
+      "epoch": 3.2064128256513027,
+      "grad_norm": 0.3963204162500924,
+      "learning_rate": 2.2196424568156073e-05,
+      "loss": 0.1699,
+      "step": 100
+    },
+    {
+      "epoch": 3.3667334669338675,
+      "grad_norm": 0.38329008287670924,
+      "learning_rate": 1.4113673277957395e-05,
+      "loss": 0.1585,
+      "step": 105
+    },
+    {
+      "epoch": 3.527054108216433,
+      "grad_norm": 0.34785356511529864,
+      "learning_rate": 7.74801151675314e-06,
+      "loss": 0.1499,
+      "step": 110
+    },
+    {
+      "epoch": 3.687374749498998,
+      "grad_norm": 0.3409235398527789,
+      "learning_rate": 3.226705306650113e-06,
+      "loss": 0.1438,
+      "step": 115
+    },
+    {
+      "epoch": 3.847695390781563,
+      "grad_norm": 0.365385237116056,
+      "learning_rate": 6.401472380297091e-07,
+      "loss": 0.1525,
+      "step": 120
+    },
+    {
+      "epoch": 3.975951903807615,
+      "step": 124,
+      "total_flos": 135489736671232.0,
+      "train_loss": 1.0211431388893435,
+      "train_runtime": 8462.237,
+      "train_samples_per_second": 3.773,
+      "train_steps_per_second": 0.015
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 124,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 4,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 135489736671232.0,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
--- a/training_args.bin
+++ b/training_args.bin
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0c3122b2870ec0154c62cdad3737d82bbee4e0e7b59ab0800c350b883903aa27
+size 6904