Initialize project; model provided by the ModelHub XC community

Model: distil-labs/Distil-PII-Llama-3.2-3B-Instruct Source: Original Platform
2026-05-25 06:13:16 +08:00
commit b685f33c3d
17 changed files with 3055 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/31
+++ b/31
@@ -0,0 +1,31 @@
+GENERAL TERMS AND CONDITIONS
+
+Note that if you want to use the Commercial licence, please contact us at contact@distillabs.ai
+
+- Model License Terms -
+
+R&D License
+
+1. SERVICES, PRICES AND PAYMENT
+
+    1.1 The Customer pays a one-time license fee, as indicated in the check-out process, for running of one (1) training process of the selected Base Model using Customer Data (“License Fee”).
+
+    1.2 The License Fee shall be due for payment in advance. The Customer shall only be permitted to set off against payment claims of Distil Labs if the Customer’s claims are undisputed or have become res judicata.
+
+2. MODEL LICENSE: R&D LICENSE
+
+    2.1 Subject to Customer’s payment of the license fee, Distil Labs grants to Customer the Model License (as defined below). For clarification, Distil Labs retains any other rights in its software or know-how, in particular in the codebase needed for the fine-tuning of the Trained Model.
+
+    2.2 Subject to the requirements of the Base Model License (cf. Section 2.5 below), Distil Labs transfers to the Customer the perpetual, non-exclusive usage right to the Trained Model for non-commercial purposes of prototyping and research & development. The Parties agree, that commercial purposes include deployment in production externally (to be used by Customer’s customers paid or free of charge) or internally (as a tool for Customer’s employees). The territorial scope of the license is limited to the use within the United States of America and the European Economic Area including all member states of the European Union (“Model License”).
+
+    2.3 The Model License for non-commercial purposes of prototyping and research & development shall include (i) the non-exclusive right to permanent or temporary reproduction, in whole or in part, by any means and in any form (e.g. permanent and/or volatile storage on electrical, electromagnetic, optical storage media, such as any type of SDD, HDD, DVD, memory cards, USB sticks), (ii) the non-exclusive right to distribution in any form, media and by any means regardless of whether the distribution is in tangible or intangible form, in particular to transmit the Trained Model via wired and wireless networks (e.g. for download from internet or intranet by wire or wireless means including broadband, cable, fiberglass, WIFI, LTE, 5G, satellite internet, other data networks), and (iii) the non-exclusive right of making available to the public in such a way that members of the public can access it from places and at times of their choice (e.g. by web or mobile app, virtual or augmented reality, cloud storage, cloud hosting, decentralized hosting, non-fungible token, application service providing, software as a service, or cloud computing). The license shall also contain, to the extent necessary for prototyping and research & development, the right to adapt and modify the Trained Model subject to the limitation in Section 2.4 and 2.5 below, to further develop the Trained Model including changes to functions or appearance, adapt to other software versions, to exchange parts of the Trained Model or combine the Trained Model with other results of work and to use the results in the same way as the original Trained Model. Any derived models from the Trained Model shall retain this model license.
+
+    2.4 The Customer shall not, without the prior written consent of Distil Labs:
+
+        2.4.1 train, fine-tune, re-train, or otherwise modify the Trained Model, unless for purpose of research & development;
+
+        2.4.2 use the Trained Model or any part thereof to create derivative models or services that compete with those of Distil Labs;
+
+        2.4.3 circumvent any technical restrictions embedded in the Trained Model or Base Model that are designed to enforce usage limitations.
+
+    2.5 The Parties acknowledge and agree that the Trained Model is developed from Base Models which are supplied by a third party. Therefore, the Model License is subject to the restrictions resulting from the open-source or any other applicable license of the Base Model (“Base Model License”) and the Customer must use the Trained Model in compliance with the Base Model License. In particular, the Customer must oblige their clients to compliance with the Base Model License in any case of transferring or sublicensing the rights to or making available in any way the Trained Model. The applicable Base Model License is defined in the Training Configuration and will be provided for download. The Customer agrees to indemnify Distil Labs for any and all claims brought by the Base Model provider for violations of the Base Model License.
--- a/1
+++ b/1
@@ -0,0 +1 @@
+FROM .
--- a/README.md
+++ b/README.md
@@ -0,0 +1,188 @@
+---
+
+license: llama3.2
+language: en
+base_model: meta-llama/Llama-3.2-3B-Instruct
+pipeline_tag: text-generation
+tags: [pii-redaction, privacy, slm, distil-labs]
+---
+
+<div align="center">
+  <img src="https://github.com/distil-labs/badges/blob/main/distillabs-logo.svg?raw=true" width="40%" alt="distil labs" />
+</div>
+
+---
+
+<div align="center">
+  <table>
+    <tr>
+      <td align="center">
+        <a href="https://www.distillabs.ai/?utm_source=hugging-face&utm_medium=referral&utm_campaign=distil-PII">
+          <img src="https://github.com/distil-labs/badges/blob/main/badge-distillabs-home.svg?raw=true" alt="Homepage"/>
+        </a>
+      </td>
+      <td align="center">
+        <a href="https://github.com/distil-labs">
+          <img src="https://github.com/distil-labs/badges/blob/main/badge-github.svg?raw=true" alt="GitHub"/>
+        </a>
+      </td>
+      <td align="center">
+        <a href="https://huggingface.co/distil-labs">
+          <img src="https://github.com/distil-labs/badges/blob/main/badge-huggingface.svg?raw=true" alt="Hugging Face"/>
+        </a>
+      </td>
+    </tr>
+    <tr>
+      <td align="center">
+        <a href="https://www.linkedin.com/company/distil-labs/">
+          <img src="https://github.com/distil-labs/badges/blob/main/badge-linkedin.svg?raw=true" alt="LinkedIn"/>
+        </a>
+      </td>
+      <td align="center">
+        <a href="https://distil-labs-community.slack.com/join/shared_invite/zt-36zqj87le-i3quWUn2bjErRq22xoE58g">
+          <img src="https://github.com/distil-labs/badges/blob/main/badge-slack.svg?raw=true" alt="Slack"/>
+        </a>
+      </td>
+      <td align="center">
+        <a href="https://x.com/distil_labs">
+          <img src="https://github.com/distil-labs/badges/blob/main/badge-twitter.svg?raw=true" alt="Twitter"/>
+        </a>
+      </td>
+    </tr>
+  </table>
+</div>
+
+---
+
+# Distil-PII-Llama-3.2-3B-Instruct
+
+A **small language model** (SLM) fine-tuned by Distil Labs for **policy-aware PII redaction** that outputs a single JSON object with `redacted_text` and `entities`. Optimized to run locally with strong accuracy and strict schema adherence.
+
+## Model Details
+
+* **Developed by:** Distil Labs GmbH
+* **License:** Llama 3.2 Community License Agreement
+* **Finetuned from:** `meta-llama/Llama-3.2-3B-Instruct`
+
+## Intended Use & Limitations
+
+* **Use cases:** Redacting support chats, logs, tickets, and transcripts: removing identity while preserving operational signals (last-4 digits of IDs, order numbers, etc.).
+* **Out of scope:** Legal or compliance advice; languages beyond English (generalization not guaranteed); domain-specific IDs unseen in training.
+
+## Input & Output
+
+**Input:** A plain-text prompt with task instruction + context.
+**Output (JSON only):**
+
+```json
+{
+  "redacted_text": "Text with in-place tokens",
+  "entities": [
+    {"value": "<original>", "replacement_token": "[TOKEN]", "reason": "<why>"}
+  ]
+}
+```
+
+**Tokens:** `[PERSON] [EMAIL] [PHONE] [ADDRESS] [SSN] [ID] [UUID] [CARD_LAST4:####] [IBAN_LAST4:####] [GENDER] [AGE] [RACE] [MARITAL_STATUS]`
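
A caller will usually want to check a reply against this schema before using it. The following is a minimal sketch, assuming the reply arrives as the raw JSON string described above; the function name `parse_redaction_output` is hypothetical and not part of any Distil Labs API.

```python
import json

# Field names taken from the output schema documented above.
TOP_LEVEL_FIELDS = {"redacted_text", "entities"}
ENTITY_FIELDS = {"value", "replacement_token", "reason"}


def parse_redaction_output(raw: str) -> dict:
    """Parse a model reply and check it against the documented schema.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation.
    """
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != TOP_LEVEL_FIELDS:
        raise ValueError("expected exactly the keys 'redacted_text' and 'entities'")
    if not isinstance(data["redacted_text"], str):
        raise ValueError("'redacted_text' must be a string")
    if not isinstance(data["entities"], list):
        raise ValueError("'entities' must be a list")
    for entity in data["entities"]:
        if not isinstance(entity, dict) or set(entity) != ENTITY_FIELDS:
            raise ValueError("each entity needs exactly value, replacement_token, reason")
    return data
```

Rejecting extra or missing keys outright is a deliberate choice in this sketch; a more lenient consumer could ignore unknown fields instead.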
+
+## Training
+
+Instruction-tuned on a compact policy spec + ~20 curated examples emphasizing **exact JSON schema**, **minimal in-place edits**, and **entity correctness**.
+
+## Evaluation
+
+Judged by a frontier LLM using a deterministic rubric: JSON-only, schema validity, **redacted_text exact match**, and **set-equality** of `(value, replacement_token)` pairs (reason/order ignored). Score: **0.82 ± 0.03**.
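
The rubric criteria listed above can be approximated programmatically. The sketch below is an assumption-laden illustration, not the judge that produced the reported score: it treats each criterion as a hard pass/fail check, and the name `rubric_pass` and the shape of `reference` (a dict in the documented output schema) are hypothetical.

```python
import json


def rubric_pass(prediction_raw: str, reference: dict) -> bool:
    """Return True if a raw model reply satisfies all four rubric criteria."""
    # 1. JSON-only: the whole reply must parse as JSON.
    try:
        pred = json.loads(prediction_raw)
    except json.JSONDecodeError:
        return False
    # 2. Schema validity: exactly the two documented top-level fields.
    if not isinstance(pred, dict) or set(pred) != {"redacted_text", "entities"}:
        return False
    if not isinstance(pred["entities"], list):
        return False
    # 3. redacted_text must match the reference exactly.
    if pred["redacted_text"] != reference["redacted_text"]:
        return False

    # 4. Set-equality of (value, replacement_token); reason and order are ignored.
    def pairs(entities):
        return {(e["value"], e["replacement_token"]) for e in entities}

    try:
        return pairs(pred["entities"]) == pairs(reference["entities"])
    except (KeyError, TypeError):
        return False  # malformed entity objects fail the schema check
```

How per-example results were aggregated into the reported score is not stated in the README, so this sketch stops at the per-example decision.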
+
+## How to Use
+Deployment details are in the [docs](https://docs.distillabs.ai/how-to/model-deployment). Serve the model with vLLM or Ollama (a `-gguf` version is available in this collection), then query it with the following snippet:
+```python
+SYSTEM_PROMPT = """
+You are a problem solving model working on task_description XML block:
+<task_description>
+Produce a redacted version of texts, removing sensitive personal data while preserving operational signals. The model must return a single json blob with:
+
+* **redacted_text** is the input with minimal, in-place replacements of redacted entities.
+* **entities** as an array of objects with exactly three fields {value: original_value, replacement_token: replacement, reason: reasoning}.
+
+## What to redact (→ replacement token)
+
+* **PERSON** — customer/patient/person names (first/last/full; identifying initials) → `[PERSON]`
+* **EMAIL** — any email, including obfuscated `name(at)domain(dot)com` → `[EMAIL]`
+* **PHONE** — any international/national format (separators/emoji bullets allowed) → `[PHONE]`
+* **ADDRESS** — street + number; full postal lines; apartment/unit numbers → `[ADDRESS]`
+* **SSN** — US Social Security numbers → `[SSN]`
+* **ID** — national IDs (PESEL, NIN, Aadhaar, DNI, etc.) when personal → `[ID]`
+* **UUID** — person-scoped system identifiers (e.g., MRN/NHS/patient IDs/customer UUIDs) → `[UUID]`
+* **CREDIT_CARD** — 13–19 digits (spaces/hyphens allowed) → `[CARD_LAST4:####]` (keep last-4 only)
+* **IBAN** — IBAN/bank account numbers → `[IBAN_LAST4:####]` (keep last-4 only)
+* **GENDER** — self-identification (male/female/non-binary/etc.) → `[GENDER]`
+* **AGE** — stated ages (“I’m 29”, “age: 47”, “29 y/o”) → `[AGE_YEARS:##]`
+* **RACE** — race/ethnicity self-identification → `[RACE]`
+* **MARITAL_STATUS** — married/single/divorced/widowed/partnered → `[MARITAL_STATUS]`
+
+
+## Keep (do not redact)
+
+* Card **last-4** when only last-4 is present (e.g., “ending 9021”, “•••• 9021”).
+* Operational IDs: order/ticket/invoice numbers, shipment tracking, device serials, case IDs.
+* Non-personal org info: company names, product names, team names.
+* Cities/countries alone (redact full street+number, not plain city/country mentions).
+
+## Output schema (exactly these fields)
+* **redacted_text** The original text with all the sensitive information replaced with redacted tokens
+* **entities** Array with all the replaced elements, each element represented by following fields
+  * **replacement_token**: one of `[PERSON] | [EMAIL] | [PHONE] | [ADDRESS] | [SSN] | [ID] | [UUID] | [CREDIT_CARD] | [IBAN] | [GENDER] | [AGE] | [RACE] | [MARITAL_STATUS]`
+  * **value**: original text that was redacted
+  * **reason**: brief string explaining the rule/rationale
+
+for example
+{
+  "redacted_text": "Hi, I'm [PERSON] and my email is [EMAIL].",
+  "entities": [
+    { "type": "PERSON", "value": "John Smith", "reason": "person name"},
+    { "type": "EMAIL", "value": "john.smith@example.com", "reason": "email"},
+  ]
+}
+</task_description>
+You will be given a single task with context in the context XML block and the task in the question XML block
+Solve the task in question block based on the context in context block.
+Generate only the answer, do not generate anything else
+"""
+
+PROMPT_TEMPLATE = """
+
+Now for the real task, solve the task in question block based on the context in context block.
+Generate only the solution, do not generate anything else
+<context>
+{context}
+</context>
+<question>Redact provided text according to the task description and return redacted elements.</question>
+"""
+
+from openai import OpenAI
+
+PORT = "PORT GOES HERE"  # 8000 for vllm, 11434 for ollama
+MODEL_NAME = "NAME USED FOR SETTING UP THE CLIENT"
+TEXT_TO_REDACT = "NI number AB123456C confirmed."
+
+client = OpenAI(base_url=f"http://127.0.0.1:{PORT}/v1", api_key="EMPTY")
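+chat_response = client.chat.completions.create(
+    model=MODEL_NAME,
+    messages=[
+        {"role": "system", "content": SYSTEM_PROMPT},
+        {"role": "user", "content": PROMPT_TEMPLATE.format(context=TEXT_TO_REDACT)},
+    ],
+    temperature=0,
+)
+# The reply is expected to be a single JSON object; print the raw text.
+print(chat_response.choices[0].message.content)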
+```
+
+
+## Risks & Mitigations
+
+* **False negatives/positives:** May miss novel formats or over-redact generic terms. Mitigate via guardrails + post-validation.
+* **Policy drift:** Keep task preamble fixed; monitor with unit tests.
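
One possible form of the post-validation mentioned above is a consistency check between the input text and the parsed output. This is a sketch under stated assumptions, not a Distil Labs tool: `post_validate` is a hypothetical name, `output` is assumed to be an already-parsed dict in the documented schema, and the token pattern is inferred from the token list in this README.

```python
import re

# Matches bracketed tokens such as [PERSON] or [CARD_LAST4:9021] (pattern is an assumption).
TOKEN_PATTERN = re.compile(r"\[[A-Z0-9_]+(?::[^\]]*)?\]")


def post_validate(original: str, output: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    redacted = output["redacted_text"]
    # Strip tokens first so digits kept inside a token (e.g. last-4) are not flagged as leaks.
    redacted_without_tokens = TOKEN_PATTERN.sub("", redacted)
    for entity in output["entities"]:
        value = entity["value"]
        token = entity["replacement_token"]
        if value not in original:
            problems.append(f"value not found in input: {value!r}")
        if value in redacted_without_tokens:
            problems.append(f"value still present after redaction: {value!r}")
        if token not in redacted:
            problems.append(f"token missing from redacted text: {token!r}")
    return problems
```

A check like this catches hallucinated entities and incomplete replacements, but it cannot detect PII the model missed entirely; that still needs separate guardrails.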
+
+## Model Sources
+
+* **Homepage:** [https://distillabs.ai](https://distillabs.ai)
+* **Contact:** [contact@distillabs.ai](mailto:contact@distillabs.ai)
--- a/111
+++ b/111
@@ -0,0 +1,111 @@
+LLAMA 3.2 COMMUNITY LICENSE AGREEMENT
+Llama 3.2 Version Release Date: September 25, 2024
+
+“Agreement” means the terms and conditions for use, reproduction, distribution
+and modification of the Llama Materials set forth herein.
+
+“Documentation” means the specifications, manuals and documentation accompanying Llama 3.2
+distributed by Meta at https://llama.meta.com/doc/overview.
+
+“Licensee” or “you” means you, or your employer or any other person or entity (if you are
+entering into this Agreement on such person or entity’s behalf), of the age required under
+applicable laws, rules or regulations to provide legal consent and that has legal authority
+to bind your employer or such other person or entity if you are entering in this Agreement
+on their behalf.
+
+“Llama 3.2” means the foundational large language models and software and algorithms, including
+machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
+fine-tuning enabling code and other elements of the foregoing distributed by Meta at
+https://www.llama.com/llama-downloads.
+
+“Llama Materials” means, collectively, Meta’s proprietary Llama 3.2 and Documentation (and
+any portion thereof) made available under this Agreement.
+
+“Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or,
+if you are an entity, your principal place of business is in the EEA or Switzerland)
+and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
+
+
+By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
+you agree to be bound by this Agreement.
+
+
+1. License Rights and Redistribution.
+
+    a. Grant of Rights. You are granted a non-exclusive, worldwide,
+non-transferable and royalty-free limited license under Meta’s intellectual property or other rights
+owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works
+of, and make modifications to the Llama Materials.
+
+    b. Redistribution and Use.
+
+        i. If you distribute or make available the Llama Materials (or any derivative works thereof),
+or a product or service (including another AI model) that contains any of them, you shall (A) provide
+a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama”
+on a related website, user interface, blogpost, about page, or product documentation. If you use the
+Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or
+otherwise improve an AI model, which is distributed or made available, you shall also include “Llama”
+at the beginning of any such AI model name.
+
+        ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
+of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+
+        iii. You must retain in all copies of the Llama Materials that you distribute the
+following attribution notice within a “Notice” text file distributed as a part of such copies:
+“Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms,
+Inc. All Rights Reserved.”
+
+        iv. Your use of the Llama Materials must comply with applicable laws and regulations
+(including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for
+the Llama Materials (available at https://www.llama.com/llama3_2/use-policy), which is hereby
+incorporated by reference into this Agreement.
+
+2. Additional Commercial Terms. If, on the Llama 3.2 version release date, the monthly active users
+of the products or services made available by or for Licensee, or Licensee’s affiliates,
+is greater than 700 million monthly active users in the preceding calendar month, you must request
+a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to
+exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
+
+3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND
+RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS
+ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES
+OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE
+FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED
+WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
+
+4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY,
+WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT,
+FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN
+IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+5. Intellectual Property.
+
+    a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials,
+neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates,
+except as required for reasonable and customary use in describing and redistributing the Llama Materials or as
+set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required
+to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible
+at https://about.meta.com/brand/resources/meta/company-brand/). All goodwill arising out of your use of the Mark
+will inure to the benefit of Meta.
+
+    b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any
+    derivative works and modifications of the Llama Materials that are made by you, as between you and Meta,
+    you are and will be the owner of such derivative works and modifications.
+
+    c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or
+    counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.2 outputs or results, or any portion
+    of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable
+    by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or
+    claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third
+    party arising out of or related to your use or distribution of the Llama Materials.
+
+6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access
+to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms
+and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this
+Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3,
+4 and 7 shall survive the termination of this Agreement.
+
+7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of
+California without regard to choice of law principles, and the UN Convention on Contracts for the International
+Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of
+any dispute arising out of this Agreement.
--- a/9
+++ b/9
@@ -0,0 +1,9 @@
+MIT License
+
+Copyright (c) 2023 DeepSeek
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,93 @@
+{{- bos_token }}
+{%- if custom_tools is defined %}
+    {%- set tools = custom_tools %}
+{%- endif %}
+{%- if not tools_in_user_message is defined %}
+    {%- set tools_in_user_message = true %}
+{%- endif %}
+{%- if not date_string is defined %}
+    {%- if strftime_now is defined %}
+        {%- set date_string = strftime_now("%d %b %Y") %}
+    {%- else %}
+        {%- set date_string = "26 Jul 2024" %}
+    {%- endif %}
+{%- endif %}
+{%- if not tools is defined %}
+    {%- set tools = none %}
+{%- endif %}
+
+{#- This block extracts the system message, so we can slot it into the right place. #}
+{%- if messages[0]['role'] == 'system' %}
+    {%- set system_message = messages[0]['content']|trim %}
+    {%- set messages = messages[1:] %}
+{%- else %}
+    {%- set system_message = "" %}
+{%- endif %}
+
+{#- System message #}
+{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
+{%- if tools is not none %}
+    {{- "Environment: ipython\n" }}
+{%- endif %}
+{{- "Cutting Knowledge Date: December 2023\n" }}
+{{- "Today Date: " + date_string + "\n\n" }}
+{%- if tools is not none and not tools_in_user_message %}
+    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
+    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+    {{- "Do not use variables.\n\n" }}
+    {%- for t in tools %}
+        {{- t | tojson(indent=4) }}
+        {{- "\n\n" }}
+    {%- endfor %}
+{%- endif %}
+{{- system_message }}
+{{- "<|eot_id|>" }}
+
+{#- Custom tools are passed in a user message with some extra guidance #}
+{%- if tools_in_user_message and not tools is none %}
+    {#- Extract the first user message so we can plug it in here #}
+    {%- if messages | length != 0 %}
+        {%- set first_user_message = messages[0]['content']|trim %}
+        {%- set messages = messages[1:] %}
+    {%- else %}
+        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
+{%- endif %}
+    {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
+    {{- "Given the following functions, please respond with a JSON for a function call " }}
+    {{- "with its proper arguments that best answers the given prompt.\n\n" }}
+    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+    {{- "Do not use variables.\n\n" }}
+    {%- for t in tools %}
+        {{- t | tojson(indent=4) }}
+        {{- "\n\n" }}
+    {%- endfor %}
+    {{- first_user_message + "<|eot_id|>"}}
+{%- endif %}
+
+{%- for message in messages %}
+    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
+        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
+    {%- elif 'tool_calls' in message %}
+        {%- if not message.tool_calls|length == 1 %}
+            {{- raise_exception("This model only supports single tool-calls at once!") }}
+        {%- endif %}
+        {%- set tool_call = message.tool_calls[0].function %}
+        {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
+        {{- '{"name": "' + tool_call.name + '", ' }}
+        {{- '"parameters": ' }}
+        {{- tool_call.arguments | tojson }}
+        {{- "}" }}
+        {{- "<|eot_id|>" }}
+    {%- elif message.role == "tool" or message.role == "ipython" %}
+        {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
+        {%- if message.content is mapping or message.content is iterable %}
+            {{- message.content | tojson }}
+        {%- else %}
+            {{- message.content }}
+        {%- endif %}
+        {{- "<|eot_id|>" }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
+{%- endif %}
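
For the common case of a system message plus user turns with no tools, the template above reduces to a simple string layout. The following pure-Python sketch mirrors only that path, as an illustration; `render_llama32_prompt` is a hypothetical helper, and the default `bos_token` value `<|begin_of_text|>` is an assumption, since the template only refers to the `bos_token` variable.

```python
def render_llama32_prompt(messages, date_string="26 Jul 2024",
                          bos_token="<|begin_of_text|>",
                          add_generation_prompt=True):
    """Approximate the chat template's output when no tools are supplied."""
    # Pull out a leading system message, as the template does.
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"].strip()
        messages = messages[1:]
    else:
        system_message = ""

    # The system header is always emitted, even with an empty system message.
    out = bos_token
    out += "<|start_header_id|>system<|end_header_id|>\n\n"
    out += "Cutting Knowledge Date: December 2023\n"
    out += "Today Date: " + date_string + "\n\n"
    out += system_message + "<|eot_id|>"

    # Each remaining turn: role header, trimmed content, end-of-turn marker.
    for message in messages:
        out += ("<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
                + message["content"].strip() + "<|eot_id|>")

    # Open an assistant turn so the model continues from there.
    if add_generation_prompt:
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out
```

The default date matches the template's fallback when `strftime_now` is unavailable; tool-call and `ipython` branches are intentionally left out of this sketch.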
--- a/config.json
+++ b/config.json
@@ -0,0 +1,41 @@
+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 3072,
+  "initializer_range": 0.02,
+  "intermediate_size": 8192,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 24,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "pad_token": "<|reserved_special_token_247|>",
+  "pad_token_id": 128255,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": {
+    "factor": 32.0,
+    "high_freq_factor": 4.0,
+    "low_freq_factor": 1.0,
+    "original_max_position_embeddings": 8192,
+    "rope_type": "llama3"
+  },
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.53.0",
+  "use_cache": true,
+  "vocab_size": 128256
+}
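
As a sanity check, the values in this config can be used to recompute the parameter count that `model.safetensors.index.json` in this commit reports (`total_parameters: 3212749824`, `total_size: 6425499648`). The sketch below assumes the standard Llama layer layout (bias-free q/k/v/o projections, a gated MLP with three matrices, two RMSNorm weights per layer, one final norm); `count_llama_parameters` is a hypothetical helper.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 3072,
    "intermediate_size": 8192,
    "num_hidden_layers": 28,
    "num_attention_heads": 24,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 128256,
    "tie_word_embeddings": True,
}


def count_llama_parameters(cfg: dict) -> int:
    """Count weights for a bias-free Llama-style decoder (assumed layout)."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down projections
    norms = 2 * h                                       # input + post-attention RMSNorm
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = h
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


total = count_llama_parameters(config)
print(total)      # parameter count
print(total * 2)  # bytes at 2 bytes per bfloat16 weight
```

Under these assumptions the count comes to 3,212,749,824 parameters, and doubling it for `bfloat16` storage gives 6,425,499,648 bytes, matching both index metadata fields.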
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,12 @@
+{
+  "bos_token_id": 128000,
+  "do_sample": true,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
+  "temperature": 0.6,
+  "top_p": 0.9,
+  "transformers_version": "4.53.0"
+}
--- a/model-00001-of-00002.safetensors
+++ b/model-00001-of-00002.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b743aa324dc01646c41a7d277b960cda5d921452cf0b741b1d7f1f6ab6c91691
+size 4965799096
--- a/model-00002-of-00002.safetensors
+++ b/model-00002-of-00002.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:847eff22c7a463cd298c4034cbbd62f44060df09287b8a9f0859a56126bff94f
+size 1459729952
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,262 @@
+{
+  "metadata": {
+    "total_parameters": 3212749824,
+    "total_size": 6425499648
+  },
+  "weight_map": {
+    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.norm.weight": "model-00002-of-00002.safetensors"
+  }
+}
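The entries above map each tensor name to one of the two shard files, and layer 20 is the only layer whose tensors are divided between both shards. A minimal sketch of how that could be checked, assuming the index uses the usual top-level `weight_map` key (the key sits outside the part of the hunk shown here, so treat it as an assumption). The helper name `shards_by_layer` is hypothetical and the sample dictionary is a small excerpt copied from the entries above, not the full file.

```python
import re
from collections import defaultdict


def shards_by_layer(index: dict) -> dict:
    """Map each decoder layer number to the set of shard files holding its tensors."""
    layers = defaultdict(set)
    for name, shard in index["weight_map"].items():
        match = re.match(r"model\.layers\.(\d+)\.", name)
        if match:  # tensors outside the layer stack (e.g. model.norm.weight) are skipped
            layers[int(match.group(1))].add(shard)
    return dict(layers)


# Excerpt of the index above; "weight_map" as the top-level key is an assumption.
sample_index = {
    "weight_map": {
        "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
        "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
        "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
        "model.norm.weight": "model-00002-of-00002.safetensors",
    }
}

# Layers whose tensors live in more than one shard file.
split_layers = [
    layer for layer, shards in sorted(shards_by_layer(sample_index).items())
    if len(shards) > 1
]
print(split_layers)
```

On the real file, the same function could be applied to the dictionary returned by `json.load` on the index.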
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,23 @@
+{
+  "bos_token": {
+    "content": "<|begin_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|eot_id|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|reserved_special_token_247|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c82407ee10fa3777e08252308fce60fcca3e2ff2fb980acdca6c8c5bdd8470c0
+size 17210206
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
--- a/training-logs.csv
+++ b/training-logs.csv
@@ -0,0 +1,88 @@
+,eval_loss,eval_binary,eval_rouge,eval_llm_as_a_judge,eval_runtime,eval_samples_per_second,eval_steps_per_second,epoch,step,loss,grad_norm,learning_rate,train_runtime,train_samples_per_second,train_steps_per_second,total_flos,train_loss
+0,0.5167275071144104,0.0,0.7697632054127466,0.0,32.4672,0.739,0.37,0.0,0,,,,,,,,
+1,,,,,,,,0.04929022082018927,250,0.2184,0.917641818523407,1.2266009852216749e-05,,,,,
+2,,,,,,,,0.09858044164037855,500,0.0936,1.5168706178665161,2.458128078817734e-05,,,,,
+3,,,,,,,,0.14787066246056782,750,0.0713,0.9926924109458923,3.6896551724137934e-05,,,,,
+4,,,,,,,,0.1971608832807571,1000,0.0686,0.7068266868591309,4.9211822660098524e-05,,,,,
+5,,,,,,,,0.24645110410094637,1250,0.0639,0.9005385637283325,4.939293311887096e-05,,,,,
+6,,,,,,,,0.29574132492113564,1500,0.0583,0.3320270776748657,4.874435739116899e-05,,,,,
+7,,,,,,,,0.34503154574132494,1750,0.0523,0.534458577632904,4.8095781663467026e-05,,,,,
+8,,,,,,,,0.3943217665615142,2000,0.0528,0.30121704936027527,4.744720593576506e-05,,,,,
+9,,,,,,,,0.4436119873817035,2250,0.0503,0.2801056206226349,4.679863020806309e-05,,,,,
+10,,,,,,,,0.49290220820189273,2500,0.0506,0.31803637742996216,4.6150054480361124e-05,,,,,
+11,,,,,,,,0.542192429022082,2750,0.0504,0.39373141527175903,4.550147875265916e-05,,,,,
+12,,,,,,,,0.5914826498422713,3000,0.0468,0.23342998325824738,4.4852903024957196e-05,,,,,
+13,,,,,,,,0.6407728706624606,3250,0.0455,0.18071851134300232,4.420432729725523e-05,,,,,
+14,,,,,,,,0.6900630914826499,3500,0.0485,0.41326144337654114,4.355575156955327e-05,,,,,
+15,,,,,,,,0.7393533123028391,3750,0.0496,0.17202889919281006,4.29071758418513e-05,,,,,
+16,,,,,,,,0.7886435331230284,4000,0.0467,0.3105004131793976,4.2258600114149334e-05,,,,,
+17,,,,,,,,0.8379337539432177,4250,0.0461,0.2848104238510132,4.1610024386447366e-05,,,,,
+18,,,,,,,,0.887223974763407,4500,0.046,0.5095773339271545,4.09614486587454e-05,,,,,
+19,,,,,,,,0.9365141955835962,4750,0.0455,0.2530672252178192,4.031287293104343e-05,,,,,
+20,,,,,,,,0.9858044164037855,5000,0.045,0.1897924840450287,3.9664297203341464e-05,,,,,
+21,0.1263345330953598,0.125,0.9474400463451992,0.20833333333333334,40.2686,0.596,0.298,1.0,5072,,,,,,,,
+22,,,,,,,,1.0350946372239747,5250,0.0437,0.2292327880859375,3.90157214756395e-05,,,,,
+23,,,,,,,,1.084384858044164,5500,0.0405,0.12882058322429657,3.836714574793753e-05,,,,,
+24,,,,,,,,1.1336750788643533,5750,0.0418,0.2277369201183319,3.771857002023557e-05,,,,,
+25,,,,,,,,1.1829652996845426,6000,0.0408,0.4681090712547302,3.70699942925336e-05,,,,,
+26,,,,,,,,1.2322555205047319,6250,0.0411,0.29907476902008057,3.6421418564831635e-05,,,,,
+27,,,,,,,,1.2815457413249212,6500,0.0385,0.27942612767219543,3.577284283712967e-05,,,,,
+28,,,,,,,,1.3308359621451105,6750,0.0428,0.2944606840610504,3.51242671094277e-05,,,,,
+29,,,,,,,,1.3801261829652998,7000,0.0392,0.19481465220451355,3.447569138172573e-05,,,,,
+30,,,,,,,,1.4294164037854888,7250,0.0411,0.31186002492904663,3.3827115654023766e-05,,,,,
+31,,,,,,,,1.4787066246056781,7500,0.0421,0.37516000866889954,3.31785399263218e-05,,,,,
+32,,,,,,,,1.5279968454258674,7750,0.0384,0.2926328778266907,3.252996419861983e-05,,,,,
+33,,,,,,,,1.5772870662460567,8000,0.0405,0.2896762192249298,3.1881388470917864e-05,,,,,
+34,,,,,,,,1.626577287066246,8250,0.0394,0.15331445634365082,3.12328127432159e-05,,,,,
+35,,,,,,,,1.6758675078864353,8500,0.0412,0.28620100021362305,3.0584237015513936e-05,,,,,
+36,,,,,,,,1.7251577287066246,8750,0.0364,0.30576035380363464,2.993566128781197e-05,,,,,
+37,,,,,,,,1.774447949526814,9000,0.0407,0.18869711458683014,2.928708556011e-05,,,,,
+38,,,,,,,,1.8237381703470033,9250,0.0378,0.22588257491588593,2.8638509832408034e-05,,,,,
+39,,,,,,,,1.8730283911671926,9500,0.0406,0.2644851803779602,2.7989934104706067e-05,,,,,
+40,,,,,,,,1.9223186119873819,9750,0.0368,0.12382346391677856,2.73413583770041e-05,,,,,
+41,,,,,,,,1.971608832807571,10000,0.0375,0.155380517244339,2.6692782649302132e-05,,,,,
+42,0.11640363931655884,0.16666666666666666,0.9475961059276585,0.2916666666666667,44.2421,0.542,0.271,2.0,10144,,,,,,,,
+43,,,,,,,,2.0208990536277605,10250,0.0371,0.18081019818782806,2.6044206921600168e-05,,,,,
+44,,,,,,,,2.0701892744479493,10500,0.0356,0.08594907075166702,2.53956311938982e-05,,,,,
+45,,,,,,,,2.1194794952681386,10750,0.0336,0.2650609612464905,2.4747055466196234e-05,,,,,
+46,,,,,,,,2.168769716088328,11000,0.0339,0.27648213505744934,2.4098479738494266e-05,,,,,
+47,,,,,,,,2.218059936908517,11250,0.0346,0.2537253201007843,2.34499040107923e-05,,,,,
+48,,,,,,,,2.2673501577287065,11500,0.0346,0.2574119567871094,2.2801328283090335e-05,,,,,
+49,,,,,,,,2.316640378548896,11750,0.0342,0.17437870800495148,2.2152752555388368e-05,,,,,
+50,,,,,,,,2.365930599369085,12000,0.0345,0.19289404153823853,2.15041768276864e-05,,,,,
+51,,,,,,,,2.4152208201892744,12250,0.0331,0.20487362146377563,2.0855601099984433e-05,,,,,
+52,,,,,,,,2.4645110410094637,12500,0.0325,0.1820557862520218,2.0207025372282466e-05,,,,,
+53,,,,,,,,2.513801261829653,12750,0.0319,0.2215508073568344,1.9558449644580502e-05,,,,,
+54,,,,,,,,2.5630914826498423,13000,0.0349,0.16218796372413635,1.8909873916878535e-05,,,,,
+55,,,,,,,,2.6123817034700316,13250,0.0339,0.2562701404094696,1.8261298189176567e-05,,,,,
+56,,,,,,,,2.661671924290221,13500,0.036,0.33530670404434204,1.7612722461474604e-05,,,,,
+57,,,,,,,,2.7109621451104102,13750,0.0323,0.12399043887853622,1.6964146733772636e-05,,,,,
+58,,,,,,,,2.7602523659305995,14000,0.0355,0.28111016750335693,1.631557100607067e-05,,,,,
+59,,,,,,,,2.809542586750789,14250,0.0338,0.38726863265037537,1.5666995278368705e-05,,,,,
+60,,,,,,,,2.8588328075709777,14500,0.0327,0.2847912609577179,1.5018419550666738e-05,,,,,
+61,,,,,,,,2.9081230283911674,14750,0.0342,0.315563827753067,1.436984382296477e-05,,,,,
+62,,,,,,,,2.9574132492113563,15000,0.0336,0.2762741148471832,1.3721268095262805e-05,,,,,
+63,0.12141290307044983,0.2916666666666667,0.9544938655488249,0.3333333333333333,38.498,0.623,0.312,3.0,15216,,,,,,,,
+64,,,,,,,,3.0067034700315456,15250,0.0318,0.3155117928981781,1.3072692367560838e-05,,,,,
+65,,,,,,,,3.055993690851735,15500,0.0302,0.11656571924686432,1.242411663985887e-05,,,,,
+66,,,,,,,,3.105283911671924,15750,0.0275,0.46191108226776123,1.1775540912156905e-05,,,,,
+67,,,,,,,,3.1545741324921135,16000,0.0292,0.3043929934501648,1.1126965184454937e-05,,,,,
+68,,,,,,,,3.203864353312303,16250,0.0294,0.1710922122001648,1.0478389456752972e-05,,,,,
+69,,,,,,,,3.253154574132492,16500,0.0287,0.1864010989665985,9.829813729051005e-06,,,,,
+70,,,,,,,,3.3024447949526814,16750,0.0319,0.2804664969444275,9.181238001349037e-06,,,,,
+71,,,,,,,,3.3517350157728707,17000,0.0286,0.20211897790431976,8.532662273647072e-06,,,,,
+72,,,,,,,,3.40102523659306,17250,0.0288,0.3104719817638397,7.884086545945104e-06,,,,,
+73,,,,,,,,3.4503154574132493,17500,0.0287,0.10014355927705765,7.235510818243138e-06,,,,,
+74,,,,,,,,3.4996056782334386,17750,0.0291,0.12555907666683197,6.5869350905411715e-06,,,,,
+75,,,,,,,,3.548895899053628,18000,0.0288,0.1633451133966446,5.938359362839206e-06,,,,,
+76,,,,,,,,3.5981861198738168,18250,0.0277,0.23577263951301575,5.289783635137239e-06,,,,,
+77,,,,,,,,3.6474763406940065,18500,0.0275,0.4976824223995209,4.641207907435272e-06,,,,,
+78,,,,,,,,3.6967665615141954,18750,0.0272,0.2571507692337036,3.992632179733306e-06,,,,,
+79,,,,,,,,3.746056782334385,19000,0.0287,0.15798607468605042,3.344056452031339e-06,,,,,
+80,,,,,,,,3.795347003154574,19250,0.0296,0.1648528277873993,2.695480724329373e-06,,,,,
+81,,,,,,,,3.8446372239747633,19500,0.0277,0.13284029066562653,2.0469049966274064e-06,,,,,
+82,,,,,,,,3.8939274447949526,19750,0.0289,0.28205162286758423,1.39832926892544e-06,,,,,
+83,,,,,,,,3.943217665615142,20000,0.0292,0.12843023240566254,7.497535412234734e-07,,,,,
+84,,,,,,,,3.992507886435331,20250,0.0283,0.3274129331111908,1.0117781352150678e-07,,,,,
+85,0.12934015691280365,0.375,0.9521755235624768,0.4166666666666667,35.7003,0.672,0.336,4.0,20288,,,,,,,,
+86,,,,,,,,4.0,20288,,,,7466.5185,5.434,2.717,8.261775423189135e+17,0.04122413263570999
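The CSV above interleaves two kinds of rows: evaluation rows (the `eval_*` columns are filled, `loss` is empty) and training rows logged every 250 steps (`loss`, `grad_norm` and `learning_rate` are filled). A minimal sketch of separating them with the standard library, assuming that reading of the columns. The helper name `split_rows` is hypothetical, and the excerpt keeps only a subset of the columns and three of the rows shown above.

```python
import csv
import io

# Excerpt of training-logs.csv with a reduced column set (the real header has more columns).
EXCERPT = "\n".join([
    ",eval_loss,eval_binary,eval_rouge,eval_llm_as_a_judge,epoch,step,loss",
    "0,0.5167275071144104,0.0,0.7697632054127466,0.0,0.0,0,",
    "1,,,,,0.04929022082018927,250,0.2184",
    "21,0.1263345330953598,0.125,0.9474400463451992,0.20833333333333334,1.0,5072,",
])


def split_rows(text: str):
    """Return (eval_rows, train_rows); rows with neither eval_loss nor loss are dropped."""
    eval_rows, train_rows = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row["eval_loss"]:
            eval_rows.append(row)
        elif row["loss"]:
            train_rows.append(row)
    return eval_rows, train_rows


eval_rows, train_rows = split_rows(EXCERPT)
eval_steps = [int(row["step"]) for row in eval_rows]
print(eval_steps)
print(float(train_rows[0]["loss"]))
```

Under this split, the final summary row of the full file (the one carrying `train_runtime` and `train_loss`) would fall into neither list, since both `eval_loss` and `loss` are empty there.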
--- a/training-logs.json
+++ b/training-logs.json
@@ -0,0 +1,87 @@
+{"eval_loss":0.5167275071,"eval_binary":0.0,"eval_rouge":0.7697632054,"eval_llm_as_a_judge":0.0,"eval_runtime":32.4672,"eval_samples_per_second":0.739,"eval_steps_per_second":0.37,"epoch":0.0,"step":0,"loss":null,"grad_norm":null,"learning_rate":null,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.0492902208,"step":250,"loss":0.2184,"grad_norm":0.9176418185,"learning_rate":0.000012266,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.0985804416,"step":500,"loss":0.0936,"grad_norm":1.5168706179,"learning_rate":0.0000245813,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.1478706625,"step":750,"loss":0.0713,"grad_norm":0.9926924109,"learning_rate":0.0000368966,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.1971608833,"step":1000,"loss":0.0686,"grad_norm":0.7068266869,"learning_rate":0.0000492118,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.2464511041,"step":1250,"loss":0.0639,"grad_norm":0.9005385637,"learning_rate":0.0000493929,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.2957413249,"step":1500,"loss":0.0583,"grad_norm":0.3320270777,"learning_rate":0.0000487444,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.3450315457,"step":1750,"loss":0.0523,"grad_norm":0.5344585776,"learning_rate":0.0000480958,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.3943217666,"step":2000,"loss":0.0528,"grad_norm":0.3012170494,"learning_rate":0.0000474472,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.4436119874,"step":2250,"loss":0.0503,"grad_norm":0.2801056206,"learning_rate":0.0000467986,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.4929022082,"step":2500,"loss":0.0506,"grad_norm":0.3180363774,"learning_rate":0.0000461501,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.542192429,"step":2750,"loss":0.0504,"grad_norm":0.3937314153,"learning_rate":0.0000455015,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.5914826498,"step":3000,"loss":0.0468,"grad_norm":0.2334299833,"learning_rate":0.0000448529,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.6407728707,"step":3250,"loss":0.0455,"grad_norm":0.1807185113,"learning_rate":0.0000442043,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.6900630915,"step":3500,"loss":0.0485,"grad_norm":0.4132614434,"learning_rate":0.0000435558,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.7393533123,"step":3750,"loss":0.0496,"grad_norm":0.1720288992,"learning_rate":0.0000429072,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.7886435331,"step":4000,"loss":0.0467,"grad_norm":0.3105004132,"learning_rate":0.0000422586,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.8379337539,"step":4250,"loss":0.0461,"grad_norm":0.2848104239,"learning_rate":0.00004161,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.8872239748,"step":4500,"loss":0.046,"grad_norm":0.5095773339,"learning_rate":0.0000409614,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.9365141956,"step":4750,"loss":0.0455,"grad_norm":0.2530672252,"learning_rate":0.0000403129,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":0.9858044164,"step":5000,"loss":0.045,"grad_norm":0.189792484,"learning_rate":0.0000396643,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":0.1263345331,"eval_binary":0.125,"eval_rouge":0.9474400463,"eval_llm_as_a_judge":0.2083333333,"eval_runtime":40.2686,"eval_samples_per_second":0.596,"eval_steps_per_second":0.298,"epoch":1.0,"step":5072,"loss":null,"grad_norm":null,"learning_rate":null,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.0350946372,"step":5250,"loss":0.0437,"grad_norm":0.2292327881,"learning_rate":0.0000390157,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.084384858,"step":5500,"loss":0.0405,"grad_norm":0.1288205832,"learning_rate":0.0000383671,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.1336750789,"step":5750,"loss":0.0418,"grad_norm":0.2277369201,"learning_rate":0.0000377186,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.1829652997,"step":6000,"loss":0.0408,"grad_norm":0.4681090713,"learning_rate":0.00003707,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.2322555205,"step":6250,"loss":0.0411,"grad_norm":0.299074769,"learning_rate":0.0000364214,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.2815457413,"step":6500,"loss":0.0385,"grad_norm":0.2794261277,"learning_rate":0.0000357728,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.3308359621,"step":6750,"loss":0.0428,"grad_norm":0.2944606841,"learning_rate":0.0000351243,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.380126183,"step":7000,"loss":0.0392,"grad_norm":0.1948146522,"learning_rate":0.0000344757,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.4294164038,"step":7250,"loss":0.0411,"grad_norm":0.3118600249,"learning_rate":0.0000338271,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.4787066246,"step":7500,"loss":0.0421,"grad_norm":0.3751600087,"learning_rate":0.0000331785,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.5279968454,"step":7750,"loss":0.0384,"grad_norm":0.2926328778,"learning_rate":0.00003253,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.5772870662,"step":8000,"loss":0.0405,"grad_norm":0.2896762192,"learning_rate":0.0000318814,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.6265772871,"step":8250,"loss":0.0394,"grad_norm":0.1533144563,"learning_rate":0.0000312328,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.6758675079,"step":8500,"loss":0.0412,"grad_norm":0.2862010002,"learning_rate":0.0000305842,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.7251577287,"step":8750,"loss":0.0364,"grad_norm":0.3057603538,"learning_rate":0.0000299357,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.7744479495,"step":9000,"loss":0.0407,"grad_norm":0.1886971146,"learning_rate":0.0000292871,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.8237381703,"step":9250,"loss":0.0378,"grad_norm":0.2258825749,"learning_rate":0.0000286385,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.8730283912,"step":9500,"loss":0.0406,"grad_norm":0.2644851804,"learning_rate":0.0000279899,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.922318612,"step":9750,"loss":0.0368,"grad_norm":0.1238234639,"learning_rate":0.0000273414,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":1.9716088328,"step":10000,"loss":0.0375,"grad_norm":0.1553805172,"learning_rate":0.0000266928,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":0.1164036393,"eval_binary":0.1666666667,"eval_rouge":0.9475961059,"eval_llm_as_a_judge":0.2916666667,"eval_runtime":44.2421,"eval_samples_per_second":0.542,"eval_steps_per_second":0.271,"epoch":2.0,"step":10144,"loss":null,"grad_norm":null,"learning_rate":null,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.0208990536,"step":10250,"loss":0.0371,"grad_norm":0.1808101982,"learning_rate":0.0000260442,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.0701892744,"step":10500,"loss":0.0356,"grad_norm":0.0859490708,"learning_rate":0.0000253956,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.1194794953,"step":10750,"loss":0.0336,"grad_norm":0.2650609612,"learning_rate":0.0000247471,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.1687697161,"step":11000,"loss":0.0339,"grad_norm":0.2764821351,"learning_rate":0.0000240985,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.2180599369,"step":11250,"loss":0.0346,"grad_norm":0.2537253201,"learning_rate":0.0000234499,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.2673501577,"step":11500,"loss":0.0346,"grad_norm":0.2574119568,"learning_rate":0.0000228013,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.3166403785,"step":11750,"loss":0.0342,"grad_norm":0.174378708,"learning_rate":0.0000221528,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.3659305994,"step":12000,"loss":0.0345,"grad_norm":0.1928940415,"learning_rate":0.0000215042,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.4152208202,"step":12250,"loss":0.0331,"grad_norm":0.2048736215,"learning_rate":0.0000208556,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.464511041,"step":12500,"loss":0.0325,"grad_norm":0.1820557863,"learning_rate":0.000020207,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.5138012618,"step":12750,"loss":0.0319,"grad_norm":0.2215508074,"learning_rate":0.0000195584,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.5630914826,"step":13000,"loss":0.0349,"grad_norm":0.1621879637,"learning_rate":0.0000189099,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.6123817035,"step":13250,"loss":0.0339,"grad_norm":0.2562701404,"learning_rate":0.0000182613,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.6616719243,"step":13500,"loss":0.036,"grad_norm":0.335306704,"learning_rate":0.0000176127,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.7109621451,"step":13750,"loss":0.0323,"grad_norm":0.1239904389,"learning_rate":0.0000169641,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.7602523659,"step":14000,"loss":0.0355,"grad_norm":0.2811101675,"learning_rate":0.0000163156,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.8095425868,"step":14250,"loss":0.0338,"grad_norm":0.3872686327,"learning_rate":0.000015667,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.8588328076,"step":14500,"loss":0.0327,"grad_norm":0.284791261,"learning_rate":0.0000150184,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.9081230284,"step":14750,"loss":0.0342,"grad_norm":0.3155638278,"learning_rate":0.0000143698,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":2.9574132492,"step":15000,"loss":0.0336,"grad_norm":0.2762741148,"learning_rate":0.0000137213,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":0.1214129031,"eval_binary":0.2916666667,"eval_rouge":0.9544938655,"eval_llm_as_a_judge":0.3333333333,"eval_runtime":38.498,"eval_samples_per_second":0.623,"eval_steps_per_second":0.312,"epoch":3.0,"step":15216,"loss":null,"grad_norm":null,"learning_rate":null,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.00670347,"step":15250,"loss":0.0318,"grad_norm":0.3155117929,"learning_rate":0.0000130727,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.0559936909,"step":15500,"loss":0.0302,"grad_norm":0.1165657192,"learning_rate":0.0000124241,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.1052839117,"step":15750,"loss":0.0275,"grad_norm":0.4619110823,"learning_rate":0.0000117755,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.1545741325,"step":16000,"loss":0.0292,"grad_norm":0.3043929935,"learning_rate":0.000011127,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.2038643533,"step":16250,"loss":0.0294,"grad_norm":0.1710922122,"learning_rate":0.0000104784,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.2531545741,"step":16500,"loss":0.0287,"grad_norm":0.186401099,"learning_rate":0.0000098298,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.302444795,"step":16750,"loss":0.0319,"grad_norm":0.2804664969,"learning_rate":0.0000091812,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.3517350158,"step":17000,"loss":0.0286,"grad_norm":0.2021189779,"learning_rate":0.0000085327,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.4010252366,"step":17250,"loss":0.0288,"grad_norm":0.3104719818,"learning_rate":0.0000078841,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.4503154574,"step":17500,"loss":0.0287,"grad_norm":0.1001435593,"learning_rate":0.0000072355,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.4996056782,"step":17750,"loss":0.0291,"grad_norm":0.1255590767,"learning_rate":0.0000065869,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.5488958991,"step":18000,"loss":0.0288,"grad_norm":0.1633451134,"learning_rate":0.0000059384,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.5981861199,"step":18250,"loss":0.0277,"grad_norm":0.2357726395,"learning_rate":0.0000052898,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.6474763407,"step":18500,"loss":0.0275,"grad_norm":0.4976824224,"learning_rate":0.0000046412,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.6967665615,"step":18750,"loss":0.0272,"grad_norm":0.2571507692,"learning_rate":0.0000039926,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.7460567823,"step":19000,"loss":0.0287,"grad_norm":0.1579860747,"learning_rate":0.0000033441,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.7953470032,"step":19250,"loss":0.0296,"grad_norm":0.1648528278,"learning_rate":0.0000026955,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.844637224,"step":19500,"loss":0.0277,"grad_norm":0.1328402907,"learning_rate":0.0000020469,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.8939274448,"step":19750,"loss":0.0289,"grad_norm":0.2820516229,"learning_rate":0.0000013983,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.9432176656,"step":20000,"loss":0.0292,"grad_norm":0.1284302324,"learning_rate":0.0000007498,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":3.9925078864,"step":20250,"loss":0.0283,"grad_norm":0.3274129331,"learning_rate":0.0000001012,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":0.1293401569,"eval_binary":0.375,"eval_rouge":0.9521755236,"eval_llm_as_a_judge":0.4166666667,"eval_runtime":35.7003,"eval_samples_per_second":0.672,"eval_steps_per_second":0.336,"epoch":4.0,"step":20288,"loss":null,"grad_norm":null,"learning_rate":null,"train_runtime":null,"train_samples_per_second":null,"train_steps_per_second":null,"total_flos":null,"train_loss":null}
+{"eval_loss":null,"eval_binary":null,"eval_rouge":null,"eval_llm_as_a_judge":null,"eval_runtime":null,"eval_samples_per_second":null,"eval_steps_per_second":null,"epoch":4.0,"step":20288,"loss":null,"grad_norm":null,"learning_rate":null,"train_runtime":7466.5185,"train_samples_per_second":5.434,"train_steps_per_second":2.717,"total_flos":8.261775423e+17,"train_loss":0.0412241326}