Initialize project; model provided by the ModelHub XC community

Model: AI-ModelScope/EXAONE-3.0-7.8B-Instruct Source: Original Platform
2026-05-21 20:36:13 +08:00
commit 06c0e2ab2a
22 changed files with 315134 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/162
+++ b/162
@@ -0,0 +1,162 @@
 EXAONE AI Model License Agreement 1.1 - NC
 This License Agreement (“Agreement”) is entered into between you (“Licensee”) and LG Management Development 
 Institute Co., Ltd. (“Licensor”), governing the use of the EXAONE AI Model (“Model”). By downloading, 
 installing, copying, or using the Model, you agree to comply with and be bound by the terms of this Agreement.
 If you do not agree to all the terms, you must not download, install, copy, or use the Model. This Agreement 
 constitutes a binding legal agreement between the Licensee and Licensor.
 1. Definitions
    1.1 Model: The artificial intelligence model provided by Licensor, which includes any software, 
    algorithms, machine learning models, or related components supplied by Licensor. This definition extends 
    to encompass all updates, enhancements, improvements, bug fixes, patches, or other modifications that may 
    be provided by Licensor from time to time, whether automatically or manually implemented.
    1.2 Derivatives: Any modifications, alterations, enhancements, improvements, adaptations, or derivative 
    works of the Model created by Licensee or any third party. This includes changes made to the Model's 
    architecture, parameters, data processing methods, or any other aspect of the Model that results in a 
    modification of its functionality or output.
    1.3 Output: Any data, results, content, predictions, analyses, insights, or other materials generated by 
    the Model or Derivatives, regardless of whether they are in their original form or have been further 
    processed or modified by the Licensee. This includes, but is not limited to, textual or numerical produced 
    directly or indirectly through the use of the Model.
    1.4 Licensor: LG Management Development Institute Co., Ltd., the owner, developer, and provider of the 
    EXAONE AI Model. The Licensor holds all rights, title, and interest in the Model and is responsible for 
    granting licenses to use the Model under the terms specified in this Agreement.
    1.5 Licensee: The individual, organization, corporation, academic institution, government agency, or other 
    entity using or intending to use the Model under the terms and conditions of this Agreement. The Licensee 
    is responsible for ensuring compliance with the Agreement by all authorized users who access or utilize 
    the Model on behalf of the Licensee.
 2. License Grant
    2.1 Grant of License: Subject to the terms and conditions outlined in this Agreement, the Licensor hereby 
    grants the Licensee a limited, non-exclusive, non-transferable, worldwide, and revocable license to:
        a. Access, download, install, and use the Model solely for research purposes. This includes 
        evaluation, testing, academic research, experimentation, and participation in competitions, provided 
        that such participation is in a non-commercial context. Notwithstanding Section 3.1, the Licensee may 
        only provide the Model or Derivatives for a competition if no commercial license is granted to the 
        competition organizer or any third party.
        b. Publicly disclose research results and findings derived from the use of the Model or Derivatives, 
        including publishing papers or presentations.
        c. Modify the Model and create Derivatives based on the Model, provided that such modifications and 
        Derivatives are used exclusively for research purposes. The Licensee may conduct experiments, perform 
        analyses, and apply custom modifications to the Model to explore its capabilities and performance 
        under various scenarios. If the Model is modified, the modified Model must include “EXAONE” at the 
        beginning of its name.
        d. Distribute the Model and Derivatives in each case with a copy of this Agreement.
    2.2 Scope of License: The license granted herein does not authorize the Licensee to use the Model for any 
    purpose not explicitly permitted under this Agreement. Any use beyond the scope of this license, including 
    any commercial application or external distribution, is strictly prohibited unless explicitly agreed upon 
    in writing by the Licensor.
 3. Restrictions
    3.1 Commercial Use: The Licensee is expressly prohibited from using the Model, Derivatives, or Output for 
    any commercial purposes, including but not limited to, developing or deploying products, services, or 
    applications that generate revenue, whether directly or indirectly. Any commercial exploitation of the 
    Model or its derivatives requires a separate commercial license agreement with the Licensor. Furthermore, 
    the Licensee shall not use the Model, Derivatives or Output to develop or improve other models.
    3.2 Reverse Engineering: The Licensee shall not decompile, disassemble, reverse engineer, or attempt to 
    derive the source code, underlying ideas, algorithms, or structure of the Model, except to the extent that 
    such activities are expressly permitted by applicable law. Any attempt to bypass or circumvent 
    technological protection measures applied to the Model is strictly prohibited.
    3.3 Unlawful Use: The Licensee shall not use the Model and Derivatives for any illegal, fraudulent, or 
    unauthorized activities, nor for any purpose that violates applicable laws or regulations. This includes 
    but is not limited to the creation, distribution, or dissemination of malicious, deceptive, or unlawful 
    content.
    3.4 Ethical Use: The Licensee shall ensure that the Model or Derivatives is used in an ethical and 
    responsible manner, adhering to the following guidelines:
        a. The Model and Derivatives shall not be used to generate, propagate, or amplify false, misleading, 
        or harmful information, including fake news, misinformation, or disinformation.
        b. The Model and Derivatives shall not be employed to create, distribute, or promote content that is 
        discriminatory, harassing, defamatory, abusive, or otherwise offensive to individuals or groups based 
        on race, gender, sexual orientation, religion, nationality, or other protected characteristics.
        c. The Model and Derivatives shall not infringe on the rights of others, including intellectual 
        property rights, privacy rights, or any other rights recognized by law. The Licensee shall obtain all 
        necessary permissions and consents before using the Model and Derivatives in a manner that may impact 
        the rights of third parties.
        d. The Model and Derivatives shall not be used in a way that causes harm, whether physical, mental, 
        emotional, or financial, to individuals, organizations, or communities. The Licensee shall take all 
        reasonable measures to prevent misuse or abuse of the Model and Derivatives that could result in harm 
        or injury.
 4. Ownership
    4.1 Intellectual Property: All rights, title, and interest in and to the Model, including any 
    modifications, Derivatives, and associated documentation, are and shall remain the exclusive property of 
    the Licensor. The Licensee acknowledges that this Agreement does not transfer any ownership rights to the 
    Licensee. All trademarks, service marks, and logos associated with the Model are the property of the 
    Licensor.
    4.2 Output: All rights, title, and interest in and to the Output generated by the Model and Derivatives 
    whether in its original form or modified, are and shall remain the exclusive property of the Licensor.
    Licensee may use, modify, and distribute the Output and its derivatives for research purpose. The Licensee 
    shall not claim ownership of the Output except as expressly provided in this Agreement. The Licensee may 
    use the Output solely for the purposes permitted under this Agreement and shall not exploit the Output for 
    unauthorized or commercial purposes.
    4.3 Attribution: In any publication or presentation of results obtained using the Model, the Licensee 
    shall provide appropriate attribution to the Licensor, citing the Model's name and version, along with any 
    relevant documentation or references specified by the Licensor.
 5. No Warranty
    5.1 “As-Is” Basis: The Model, Derivatives, and Output are provided on an “as-is” and “as-available” basis, 
    without any warranties or representations of any kind, whether express, implied, or statutory. The 
    Licensor disclaims all warranties, including but not limited to, implied warranties of merchantability, 
    fitness for a particular purpose, accuracy, reliability, non-infringement, or any warranty arising from 
    the course of dealing or usage of trade.
    5.2 Performance and Reliability: The Licensor does not warrant or guarantee that the Model, Derivatives or 
    Output will meet the Licensee’s requirements, that the operation of the Model, Derivatives or Output will 
    be uninterrupted or error-free, or that defects in the Model will be corrected. The Licensee acknowledges 
    that the use of the Model, Derivatives or Output is at its own risk and that the Model, Derivatives or 
    Output may contain bugs, errors, or other limitations.
    5.3 No Endorsement: The Licensor does not endorse, approve, or certify any results, conclusions, or 
    recommendations derived from the use of the Model. The Licensee is solely responsible for evaluating the 
    accuracy, reliability, and suitability of the Model for its intended purposes.
 6. Limitation of Liability
    6.1 No Liability for Damages: To the fullest extent permitted by applicable law, in no event shall the 
    Licensor be liable for any special, incidental, indirect, consequential, exemplary, or punitive damages, 
    including but not limited to, damages for loss of business profits, business interruption, loss of 
    business information, loss of data, or any other pecuniary or non-pecuniary loss arising out of or in 
    connection with the use or inability to use the Model, Derivatives or any Output, even if the Licensor has 
    been advised of the possibility of such damages.
    6.2 Indemnification: The Licensee agrees to indemnify, defend, and hold harmless the Licensor, its 
    affiliates, officers, directors, employees, and agents from and against any claims, liabilities, damages, 
    losses, costs, or expenses (including reasonable attorneys' fees) arising out of or related to the 
    Licensee's use of the Model, any Derivatives, or any Output, including any violation of this Agreement or 
    applicable laws.
 7. Termination
    7.1 Termination by Licensor: The Licensor reserves the right to terminate this Agreement and revoke the 
    Licensee’s rights to use the Model at any time, with or without cause, and without prior notice if the 
    Licensee breaches any of the terms or conditions of this Agreement. Termination shall be effective 
    immediately upon notice.
    7.2 Effect of Termination: Upon termination of this Agreement, the Licensee must immediately cease all use 
    of the Model, Derivatives, and Output and destroy all copies of the Model, Derivatives, and Output in its 
    possession or control, including any backup or archival copies. The Licensee shall certify in writing to 
    the Licensor that such destruction has been completed.
    7.3 Survival: The provisions of this Agreement that by their nature should survive termination, including 
    but not limited to, Sections 4 (Ownership), 5 (No Warranty), 6 (Limitation of Liability), and this Section 
    7 (Termination), shall continue to apply after termination.
 8. Governing Law
    8.1 Governing Law: This Agreement shall be governed by and construed in accordance with the laws of the 
    Republic of Korea, without regard to its conflict of laws principles.
    8.2 Arbitration: Any disputes, controversies, or claims arising out of or relating to this Agreement, 
    including its existence, validity, interpretation, performance, breach, or termination, shall be referred 
    to and finally resolved by arbitration administered by the Korean Commercial Arbitration Board (KCAB) in 
    accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board in force at 
    the time of the commencement of the arbitration. The seat of arbitration shall be Seoul, Republic of 
    Korea. The tribunal shall consist of one arbitrator. The language of the arbitration shall be English.
 9. Alterations
    9.1 Modifications: The Licensor reserves the right to modify or amend this Agreement at any time, in its 
    sole discretion. Any modifications will be effective upon posting the updated Agreement on the Licensor’s 
    website or through other means of communication. The Licensee is responsible for reviewing the Agreement 
    periodically for changes. Continued use of the Model after any modifications have been made constitutes 
    acceptance of the revised Agreement.
    9.2 Entire Agreement: This Agreement constitutes the entire agreement between the Licensee and Licensor 
    concerning the subject matter hereof and supersedes all prior or contemporaneous oral or written 
    agreements, representations, or understandings. Any terms or conditions of any purchase order or other 
    document submitted by the Licensee in connection with the Model that are in addition to, different from, 
    or inconsistent with the terms and conditions of this Agreement are not binding on the Licensor and are 
    void.
 By downloading, installing, or using the EXAONE AI Model, the Licensee acknowledges that it has read, 
 understood, and agrees to be bound by the terms and conditions of this Agreement.
--- a/README.md
+++ b/README.md
@@ -0,0 +1,119 @@
 ---
 license: other
 license_name: exaone
 license_link: LICENSE
 language:
 - en
 - ko
 tags:
 - lg-ai
 - exaone
 ---
 <p align="center">
 <img src="assets/EXAONE_Symbol+BI_3d.png" width="300" style="margin: 40px auto;">
 <br>
 </p>
 # EXAONE-3.0-7.8B-Instruct
 **👋👋 We have revised our [license](./LICENSE) for revitalizing the research ecosystem.👋👋**
 ## Introduction
 We introduce EXAONE-3.0-7.8B-Instruct, a pre-trained and instruction-tuned bilingual (English and Korean) generative model with 7.8 billion parameters. 
 The model was pre-trained with 8T curated tokens and post-trained with supervised fine-tuning and direct preference optimization. 
 It demonstrates highly competitive benchmark performance against other state-of-the-art open models of similar size. 
 For more details, please refer to our [technical report](https://arxiv.org/abs/2408.03541), [blog](https://www.lgresearch.ai/blog/view?seq=460) and [GitHub](https://github.com/LG-AI-EXAONE).
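As a rough sanity check on the 7.8 billion figure, the sketch below counts weights from the values in this repository's `config.json`. It assumes bias-free linear layers, a gated feed-forward block with three projection matrices, and weight-only normalization layers. These are assumptions about the architecture rather than details stated in this README, and `count_parameters` is a hypothetical helper.

```python
# Values copied from config.json in this repository.
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "vocab_size": 102400,
    "tie_word_embeddings": False,
}


def count_parameters(cfg):
    """Estimate the weight count under the assumptions stated above."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]

    # Query and output projections are h x h; key and value are h x kv_dim.
    attention = 2 * h * h + 2 * h * kv_dim
    # Assumption: gate, up, and down matrices in the feed-forward block.
    feed_forward = 3 * h * cfg["intermediate_size"]
    # Assumption: two weight-only normalization layers per block.
    norms = 2 * h
    per_layer = attention + feed_forward + norms

    embeddings = cfg["vocab_size"] * h
    if not cfg["tie_word_embeddings"]:
        embeddings *= 2  # separate output projection matrix
    final_norm = h
    return cfg["num_layers"] * per_layer + embeddings + final_norm


total = count_parameters(CONFIG)
print(f"{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands close to the stated model size; a different layer layout would shift the result slightly.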
 ## Quickstart
 We recommend using transformers v4.41 or later.
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
 )
 tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")
 # Choose your prompt: keep exactly one of the two assignments below active
 prompt = "Explain who you are"  # English example
 # prompt = "너의 소원을 말해봐"   # Korean example
 messages = [
    {"role": "system", 
     "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
 ]
 input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
 )
 output = model.generate(
    input_ids.to(model.device),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128
 )
 print(tokenizer.decode(output[0]))
 ```
 > ### Note
 > The EXAONE 3.0 instruction-tuned language model was trained to utilize the system prompt, 
 > so we highly recommend using the system prompts provided in the code snippet above.
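A minimal sketch of one way to keep the recommended system prompt attached to every request. `build_messages` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names introduced here for illustration; only the prompt text itself comes from the snippet above.

```python
# System prompt text taken from the Quickstart snippet.
DEFAULT_SYSTEM_PROMPT = "You are EXAONE model from LG AI Research, a helpful assistant."


def build_messages(prompt, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Return a chat message list with the system prompt placed first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]


messages = build_messages("Explain who you are")
print(messages[0]["role"], "->", messages[1]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages` list.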
 ## Evaluation
 We compared EXAONE-3.0-7.8B-Instruct with similar-sized instruction-tuned LLMs. To verify the performance of real-world use cases, we measured benchmarks that have a high correlation with [LMSYS Chatbot Arena](https://chat.lmsys.org/).
 Some experimental results are shown below. The full evaluation results can be found in the [technical report](https://arxiv.org/abs/2408.03541).
 | Language | Benchmark | EXAONE 3.0 <br>7.8B Inst. | Llama 3.1 <br>8B Inst. | Gemma 2 <br>9B Inst. | QWEN 2 <br>7B Inst. | Phi 3 <br>7B Inst. | Mistral 7B <br>Inst. |
 | :-----: | :----- | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
 | English | MT-Bench           | **9.01** | 7.95 | 8.52 | 8.41 | 8.52 | 7.72 |
 |         | Arena-Hard-v0.1    | **46.8** | 28.0 | 42.1 | 21.7 | 29.1 | 16.2 |
 |         | WildBench          | **48.2** | 34.5 | 41.5 | 34.9 | 32.8 | 29.0 |
 |         | AlpacaEval 2.0 LC  | 45.0 | 31.5 | **47.5** | 24.5 | 37.1 | 31.0 |
 | Korean  | KoMT-Bench<sup>[1]</sup> | **8.92** | 6.06 | 7.92 | 7.69 | 4.87 | 5.20 |
 |         | LogicKor           | **8.62** | 5.40 | 8.07 | 6.12 | 3.76 | 3.42 |
 - [1] KoMT-Bench is a dataset created by translating MT-Bench into Korean; see [README](https://github.com/LG-AI-EXAONE/KoMT-Bench) for more details.
 ## Limitation
 The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The language model generates responses based on the output probability of tokens, which is determined during training from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research.
 - Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information.
 - Biased responses may be generated, which are associated with age, gender, race, and so on.
 - The generated responses rely heavily on statistics from the training data, which can result in the generation of
 semantically or syntactically incorrect sentences.
 - Since the model does not reflect the latest information, the responses may be false or contradictory.
 LG AI Research strives to reduce potential risks that may arise from the EXAONE language model. Users are not allowed
 to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate
 outputs violating LG AI’s ethical principles when using the EXAONE language model.
 ## License
 The model is licensed under the [EXAONE AI Model License Agreement 1.1 - NC](./LICENSE).
 ## Citation
 ```
@article{exaone-3.0-7.8B-instruct,
  title={EXAONE 3.0 7.8B Instruction Tuned Language Model},
  author={LG AI Research},
  journal={arXiv preprint arXiv:2408.03541},
  year={2024}
 }
 ```
 ## Contact
 LG AI Research Technical Support: contact_us@lgresearch.ai
--- a/assets/EXAONE_Symbol+BI_3d.png
+++ b/assets/EXAONE_Symbol+BI_3d.png
--- a/config.json
+++ b/config.json
@@ -0,0 +1,32 @@
 {
  "activation_function": "silu",
  "architectures": [
    "ExaoneForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_exaone.ExaoneConfig",
    "AutoModelForCausalLM": "modeling_exaone.ExaoneForCausalLM",
    "AutoModelForSequenceClassification": "modeling_exaone.ExaoneForSequenceClassification"
  },
  "bos_token_id": 1,
  "embed_dropout": 0.0,
  "eos_token_id": 361,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "layer_norm_epsilon": 1e-05,
  "max_position_embeddings": 4096,
  "model_type": "exaone",
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "num_layers": 32,
  "pad_token_id": 0,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.41.0",
  "use_cache": true,
  "vocab_size": 102400
 }
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
 {"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/configuration_exaone.py
+++ b/configuration_exaone.py
@@ -0,0 +1,186 @@
 # coding=utf-8
 # Copyright 2021 The LG AI Research EXAONE Lab. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """ EXAONE model configuration """
 from transformers.configuration_utils import PretrainedConfig
 from transformers.utils import logging
 logger = logging.get_logger(__name__)
 EXAONE_PRETRAINED_CONFIG_ARCHIVE_MAP = {
 }
 class ExaoneConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a :class:`~transformers.ExaoneModel`. It is used to
    instantiate an EXAONE model according to the specified arguments, defining the model architecture. Instantiating a
    configuration with the defaults will yield a similar configuration to that of the Exaone
    Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
    outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
    Args:
        vocab_size (:obj:`int`, `optional`, defaults to 102400):
            Vocabulary size of the EXAONE model. Defines the number of different tokens that can be represented by the
            :obj:`input_ids` passed when calling :class:`~transformers.ExaoneModel`.
        max_position_embeddings (:obj:`int`, `optional`, defaults to 2048):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        hidden_size (:obj:`int`, `optional`, defaults to 2048):
            Dimensionality of the decoder layers.
        num_layers (:obj:`int`, `optional`, defaults to 32):
            Number of hidden layers in the Transformer decoder.
        num_attention_heads (:obj:`int`, `optional`, defaults to 32):
            Number of attention heads for each attention layer in the Transformer decoder.
        num_key_value_heads (:obj:`int`, `optional`):
            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA); if
            `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
            by mean-pooling all the original heads within that group. For more details, check out [this
            paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, it will default to
            `num_attention_heads`.
        intermediate_size (:obj:`int`, `optional`, defaults to `hidden_size * 4`):
            Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer decoder.
        activation_function (:obj:`str` or :obj:`function`, `optional`, defaults to :obj:`"silu"`):
            The non-linear activation function (function or string) in the decoder.
        rope_theta (:obj:`float`, `optional`, defaults to 10000.0):
            The base period of the RoPE embeddings.
        rope_scaling (:obj:`Dict`, `optional`):
            Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply new rope type
            and you expect the model to work on longer `max_position_embeddings`, we recommend you to update this value
            accordingly.
            Expected contents:
                `rope_type` (:obj:`str`):
                    The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope',
                    'llama3'], with 'default' being the original RoPE implementation.
                `factor` (:obj:`float`, `optional`):
                    Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
                    most scaling types, a `factor` of x will enable the model to handle sequences of length x *
                    original maximum pre-trained length.
                `original_max_position_embeddings` (:obj:`int`, `optional`):
                    Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings used during
                    pretraining.
                `attention_factor` (:obj:`float`, `optional`):
                    Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
                    computation. If unspecified, it defaults to value recommended by the implementation, using the
                    `factor` field to infer the suggested value.
                `beta_fast` (:obj:`float`, `optional`):
                    Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
                    ramp function. If unspecified, it defaults to 32.
                `beta_slow` (:obj:`float`, `optional`):
                    Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
                    ramp function. If unspecified, it defaults to 1.
                `short_factor` (:obj:`List[float]`, `optional`):
                    Only used with 'longrope'. The scaling factor to be applied to short contexts (<
                    `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
                    size divided by the number of attention heads divided by 2
                `long_factor` (:obj:`List[float]`, `optional`):
                    Only used with 'longrope'. The scaling factor to be applied to long contexts (>
                    `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
                    size divided by the number of attention heads divided by 2
                `low_freq_factor` (:obj:`float`, `optional`):
                    Only used with 'llama3'. Scaling factor applied to low frequency components of the RoPE
                `high_freq_factor` (:obj:`float`, `optional`):
                    Only used with 'llama3'. Scaling factor applied to high frequency components of the RoPE
        embed_dropout (:obj:`float`, `optional`, defaults to 0.0):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_dropout (:obj:`float`, `optional`, defaults to 0.0):
            The dropout ratio for the attention probabilities.
        layer_norm_epsilon (:obj:`float`, `optional`, defaults to 1e-5):
            The epsilon used by the layer normalization layers.
        initializer_range (:obj:`float`, `optional`, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not the model should return the last key/values attentions (not used by all models). Only
            relevant if ``config.is_decoder=True``.
        bos_token_id (:obj:`int`, `optional`, defaults to 0):
            Beginning of stream token id.
        eos_token_id (:obj:`int`, `optional`, defaults to 2):
            End of stream token id.
        tie_word_embeddings (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether to tie the input and output word embeddings.
        gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
            If True, use gradient checkpointing to save memory at the expense of slower backward pass.
        Example::
            >>> from transformers import EXAONEModel, ExaoneConfig
            >>> # Initializing a EXAONE configuration
            >>> configuration = ExaoneConfig()
            >>> # Initializing a model from configuration
            >>> model = EXAONEModel(configuration)
            >>> # Accessing the model configuration
            >>> configuration = model.config
    """
    model_type = "exaone"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {"num_hidden_layers": "num_layers"}
    def __init__(
        self,
        vocab_size=102400,
        max_position_embeddings=2048,
        hidden_size=2048,
        num_layers=32,
        num_attention_heads=32,
        num_key_value_heads=None,
        intermediate_size=None,
        activation_function="silu",
        rope_theta=10000.0,
        rope_scaling=None,
        embed_dropout=0.0,
        attention_dropout=0.0,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        use_cache=True,
        bos_token_id=0,
        eos_token_id=2,
        tie_word_embeddings=True,
        **kwargs
    ):
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_attention_heads = num_attention_heads
        self.num_hidden_layers = num_layers
        if num_key_value_heads is None:
            num_key_value_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads
        if intermediate_size:
            self.intermediate_size = intermediate_size
        else:
            self.intermediate_size = hidden_size * 4
        self.activation_function = activation_function
        self.embed_dropout = embed_dropout
        self.attention_dropout = attention_dropout
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.use_cache = use_cache
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling
        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id
        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, tie_word_embeddings=tie_word_embeddings, **kwargs)
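The two fallback rules in `__init__` above can be tried out without installing `transformers`. The following is a minimal sketch, not code from this repository: `resolve_exaone_defaults` is a hypothetical helper that only mirrors the `num_key_value_heads` and `intermediate_size` conditionals, using the same default values as the signature above.

```python
from typing import Optional, Tuple


def resolve_exaone_defaults(
    hidden_size: int = 2048,
    num_attention_heads: int = 32,
    num_key_value_heads: Optional[int] = None,
    intermediate_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper mirroring the fallback logic in ExaoneConfig.__init__."""
    # No explicit KV head count: use one KV head per attention head.
    if num_key_value_heads is None:
        num_key_value_heads = num_attention_heads
    # A falsy intermediate_size (None or 0) falls back to 4x the hidden size.
    if not intermediate_size:
        intermediate_size = hidden_size * 4
    return num_key_value_heads, intermediate_size


print(resolve_exaone_defaults())  # (32, 8192)
print(resolve_exaone_defaults(hidden_size=4096, num_key_value_heads=8))  # (8, 16384)
```

Because the check is a truthiness test rather than `is None`, passing `intermediate_size=0` also selects the `hidden_size * 4` default.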
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,7 @@
 {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 361,
  "pad_token_id": 0,
  "transformers_version": "4.41"
 }
--- a/merges.txt
+++ b/merges.txt
--- a/model-00001-of-00007.safetensors
+++ b/model-00001-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:c606ee52d0fa5bbec4fb9b1f0f00a26fae9ff7bc4494ebd1dfa69050f37d8eae
 size 4932636680
--- a/model-00002-of-00007.safetensors
+++ b/model-00002-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7caeef4eee66dc420df77feb3352080e0295f4efb1295610b9a56024ed29d0da
 size 4999813040
--- a/model-00003-of-00007.safetensors
+++ b/model-00003-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:6dd2d4f8d6f580f331c24f7d6518acebe56957973b9cb0316c0b66fcaca5167a
 size 4999813080
--- a/model-00004-of-00007.safetensors
+++ b/model-00004-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4f8a78c8e9926eaa9531b76681548d08811e721c460b439e46927ff6631f1916
 size 4832007464
--- a/model-00005-of-00007.safetensors
+++ b/model-00005-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:03fb440bb35aa162943a7fd4603a68ece18f529374e784263d0b8352beb1b0e5
 size 4999813088
--- a/model-00006-of-00007.safetensors
+++ b/model-00006-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:3e17fb6e128f8b39c92742bf58459534cc2b27d4ad5dcc315f7be7aa572ade3a
 size 4832023944
--- a/model-00007-of-00007.safetensors
+++ b/model-00007-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a9c431b6dce9f9f6283e28ef88d230aa1bedcb50f76cfa8146435b879e12fac0
 size 1677721728
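Each `.safetensors` entry above is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. As a rough sketch, a pointer can be read with plain string handling. The function name `parse_lfs_pointer` is an assumption made for illustration, and the sample text is the pointer for the last shard shown above.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first space only; the value may contain other characters.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a9c431b6dce9f9f6283e28ef88d230aa1bedcb50f76cfa8146435b879e12fac0
size 1677721728
"""

info = parse_lfs_pointer(pointer)
size_bytes = int(info["size"])
hash_algo, _, digest = info["oid"].partition(":")
print(hash_algo, size_bytes)  # sha256 1677721728
```

The `size` field is the byte length of the real file, so it can be used to estimate download volume before fetching the LFS objects.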
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,298 @@
 {
  "metadata": {
    "total_size": 31273795584
  },
  "weight_map": {
    "lm_head.weight": "model-00007-of-00007.safetensors",
    "transformer.h.0.attn.attention.k_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.0.attn.attention.out_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.0.attn.attention.q_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.0.attn.attention.v_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.0.ln_1.weight": "model-00001-of-00007.safetensors",
    "transformer.h.0.ln_2.weight": "model-00001-of-00007.safetensors",
    "transformer.h.0.mlp.c_fc_0.weight": "model-00001-of-00007.safetensors",
    "transformer.h.0.mlp.c_fc_1.weight": "model-00001-of-00007.safetensors",
    "transformer.h.0.mlp.c_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.1.attn.attention.k_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.1.attn.attention.out_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.1.attn.attention.q_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.1.attn.attention.v_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.1.ln_1.weight": "model-00001-of-00007.safetensors",
    "transformer.h.1.ln_2.weight": "model-00001-of-00007.safetensors",
    "transformer.h.1.mlp.c_fc_0.weight": "model-00001-of-00007.safetensors",
    "transformer.h.1.mlp.c_fc_1.weight": "model-00001-of-00007.safetensors",
    "transformer.h.1.mlp.c_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.10.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.10.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.10.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.10.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.10.ln_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.10.ln_2.weight": "model-00003-of-00007.safetensors",
    "transformer.h.10.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
    "transformer.h.10.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.10.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.11.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.11.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.11.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.11.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.11.ln_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.11.ln_2.weight": "model-00003-of-00007.safetensors",
    "transformer.h.11.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
    "transformer.h.11.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.11.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.12.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.12.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.12.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.12.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.12.ln_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.12.ln_2.weight": "model-00003-of-00007.safetensors",
    "transformer.h.12.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
    "transformer.h.12.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.12.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.13.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.13.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.13.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.13.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.13.ln_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.13.ln_2.weight": "model-00003-of-00007.safetensors",
    "transformer.h.13.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
    "transformer.h.13.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.13.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.14.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.14.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.14.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.14.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.14.ln_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.14.ln_2.weight": "model-00003-of-00007.safetensors",
    "transformer.h.14.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
    "transformer.h.14.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.14.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.15.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.15.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.15.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.15.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.h.15.ln_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.15.ln_2.weight": "model-00003-of-00007.safetensors",
    "transformer.h.15.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
    "transformer.h.15.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.15.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.16.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.16.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.16.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.16.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.16.ln_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.16.ln_2.weight": "model-00004-of-00007.safetensors",
    "transformer.h.16.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
    "transformer.h.16.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.16.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.17.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.17.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.17.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.17.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.17.ln_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.17.ln_2.weight": "model-00004-of-00007.safetensors",
    "transformer.h.17.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
    "transformer.h.17.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.17.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.18.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.18.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.18.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.18.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.18.ln_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.18.ln_2.weight": "model-00004-of-00007.safetensors",
    "transformer.h.18.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
    "transformer.h.18.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.18.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.19.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.19.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.19.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.19.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.19.ln_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.19.ln_2.weight": "model-00004-of-00007.safetensors",
    "transformer.h.19.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
    "transformer.h.19.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.19.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.2.attn.attention.k_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.2.attn.attention.out_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.2.attn.attention.q_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.2.attn.attention.v_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.2.ln_1.weight": "model-00001-of-00007.safetensors",
    "transformer.h.2.ln_2.weight": "model-00001-of-00007.safetensors",
    "transformer.h.2.mlp.c_fc_0.weight": "model-00001-of-00007.safetensors",
    "transformer.h.2.mlp.c_fc_1.weight": "model-00001-of-00007.safetensors",
    "transformer.h.2.mlp.c_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.20.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.20.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.20.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.20.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
    "transformer.h.20.ln_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.20.ln_2.weight": "model-00004-of-00007.safetensors",
    "transformer.h.20.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
    "transformer.h.20.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
    "transformer.h.20.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.21.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.21.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.21.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.21.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.21.ln_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.21.ln_2.weight": "model-00005-of-00007.safetensors",
    "transformer.h.21.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
    "transformer.h.21.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.21.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.22.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.22.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.22.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.22.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.22.ln_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.22.ln_2.weight": "model-00005-of-00007.safetensors",
    "transformer.h.22.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
    "transformer.h.22.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.22.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.23.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.23.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.23.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.23.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.23.ln_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.23.ln_2.weight": "model-00005-of-00007.safetensors",
    "transformer.h.23.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
    "transformer.h.23.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.23.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.24.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.24.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.24.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.24.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.24.ln_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.24.ln_2.weight": "model-00005-of-00007.safetensors",
    "transformer.h.24.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
    "transformer.h.24.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.24.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.25.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.25.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.25.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.25.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.25.ln_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.25.ln_2.weight": "model-00005-of-00007.safetensors",
    "transformer.h.25.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
    "transformer.h.25.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.25.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.26.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.26.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.26.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.26.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
    "transformer.h.26.ln_1.weight": "model-00005-of-00007.safetensors",
    "transformer.h.26.ln_2.weight": "model-00005-of-00007.safetensors",
    "transformer.h.26.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
    "transformer.h.26.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.26.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.27.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.27.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.27.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.27.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.27.ln_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.27.ln_2.weight": "model-00006-of-00007.safetensors",
    "transformer.h.27.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
    "transformer.h.27.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.27.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.28.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.28.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.28.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.28.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.28.ln_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.28.ln_2.weight": "model-00006-of-00007.safetensors",
    "transformer.h.28.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
    "transformer.h.28.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.28.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.29.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.29.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.29.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.29.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.29.ln_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.29.ln_2.weight": "model-00006-of-00007.safetensors",
    "transformer.h.29.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
    "transformer.h.29.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.29.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.3.attn.attention.k_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.3.attn.attention.out_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.3.attn.attention.q_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.3.attn.attention.v_proj.weight": "model-00001-of-00007.safetensors",
    "transformer.h.3.ln_1.weight": "model-00001-of-00007.safetensors",
    "transformer.h.3.ln_2.weight": "model-00001-of-00007.safetensors",
    "transformer.h.3.mlp.c_fc_0.weight": "model-00001-of-00007.safetensors",
    "transformer.h.3.mlp.c_fc_1.weight": "model-00001-of-00007.safetensors",
    "transformer.h.3.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.30.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.30.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.30.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.30.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.30.ln_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.30.ln_2.weight": "model-00006-of-00007.safetensors",
    "transformer.h.30.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
    "transformer.h.30.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.30.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.31.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.31.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.31.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.31.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.31.ln_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.31.ln_2.weight": "model-00006-of-00007.safetensors",
    "transformer.h.31.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
    "transformer.h.31.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
    "transformer.h.31.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
    "transformer.h.4.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.4.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.4.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.4.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.4.ln_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.4.ln_2.weight": "model-00002-of-00007.safetensors",
    "transformer.h.4.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
    "transformer.h.4.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.4.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.5.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.5.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.5.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.5.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.5.ln_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.5.ln_2.weight": "model-00002-of-00007.safetensors",
    "transformer.h.5.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
    "transformer.h.5.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.5.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.6.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.6.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.6.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.6.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.6.ln_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.6.ln_2.weight": "model-00002-of-00007.safetensors",
    "transformer.h.6.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
    "transformer.h.6.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.6.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.7.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.7.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.7.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.7.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.7.ln_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.7.ln_2.weight": "model-00002-of-00007.safetensors",
    "transformer.h.7.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
    "transformer.h.7.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.7.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.8.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.8.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.8.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.8.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.8.ln_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.8.ln_2.weight": "model-00002-of-00007.safetensors",
    "transformer.h.8.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
    "transformer.h.8.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.8.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.9.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.9.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.9.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.9.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.h.9.ln_1.weight": "model-00002-of-00007.safetensors",
    "transformer.h.9.ln_2.weight": "model-00002-of-00007.safetensors",
    "transformer.h.9.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
    "transformer.h.9.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
    "transformer.h.9.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
    "transformer.ln_f.weight": "model-00006-of-00007.safetensors",
    "transformer.wte.weight": "model-00001-of-00007.safetensors"
  }
 }
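The `weight_map` above assigns every tensor name to the shard file that stores it, and a layer can straddle two shards (for example, `transformer.h.3.mlp.c_fc_1.weight` is in shard 1 while `transformer.h.3.mlp.c_proj.weight` is in shard 2). A small sketch of how such an index might be inspected follows; it uses a four-entry excerpt copied from the index above rather than the full file, and the variable names are illustrative only.

```python
import json
from collections import Counter

# A small excerpt of the index above; in practice, json.load() the real file.
index_text = """
{
  "metadata": {"total_size": 31273795584},
  "weight_map": {
    "lm_head.weight": "model-00007-of-00007.safetensors",
    "transformer.h.3.mlp.c_fc_1.weight": "model-00001-of-00007.safetensors",
    "transformer.h.3.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
    "transformer.wte.weight": "model-00001-of-00007.safetensors"
  }
}
"""

index = json.loads(index_text)

# Count how many tensors each shard file holds.
tensors_per_shard = Counter(index["weight_map"].values())
for shard, count in sorted(tensors_per_shard.items()):
    print(shard, count)

# Look up which shard holds a given tensor.
print(index["weight_map"]["lm_head.weight"])
```

Grouping by shard this way shows which files must be present to load a particular layer, which is useful when only part of the checkpoint has been downloaded.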
--- a/modeling_exaone.py
+++ b/modeling_exaone.py
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,30 @@
 {
  "bos_token": {
    "content": "[BOS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "[|endofturn|]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
--- a/vocab.json
+++ b/vocab.json
@@ -0,0 +1 @@
 {"framework": "pytorch", "task": "text-generation", "allow_remote": true}