Initialize project; model provided by the ModelHub XC community

Model: IIC/RigoChat-7b-v2 Source: Original Platform
2026-05-15 14:21:11 +08:00
commit 5fa8951a2f
18 changed files with 152901 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/199
+++ b/199
@@ -0,0 +1,199 @@
+ACUERDO DE LICENCIA NO COMERCIAL DE RIGOCHAT
+
+Fecha de lanzamiento: 15-01-2025
+
+Al utilizar o distribuir cualquier parte o elemento del Modelo de
+lenguaje RigoChat (versión 2), Usted acepta estar sujeto a este Acuerdo,
+reconociendo los presentes términos y condiciones.
+
+1\. DEFINICIONES
+
+1.1. Modelo: se refiere al Modelo de lenguaje RigoChat (versión 2),
+propiedad de ADIC, puesto a disposición en virtud de este Acuerdo.
+
+1.2. Licenciante: ADIC.
+
+1.3. Acuerdo: significa este Acuerdo de Licencia.
+
+1.4. Política de Uso Aceptable: El uso del Modelo está sujeto a la
+Política de Uso Aceptable que se define a continuación. El licenciatario
+se compromete a cumplirla y a no permitir que los usuarios y/o terceros
+utilicen el Modelo para:
+
+i\) violar o alentar la violación de los derechos fundamentales de
+terceros;
+
+ii\) dañar o lesionar a cualquier tipo de personas, y en particular, a
+los menores o a grupos de personas consideradas vulnerables, en minoría
+o protegidas;
+
+iii\) cualquier propósito ilegal, invasivo, infractor, difamatorio o
+fraudulento;
+
+iv\) generar o difundir información falsa o información y/o contenido
+con el propósito de dañar a otros;
+
+v\) generar o difundir información de identificación personal que pueda
+utilizarse para dañar a un individuo o grupo de individuos;
+
+vi\) llevar a cabo una toma de decisiones totalmente automatizada que
+impacte negativamente en los derechos de las personas;
+
+vii\) participar, promover o fomentar actividades ilegales;
+
+viii\) distribuir intencionadamente virus, gusanos, troyanos,
+archivos dañados, bulos u otros elementos de naturaleza destructiva o
+engañosa;
+
+ix\) interferir con el uso del Modelo con el fin de provocar un mal
+funcionamiento del mismo;
+
+x\) desactivar, interferir o eludir cualquier aspecto del Modelo;
+
+xi\) utilizar el Modelo para acceder a cualquier otro producto o
+servicio de ADIC de una manera que viole los términos de servicio de
+dicho otro producto o servicio de ADIC.
+
+1.5. Trabajo(s) derivado(s): significa (a) cualquier trabajo derivado
+del Modelo según lo reconocido por las leyes de propiedad intelectual e
+industrial españolas y europeas y (b) cualquier modificación del Modelo,
+y cualquier otro modelo creado que se base en el Modelo o en el
+resultado del Modelo o que derive de él. Para mayor claridad, los
+Trabajos Derivados no incluyen el resultado de ningún Modelo.
+
+1.6. Documentación: significa cualquier especificación, manual,
+documentación y otra información escrita proporcionada por ADIC
+relacionada con el Modelo.
+
+1.7. Licenciatario: significa usted, su empleador o cualquier otra
+persona o entidad (si usted está aceptando este Acuerdo en nombre de
+dicha persona o entidad), mayor de edad requerida por las leyes, normas
+o reglamentaciones aplicables para proporcionar el consentimiento legal
+y que tiene autoridad legal para vincular a su empleador o a dicha otra
+persona o entidad si usted está aceptando este Acuerdo en su nombre.
+
+1.8. Usos no comerciales: significa ejercer cualquiera de los derechos
+otorgados en este Acuerdo con fines de investigación y/o no comerciales.
+Los usos no comerciales no incluyen ningún uso de producción y/o
+comercial del Modelo ni de ningún trabajo derivado.
+
+2\. CONCESIÓN DE LICENCIA
+
+a\. Sujeto a su cumplimiento de este Acuerdo, la Política de Uso
+Aceptable y la Documentación, ADIC le otorga una licencia no exclusiva,
+mundial, intransferible, no sublicenciable, revocable, libre de regalías
+y limitada bajo la propiedad intelectual de ADIC u otros derechos de
+propiedad o controlados por ADIC incorporados en el Modelo para usar,
+reproducir, distribuir y crear Trabajos derivados del Modelo, en cada
+caso exclusivamente para usos de investigación y/o no comerciales.
+
+b\. No puede utilizar el Modelo o los Trabajos derivados para permitir
+que terceros utilicen el Modelo o los Trabajos derivados como parte de
+su servicio alojado o a través de sus API, independientemente de si les
+está añadiendo o no una funcionalidad adicional sustancial. La mera
+distribución del Modelo o los Trabajos derivados para su descarga en
+línea sin ofrecer ningún servicio relacionado no constituye una
+infracción de este párrafo. Si desea utilizar el Modelo o cualquier
+Trabajo derivado para uso comercial y/o de producción o desea poner el
+Modelo o cualquier Trabajo derivado a disposición de terceros a través
+de su servicio alojado o sus API, póngase en contacto con ADIC. En el
+caso de utilizar el Modelo o cualquier Trabajo derivado para uso
+comercial y/o de producción, los términos de esta licencia no serán de
+aplicación, debiéndose aceptar y reconocer unos términos y condiciones
+de licencia diferentes.
+
+c\. Si distribuye o pone a disposición de un tercero el Modelo, o
+cualquier Trabajo derivado del mismo, éstos permanecerán sujetos a este
+Acuerdo y usted deberá (i) proporcionar una copia de este Acuerdo a
+dicho tercero y (ii) conservar el siguiente aviso de atribución dentro
+de un archivo de texto de "Aviso" distribuido como parte de dichas
+copias: "*RigoChat (v2), modelo de lenguaje propiedad de ADIC, se
+distribuye bajo una licencia de investigación no comercial otorgada por
+ADIC*". Si usted crea un Trabajo derivado del Modelo, usted debe agregar
+sus propios avisos de atribución al archivo de Aviso incluido con el
+Modelo, siempre que indique claramente qué atribuciones se aplican al
+Modelo y debe indicar en el archivo de Aviso que modificó el Modelo y
+cómo lo modificó.
+
+3\. EXCLUSIÓN DE GARANTÍA
+
+A menos que la legislación aplicable lo exija, el Modelo y cualquier
+resultado o producto del mismo se proporcionan "*tal cual*", sin
+garantías de ningún tipo, ya sean expresas o implícitas, incluyendo, sin
+limitación, cualquier garantía de título, no infracción, comerciabilidad
+o idoneidad para un propósito particular. Usted es el único responsable
+de determinar la idoneidad de usar o redistribuir el Modelo, los
+trabajos derivados o cualquier resultado o producto y asume cualquier
+riesgo asociado con su uso del Modelo, de los trabajos derivados y
+cualquier resultado o producto.
+
+4\. LIMITACIÓN DE RESPONSABILIDAD
+
+En ningún caso ADIC será responsable bajo ninguna teoría de
+responsabilidad, ya sea contractual, por dolo, negligencia,
+incumplimiento culposo, responsabilidad por uso o funcional o por otros
+motivos, que surjan de este Acuerdo, por cualquier pérdida de ganancias
+o cualquier daño directo, indirecto, especial, consecuente, incidental o
+punitivo, aunque ADIC haya sido advertida de la posibilidad de
+cualquiera de los anteriores.
+
+ADIC no responderá frente al Licenciatario por el uso del Modelo en
+contravención de las condiciones y términos del presente Acuerdo.
+
+5\. PROPIEDAD INTELECTUAL.
+
+a\. No se otorgan licencias de marcas comerciales bajo este Acuerdo, y
+en relación con el Modelo o los Trabajos derivados, ni ADIC ni el
+Licenciatario pueden usar ningún nombre o marca de propiedad de o
+asociada con el otro, excepto según sea necesario para el uso razonable
+y habitual al describir y redistribuir el Modelo o los Trabajos
+derivados.
+
+b\. Todos los derechos, títulos e intereses en y para el Modelo,
+incluidos todos los derechos de propiedad intelectual, son y seguirán
+siendo propiedad exclusiva de ADIC.
+
+c\. Sujeto a la propiedad de ADIC del Modelo y de los Trabajos Derivados
+realizados por o para ADIC, con respecto a cualquier Trabajo Derivado
+realizado por usted, entre usted y ADIC, usted es y será el propietario
+de dichos Trabajos Derivados. Del mismo modo, cualquier Trabajo
+Derivado, aunque no tenga el encargo de ADIC, será de su propiedad.
+
+d\. Si inicia un litigio u otro procedimiento contra ADIC (incluida una
+reconvención en una demanda) alegando que el Modelo, los Trabajos
+derivados o los resultados o productos asociados, o cualquier parte de
+los anteriores, constituyen una infracción de la propiedad intelectual u
+otros derechos de su propiedad o licenciables por usted, entonces
+cualquier licencia otorgada a usted en virtud de este Acuerdo finalizará
+a partir de la fecha en que se presente o inicie dicho litigio o
+reclamación. Usted indemnizará y eximirá de responsabilidad a ADIC de
+cualquier reclamación de terceros que surja de o esté relacionada con su
+uso o distribución del Modelo o de los Trabajos derivados en violación
+de este Acuerdo.
+
+6\. DURACIÓN Y RESOLUCIÓN
+
+La duración de este Acuerdo comenzará a regir a partir del momento en
+que usted acepte este Acuerdo o acceda al Modelo y continuará en plena
+vigencia y efecto hasta que se resuelva de conformidad con los términos
+y condiciones aquí establecidos.
+
+Este Acuerdo se resolverá automáticamente si el Licenciatario incumple
+cualquiera de los términos y condiciones del mismo. En caso de
+resolución, el Licenciatario deberá dejar de usar el Modelo y eliminar
+todas las copias del mismo en cualquier dispositivo en el que pudiera
+estar alojado. Las cláusulas 3 a 5 seguirán vigentes tras la resolución
+de este Acuerdo.
+
+7\. DISPOSICIONES GENERALES
+
+7.1. Ley Aplicable: Este Acuerdo se regirá e interpretará de acuerdo con
+las leyes de España.
+
+7.2. Acuerdo Completo: Este Acuerdo constituye el acuerdo completo entre
+ADIC y el Licenciatario con respecto al uso del Modelo y reemplaza todos
+los acuerdos y entendimientos previos, ya sean orales o escritos,
+relacionados con el Modelo.
+
+7.3. Modificaciones: Ninguna modificación de este Acuerdo será válida a
+menos que sea por escrito y firmada por ADIC y el Licenciatario.
--- a/180
+++ b/180
@@ -0,0 +1,180 @@
+RIGOCHAT NON-COMMERCIAL LICENSE AGREEMENT
+
+Release Date: 15-01-2025
+
+By using or distributing any portion or element of the RigoChat Language
+Model (v2), you agree to be bound by this Agreement, acknowledging the
+following terms and conditions.
+
+1\. DEFINITIONS
+
+1.1. Model: Refers to the RigoChat Language Model (v2), owned by ADIC,
+made available under this Agreement.
+
+1.2. Licensor: ADIC.
+
+1.3. Agreement: Means this Non-Commercial License Agreement.
+
+1.4. Acceptable Use Policy: Use of the Model is subject to the
+Acceptable Use Policy defined below. The Licensee agrees to comply with
+it and ensure that users and/or third parties do not use the Model to:
+
+i\) violate or encourage the violation of fundamental rights of third
+parties;
+
+ii\) harm or injure any persons, particularly minors or groups of
+vulnerable, minority, or protected individuals;
+
+iii\) engage in any illegal, invasive, infringing, defamatory, or
+fraudulent activities;
+
+iv\) generate or disseminate false information or content intended to
+harm others;
+
+v\) generate or disseminate personally identifiable information that
+could be used to harm an individual or group of individuals;
+
+vi\) carry out fully automated decision-making that negatively impacts
+individuals' rights;
+
+vii\) engage in, promote, or encourage illegal activities;
+
+viii\) deliberately distribute viruses, worms, trojans, corrupted files,
+hoaxes, or other destructive or deceptive elements;
+
+ix\) interfere with the use of the Model to cause malfunction;
+
+x\) disable, interfere with, or circumvent any aspect of the Model;
+
+xi\) use the Model to access any other ADIC product or service in a way
+that violates the terms of service of such other ADIC product or
+service.
+
+1.5. Derivative Work(s): Means (a) any derivative work from the Model as
+recognized by Spanish and European intellectual property laws, and (b)
+any modifications of the Model and any other model created based on or
+derived from the Model. For clarity, Derivative Works do not include the
+output of any Model.
+
+1.6. Documentation: Means any specifications, manuals, documentation, or
+other written information provided by ADIC related to the Model.
+
+1.7. Licensee: Means you, your employer, or any other person or entity
+(if you are entering into this Agreement on such person or entity's
+behalf), of the age required under applicable laws, rules or regulations
+to provide legal consent and that has legal authority to bind your
+employer or such other person or entity if you are entering into this
+Agreement on their behalf.
+
+1.8. Non-commercial Use: Means exercising any of the rights granted
+under this Agreement for research and/or non-commercial purposes.
+Non-commercial use does not include any production and/or commercial use
+of the Model or any Derivative Work.
+
+2\. LICENSE RIGHTS
+
+a\. Subject to your compliance with this Agreement, the Acceptable Use
+Policy, and the Documentation, ADIC grants you a non-exclusive,
+worldwide, non-transferable, non-sublicensable, revocable, royalty-free,
+and limited license under ADIC’s intellectual property rights or other
+proprietary rights embodied in the Model to use, reproduce, distribute,
+and create Derivative Works from the Model, in each case solely for
+research and/or non-commercial uses.
+
+b\. You may not use the Model or Derivative Works to enable third
+parties to use the Model or Derivative Works as part of your hosted
+service or via your APIs, regardless of whether you are adding
+substantial additional functionality thereto or not. Merely distributing
+the Model or Derivative Works for online download without offering any
+related service is not a violation of this section. If you wish to use
+the Model or any Derivative Work for commercial and/or production use or
+make the Model or any Derivative Work available to third parties via
+your hosted service or APIs, please contact ADIC. In case of using the
+Model or any Derivative Work for commercial and/or production purposes,
+the terms of this license will not apply, and different license terms
+and conditions must be accepted and acknowledged.
+
+c\. If you distribute or make the Model or any Derivative Work available
+to a third party, such distribution or availability will remain subject
+to this Agreement, and you must (i) provide a copy of this Agreement to
+such third party and (ii) retain the following attribution notice in a
+"Notice" text file distributed as part of such copies: "*RigoChat (v2)
+is a language model owned by ADIC and is distributed under a
+non-commercial research license granted by ADIC.*" If you create a
+Derivative Work from the Model, you must add your own attribution
+notices to the Notice file included with the Model, clearly indicating
+which attributions apply to the Model, and you must state in the Notice
+file that you changed the Model and describe how it was modified.
+
+3\. WARRANTY DISCLAIMER
+
+Unless required by applicable law, the Model and any output or results
+therefrom are provided "as is" without warranties of any kind, either
+express or implied, including, without limitation, any warranties of
+title, non-infringement, merchantability, or fitness for a particular
+purpose. You are solely responsible for determining the appropriateness
+of using or redistributing the Model, Derivative Works, or any output or
+results, and assume any risks associated with your use of the Model,
+Derivative Works, and any output or results.
+
+4\. LIMITATION OF LIABILITY
+
+In no event will ADIC be liable under any theory of liability, whether
+in contract, tort, negligence, products liability or otherwise, arising
+out of this Agreement, for any lost profits or any direct, indirect,
+special, consequential, incidental, or punitive damages, even if ADIC
+has been advised of the possibility of any of the foregoing. ADIC shall
+not be liable to the Licensee for any use of the Model in violation of
+the terms and conditions of this Agreement.
+
+5\. INTELLECTUAL PROPERTY
+
+a\. No trademark licenses are granted under this Agreement, and neither
+ADIC nor the Licensee may use any name or mark owned by or associated
+with the other except as required for reasonable and customary use in
+describing and redistributing the Model or Derivative Works.
+
+b\. All rights, title, and interest in and to the Model, including all
+intellectual property rights, are and will remain the exclusive property
+of ADIC.
+
+c\. Subject to ADIC's ownership of the Model and Derivative Works
+created by or for ADIC, regarding any Derivative Work created by you, as
+between you and ADIC, you shall own such Derivative Works. Similarly,
+any Derivative Work not commissioned by ADIC shall be owned by you.
+
+d\. If you initiate litigation or other proceedings against ADIC
+(including a counterclaim in a lawsuit) alleging that the Model,
+Derivative Works, outputs, or associated products, or any part thereof,
+infringe intellectual property rights or other rights you own or are
+licensable by you, any license granted to you under this Agreement will
+terminate as of the date such litigation or claim is filed or initiated.
+You agree to indemnify and hold ADIC harmless from any third-party
+claims arising out of or related to your use or distribution of the
+Model or Derivative Works in violation of this Agreement.
+
+6\. TERM AND TERMINATION
+
+The term of this Agreement will commence upon your acceptance of this
+Agreement or your access to the Model and will continue in full force
+and effect until terminated in accordance with the terms and conditions
+herein.
+
+This Agreement will automatically terminate if the Licensee breaches any
+of its terms and conditions. Upon termination, the Licensee must cease
+using the Model and delete all copies of it from any device on which it
+may reside. Clauses 3 to 5 shall survive the termination of this
+Agreement.
+
+7\. GENERAL PROVISIONS
+
+7.1. Governing Law: This Agreement shall be governed by and construed in
+accordance with the laws of Spain.
+
+7.2. Entire Agreement: This Agreement constitutes the entire agreement
+between ADIC and the Licensee regarding the use of the Model and
+supersedes all prior agreements and understandings, whether oral or
+written, relating to the Model.
+
+7.3. Modifications: No modification to this Agreement shall be valid
+unless in writing and signed by both ADIC and the Licensee.
--- a/README.md
+++ b/README.md
@@ -0,0 +1,431 @@
+---
+library_name: transformers
+language:
+- es
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+pipeline_tag: text-generation
+license: other
+license_name: rigochat-nc
+license_link: https://huggingface.co/IIC/RigoChat-7b-v2/blob/main/LICENSE
+tags:
+- chat
+---
+
+# Model Card for RigoChat-7b-v2
+
+<div style="display: flex; align-items: flex-start;">
+
+<div style="flex: 1;">
+
+RigoChat-7b-v2 is a Qwen-2.5-based model specifically designed to provide accurate responses to Spanish queries. It is based on the [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model and has been fine-tuned with Direct Preference Optimization ([DPO](https://arxiv.org/pdf/2305.18290)) for improved performance in Spanish.
+
+This model is licensed for **non-commercial use**. If you want to use it commercially, please contact us or use it through the service we offer on the [AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-3j6o63v6c3pq2?sr=0-2&ref_=beagle&applicationId=AWSMPContessa). You can find a tutorial on how to use it on that infrastructure at the [following link](https://github.com/iiconocimiento/iic-aws/blob/main/notebooks/rigochat-7b/rigochat-7b-Model.ipynb).
+
+</div>
+
+<div style="margin-left: 20px;">
+<img src="./images/RigoChat.jpg">
+</div>
+
+</div>
+
+## Model Details
+
+### Model Description
+
+This model is the second version of RigoChat, a family of Large Language Models (LLMs) designed to solve typical NLP tasks with Spanish instructions, such as Tool Use, Summarization, Math, Code, and Abstractive-QA. Like [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), this model has no specific use case and can be applied to a wide range of tasks. It offers a slight improvement on generalist tasks in Spanish, particularly in RAG (Retrieval-Augmented Generation) systems with Spanish databases, because its training focused on answering questions about a given context to prevent hallucinations and ensure safe responses.
+
+Key benefits of this model include:
+
+- Improved performance on generalist tasks in Spanish.
+- Enhanced safety and reduced hallucinations in RAG systems with Spanish texts.
+- Flexible deployment across different hardware, including devices with reduced computational capacity. For more information on how to use RigoChat-7b-v2 on reduced hardware, see [IIC/RigoChat-7b-v2-GGUF](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF).
+
+Remarkably, this model was trained with limited computational resources (a single A100 GPU) in a relatively short time (8.5 hours). This was made possible by a high-quality dataset and techniques such as [LoRA](https://arxiv.org/pdf/2106.09685) that reduce memory usage. Further details on the training process can be found below.
+
+- **Developed by:** Instituto de Ingeniería del Conocimiento (IIC).
+- **Model type:** Generative Fine-tuned Transformer.
+- **Language(s) (NLP):** Spanish (BCP-47 es).
+- **License:** RIGOCHAT NON-COMMERCIAL.
+- **Architecture:** We use [Qwen's](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) architecture without modifications.
+- **Finetuned from model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
+
+### Model Sources
+
+- **Paper:** https://arxiv.org/abs/2503.08188
+
+## Uses
+
+### Direct Use
+
+You can use and deploy RigoChat-v2 for commercial purposes through a model package from the AWS Marketplace. See the instructions in the [following notebook](https://github.com/iiconocimiento/iic-aws/tree/main/notebooks/rigochat-v2).
+
+### Out-of-Scope Use
+
+This language model has been adapted for general natural language processing tasks in Spanish and specific use cases such as RAG. However, there are several cases where the model should not be used due to its technical and ethical limitations:
+
+- **Illegal Activities:** The model should not be used to generate content related to illegal activities, such as creating malicious software, fraud, incitement to crime, or any illegal material.
+- **Harmful or Dangerous Content:** It should not be used to generate hate speech, violence, harassment, or any content that promotes discrimination, violence, or abuse.
+
+### Bias, Risks, and Limitations
+
+Although this model has been trained to understand and generate text in Spanish, there are several risks, biases, and limitations that users should be aware of:
+
+- **Biases:** The model may reflect biases present in the training data. These biases could be related to gender, race, social class, sexual orientation, among others, and may generate responses that perpetuate stereotypes or discrimination.
+- **Accuracy and Reliability:** While the model generates coherent and useful text in many contexts, it may not always be 100% accurate or reliable, especially in technical, scientific, or legal matters where high certainty is required.
+- **Limited or Outdated Knowledge:** The model has no information beyond its training cutoff date, so it may not reflect recent events, research, or advancements.
+
+### Recommendations
+
+We recommend using this model as a general chatbot or within applications designed for specific tasks, such as SQL queries, RAG systems, or as an autonomous agent to facilitate the use of tools.
+
+## How to Get Started with the Model
+
+Use it as you would [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) or any other model that uses the `chatml` template.
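+
+ChatML wraps every turn as `<|im_start|>{role}\n{content}<|im_end|>`. The sketch below renders that generic layout by hand, purely for illustration. The helper `to_chatml` is hypothetical. In practice, rely on `tokenizer.apply_chat_template`, because the model's own template may add more, such as a default system prompt or tool descriptions.
+
+```python
+def to_chatml(messages, add_generation_prompt=True):
+    """Render chat messages in the generic ChatML layout (illustrative only)."""
+    parts = [
+        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
+    ]
+    if add_generation_prompt:
+        # Open an assistant turn so that generation continues from here.
+        parts.append("<|im_start|>assistant\n")
+    return "".join(parts)
+
+
+prompt = to_chatml([{"role": "user", "content": "Hola, ¿qué tal?"}])
+print(prompt)
+```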
+
+### To load the model and tokenizer
+
+```python
+from transformers import (
+    AutoModelForCausalLM,
+    AutoTokenizer,
+)
+import torch
+
+model_name = "IIC/RigoChat-7b-v2"
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="cuda",
+    trust_remote_code=True,
+)
+tokenizer = AutoTokenizer.from_pretrained(
+    model_name,
+    trust_remote_code=True,
+)
+```
+
+### Sample generation
+
+```python
+messages = [
+    {"role": "user", "content": "¿Cómo puedo transformar un diccionario de listas en una lista de diccionarios, y viceversa, en Python sin utilizar bucles for?"}
+]
+
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=1024,
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+```
+
+For a better experience, we recommend using [the following generation parameters](https://huggingface.co/IIC/RigoChat-7b-v2/blob/main/generation_config.json).
+
+### Tool Use
+
+```python
+def obtener_temperatura_actual(location: str) -> float:
+    """
+    Obtener la temperatura actual de una localización.
+    
+    Args:
+        location: La localización, con el siguiente formato: "Ciudad, País."
+    Returns:
+        La temperatura en dicha localización, en grados Celsius.
+    """
+    return 22.0  # Placeholder value for the example.
+
+
+messages = [
+  {"role": "user", "content": "¿Cuál es el tiempo en Madrid ahora mismo?"}
+]
+
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    tools=[obtener_temperatura_actual],
+    add_generation_prompt=True
+)
+
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=1024
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+```
+
+Check the [tool use documentation](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling) from Hugging Face for more information.
+
+If the model generates a tool call, you should add it to the chat like so:
+
+```python
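+import re
+import json
+
+# Map each tool name to the Python function that implements it.
+tools = {
+    "obtener_temperatura_actual": obtener_temperatura_actual,
+}
+
+# Extract the JSON payload between the <tool_call> tags
+# (re.DOTALL lets it span several lines).
+match = re.search(
+    r"<tool_call>\s*(\{.*?\})\s*</tool_call>",
+    response,
+    re.DOTALL,
+)
+if match is None:
+    raise ValueError("The model response does not contain a tool call.")
+tool_call = json.loads(match.group(1))
+
+# Add tool metadata to messages
+messages.append(
+    {
+        "role": "assistant",
+        "tool_calls": [{"type": "function", "function": tool_call}],
+    },
+)
+
+# Add tool result to messages (converted to text for the chat template)
+messages.append(
+    {
+        "role": "tool",
+        "name": tool_call["name"],
+        "content": str(tools[tool_call["name"]](**tool_call["arguments"])),
+    },
+)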
+```
+
+The code above handles a single function call, but the same logic applies when the model calls several functions at once. After that, you can continue generating messages as usual:
+
+```python
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    tools=[obtener_temperatura_actual],
+    add_generation_prompt=True
+)
+
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=1024
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+```
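+
+When the model calls several functions at once, the same steps apply to every `<tool_call>` block. The minimal sketch below assumes the model emits one `<tool_call>` block per call, in the same format handled above. The helper `append_tool_calls` and the two-call response are hypothetical, and the temperature function is repeated as a stub so the snippet runs on its own.
+
+```python
+import json
+import re
+
+
+def obtener_temperatura_actual(location: str) -> float:
+    """Same stub as above: always returns 22 degrees Celsius."""
+    return 22.0
+
+
+TOOLS = {"obtener_temperatura_actual": obtener_temperatura_actual}
+TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
+
+
+def append_tool_calls(messages, response, tools):
+    """Append one assistant tool-call message and one tool result per call."""
+    calls = [json.loads(m.group(1)) for m in TOOL_CALL_RE.finditer(response)]
+    if not calls:
+        return messages  # Plain-text answer: nothing to execute.
+    messages.append({
+        "role": "assistant",
+        "tool_calls": [{"type": "function", "function": c} for c in calls],
+    })
+    for call in calls:
+        result = tools[call["name"]](**call["arguments"])
+        messages.append({"role": "tool", "name": call["name"], "content": str(result)})
+    return messages
+
+
+# Hypothetical response containing two parallel calls (illustrative only).
+response = (
+    '<tool_call>\n{"name": "obtener_temperatura_actual", '
+    '"arguments": {"location": "Madrid, España"}}\n</tool_call>\n'
+    '<tool_call>\n{"name": "obtener_temperatura_actual", '
+    '"arguments": {"location": "Sevilla, España"}}\n</tool_call>'
+)
+messages = [{"role": "user", "content": "¿Qué temperatura hace en Madrid y en Sevilla?"}]
+append_tool_calls(messages, response, TOOLS)
+```
+
+The updated `messages` list can then go back through `apply_chat_template` exactly as in the single-call case.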
+
+## Training Details
+
+### Training Data
+
+A combination of public and private datasets designed at the IIC. The dataset consists of 21,975 conversations in Spanish, in `chatml` format, with the same structure as the [Anthropic/hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf). Each conversation has two variants, `chosen` and `rejected`, which differ only in the assistant's last answer. The last answer in the `chosen` variant is considered better than the one in the `rejected` variant. Different techniques have been used to generate the dataset, which we explain in depth in the research (**coming soon**).
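+
+The minimal sketch below makes this structure concrete. It assumes each variant is stored as a list of ChatML-style `{"role", "content"}` messages; the original hh-rlhf dataset stores raw text transcripts instead. The helper `split_preference_pair` and the sample record are hypothetical and not taken from the real dataset. The helper recovers the shared history, which acts as the DPO prompt, along with the two competing final answers.
+
+```python
+def split_preference_pair(record):
+    """Split a {"chosen", "rejected"} record into shared history + final answers."""
+    chosen, rejected = record["chosen"], record["rejected"]
+    # Both variants must be identical except for the last assistant turn.
+    if chosen[:-1] != rejected[:-1]:
+        raise ValueError("chosen and rejected must share the same history")
+    if chosen[-1]["role"] != "assistant" or rejected[-1]["role"] != "assistant":
+        raise ValueError("both variants must end with an assistant answer")
+    return chosen[:-1], chosen[-1]["content"], rejected[-1]["content"]
+
+
+# Hypothetical record for illustration only.
+history = [
+    {"role": "user", "content": "Contexto: El museo abre de 10:00 a 18:00.\n¿A qué hora cierra?"},
+]
+record = {
+    "chosen": history + [{"role": "assistant", "content": "Según el contexto, cierra a las 18:00."}],
+    "rejected": history + [{"role": "assistant", "content": "Cierra a las 20:00."}],
+}
+
+prompt, chosen_answer, rejected_answer = split_preference_pair(record)
+```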
+
+### Training Procedure
+
+We use the [Transformer Reinforcement Learning](https://huggingface.co/docs/trl/index) (TRL) library. Specifically, we applied [the example DPO script they publish](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) to the dataset we generated.
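+
+For background, the standard sigmoid DPO objective works on a prompt `x`, a chosen answer `y_w` and a rejected answer `y_l`. Per pair, it minimizes `-log sigmoid(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))])`. The plain-Python sketch below computes this for one pair from precomputed sequence log-probabilities. `beta` does not appear in the configuration below; the value 0.1 is only an illustrative assumption (it is also the default of TRL's `DPOConfig`). The bracketed terms, scaled by `beta`, are the implicit rewards reported as `rewards/*` in the training logs below.
+
+```python
+import math
+
+
+def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
+                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
+    """Sigmoid DPO loss for a single preference pair.
+
+    Each *_logp argument is the summed log-probability of the whole answer,
+    under the model being trained (policy) or the frozen reference model.
+    """
+    # Implicit rewards: beta * (log pi_theta - log pi_ref) for each answer.
+    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
+    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
+    margin = reward_chosen - reward_rejected
+    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
+    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
+    return loss, reward_chosen, reward_rejected, margin
+
+
+# Toy log-probabilities (made up): the policy raised the chosen answer's
+# likelihood and lowered the rejected one relative to the reference.
+loss, r_chosen, r_rejected, margin = dpo_pair_loss(-9.0, -13.0, -10.0, -12.0)
+print(f"loss={loss:.4f} margin={margin:.2f}")
+```
+
+When the policy still equals the reference, the margin is 0 and the loss is log 2 (about 0.693). A positive margin, as in the toy call, pushes the loss below that value.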
+
+#### Training Hyperparameters
+
+
+<details><summary>Details</summary>
+<p>
+  
+```python
+LORA_CONFIG = {
+    "r": 64,
+    "lora_alpha": 16,
+    "lora_dropout": 0.1,
+    "bias": "none",
+    "task_type": "CAUSAL_LM",
+    "target_modules": [
+        "q_proj",
+        "k_proj",
+        "v_proj",
+        "o_proj",
+        "up_proj",
+        "gate_proj",
+        "down_proj",
+    ],
+    "use_rslora": True,
+}
+
+DPO_CONFIG = {
+    "num_train_epochs": 2,
+    "logging_steps": 25,
+    "eval_steps": 500,
+    "save_steps": 100,
+    "save_total_limit": 5,
+    "per_device_train_batch_size": 1,
+    "per_device_eval_batch_size": 1,
+    "gradient_accumulation_steps": 16,
+    "learning_rate": 5e-6,
+    "max_length": 8192, # max length in the history chat + latest assistant response.
+    "max_prompt_length": 6656, # max length in the history chat: user-assistant-...-assistant-user.
+    "gradient_checkpointing": True,
+    "weight_decay": 0.001,
+    "optim": "rmsprop",
+    "evaluation_strategy": "steps",
+    "lr_scheduler_type": "cosine",
+    "bf16": True,
+}
+```
+
+</p>
+</details>
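+
+A few derived quantities help in reading this configuration. The minimal sketch below uses only values copied from the dictionaries above and assumes a single GPU, as stated in the model description:
+
+- With `use_rslora=True`, PEFT scales the LoRA update by `lora_alpha / sqrt(r)` rather than `lora_alpha / r`.
+- The effective batch size is `per_device_train_batch_size * gradient_accumulation_steps * number_of_gpus`.
+- Following the comments on `max_length` and `max_prompt_length`, their difference is the token budget left for the final assistant answer when the prompt is at its cap.
+
+```python
+import math
+
+# Values copied from LORA_CONFIG and DPO_CONFIG above.
+r, lora_alpha = 64, 16
+per_device_train_batch_size, gradient_accumulation_steps = 1, 16
+max_length, max_prompt_length = 8192, 6656
+num_gpus = 1  # Assumption: the single A100 mentioned in the model description.
+
+# rsLoRA (use_rslora=True) scales the adapter update by alpha / sqrt(r);
+# classic LoRA would use alpha / r.
+rslora_scaling = lora_alpha / math.sqrt(r)
+classic_scaling = lora_alpha / r
+
+# Conversations that contribute to each optimizer step.
+effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
+
+# Tokens left for the final assistant answer when the prompt is at its cap.
+max_completion_tokens = max_length - max_prompt_length
+
+print(rslora_scaling, classic_scaling, effective_batch_size, max_completion_tokens)
+```
+
+With these values the rsLoRA scaling factor is 2.0, versus 0.25 for classic LoRA. The adapter's contribution is therefore not shrunk as aggressively at rank 64.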
+
+#### Speeds, Sizes, Times
+
+Below are some key values from the latest training logs and the final training results.
+
+```python
+latest_logs = {'loss': 0.3716, 'grad_norm': 4.989994049072266, 'learning_rate': 1.0380020311950844e-10, 'rewards/chosen': 0.534086287021637, 'rewards/rejected': -0.6236276030540466, 'rewards/accuracies': 0.8899999856948853, 'rewards/margins': 1.1577140092849731, 'logps/rejected': -218.88198852539062, 'logps/chosen': -250.0700225830078, 'logits/rejected': -1.6214849948883057, 'logits/chosen': -1.9585875272750854, 'epoch': 1.99}
+
+final_training_results = {'train_runtime': 30825.7138, 'train_samples_per_second': 1.432, 'train_steps_per_second': 0.089, 'train_loss': 0.483570138469306, 'epoch': 2.0}
+```
+
+The total runtime of about 30,826 seconds (roughly eight and a half hours) shows that, with very little hardware, we were able to improve a state-of-the-art model on tasks adapted to Spanish. This can be seen in more detail in the following sections.
+
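The throughput figures in `final_training_results` can be cross-checked with simple arithmetic. This sketch uses only the logged values. Because the reported rates are rounded, the sample and step counts are approximations rather than exact totals.

```python
# Values copied from final_training_results above.
train_runtime_s = 30825.7138
train_samples_per_second = 1.432
train_steps_per_second = 0.089

hours = train_runtime_s / 3600
approx_samples = train_runtime_s * train_samples_per_second  # over both epochs
approx_steps = train_runtime_s * train_steps_per_second
samples_per_step = train_samples_per_second / train_steps_per_second

print(f"{hours:.2f} h, ~{approx_samples:,.0f} samples, ~{approx_steps:,.0f} steps, "
      f"~{samples_per_step:.1f} samples/step")
# 8.56 h, ~44,142 samples, ~2,743 steps, ~16.1 samples/step
```

About 16 samples per step matches `per_device_train_batch_size × gradient_accumulation_steps` (1 × 16). Roughly 44,000 samples over two epochs suggests on the order of 22,000 preference pairs per epoch.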
+## Evaluation
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+
+To assess the performance of Large Language Models (LLMs), we have developed and utilized several high-quality corpora tailored to specific evaluation needs:
+
+1. [IIC/AQuAS](https://huggingface.co/datasets/IIC/AQuAS): A manually curated corpus created by two computational linguists to evaluate language models in the task of Abstractive Question Answering in Spanish. It includes examples from domains such as finance, insurance, healthcare, law, and music.
+
+2. [IIC/RagQuAS](https://huggingface.co/datasets/IIC/RagQuAS): Another manually curated corpus developed by the same linguists to evaluate full RAG systems and language models in Abstractive Question Answering tasks in Spanish. This corpus spans a wide range of domains, including hobbies, linguistics, pets, health, astronomy, customer service, cars, daily life, documentation, energy, skiing, fraud, gastronomy, languages, games, nail care, music, skating, first aid, recipes, recycling, complaints, insurance, tennis, transportation, tourism, veterinary, travel, and yoga.
+
+3. **CAM:** Designed for all CAM tasks, this corpus consists of frequently asked questions (FAQs) on consumer-related topics, taken from the websites of the Comunidad de Madrid. The questions are categorized into three levels of degradation (E1, E2, and E3), intended to measure how well LLMs understand and respond to poorly formulated queries, for example queries with spelling errors or varying degrees of colloquialism. This task also falls under the Abstractive Question Answering category.
+
+4. **Shops:** A multi-turn conversational corpus centered on policies from various clothing companies. The task involves Multi-turn Abstractive Question Answering.
+
+5. **Insurance:** Another multi-turn conversational corpus, this time focused on policies from various insurance companies. It also involves Multi-turn Abstractive Question Answering.
+
+Each corpus includes the following columns: question, answer, and context(s) containing relevant information from which the model can derive the answer. In multi-turn tasks, a chat history is also provided.
+
+The scoring process for LLMs involves measuring the similarity between the reference answer and the answer generated by the model. All corpora are private except for AQuAS and RagQuAS, which are publicly available and can serve as examples of the structure and content of the others.
+
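This minimal sketch shows how a record with these columns could be turned into a chat prompt for a RAG model. The system-prompt wording, the context numbering and the function name `build_messages` are hypothetical. The prompt format actually used in these evaluations is not described here. The output uses the common role/content message format.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(question: str, contexts: List[str],
                   history: Optional[List[Message]] = None) -> List[Message]:
    """Build a chat message list from one evaluation record (hypothetical format)."""
    # Number each context passage so the model can tell them apart.
    context_block = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(contexts, start=1)
    )
    system_prompt = (
        "Answer the user's question using only the information in the context below.\n\n"
        + context_block
    )
    messages: List[Message] = [{"role": "system", "content": system_prompt}]
    # Multi-turn corpora (Shops, Insurance) also provide prior turns.
    messages.extend(history or [])
    messages.append({"role": "user", "content": question})
    return messages


example = build_messages(
    "¿Cuál es el plazo para devolver un producto?",
    ["Las devoluciones se aceptan durante 30 días desde la compra."],
)
print(example[-1])
```

The resulting list could then be rendered into a model-specific prompt with a tokenizer's `apply_chat_template` method in `transformers`.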
+#### Factors
+
+These evaluations are narrow in scope: all of them focus on RAG tasks in very specific domains, so they do not cover all the general scenarios to which the model could be exposed.
+
+#### Metrics
+
+Answers are scored by [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), used as a judge that rates how similar each generated answer is to the reference answer.
+
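A judge-based metric of this kind can be sketched in two steps. First, build a prompt asking the judge to compare the generated answer with the reference. Second, parse a numeric score from the judge's reply. The template, the 0 to 10 scale and the helper names below are hypothetical; this card does not give the actual judge prompt and scale. The call that sends the prompt to Llama-3.1-8B-Instruct is left out.

```python
import re
from typing import Optional

# Hypothetical judge prompt; the wording used for the reported results is not given here.
JUDGE_TEMPLATE = (
    "Compare the candidate answer with the reference answer for the question below.\n"
    "Rate how similar they are in meaning on a scale from 0 to 10 and reply with the number only.\n\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Score:"
)


def build_judge_prompt(question: str, reference: str, candidate: str) -> str:
    """Fill the hypothetical judge template for one evaluation item."""
    return JUDGE_TEMPLATE.format(question=question, reference=reference, candidate=candidate)


def parse_score(judge_reply: str, max_score: float = 10.0) -> Optional[float]:
    """Return the first number in the judge's reply, or None if absent or out of range."""
    match = re.search(r"\d+(?:\.\d+)?", judge_reply)
    if match is None:
        return None
    score = float(match.group())
    return score if 0.0 <= score <= max_score else None
```

Averaging such per-item scores over a corpus is one way to obtain a single figure per task.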
+### Results
+
+| **Model**                  | **Average** | **AQuAS** | **RagQuAS** | **CAM**  | **CAM_E1** | **CAM_E2** | **CAM_E3** | **Shops** | **Insurance** |
+|----------------------------|-------------|-----------|-------------|----------|------------|------------|------------|-----------|---------------|
+| **RigoChat-7b-v2**         | **79.55**   | 82.52     | 79.10       | **78.91**| **79.17**  | 76.73      | **78.23**  | **80.79** | **81.04**     |
+| GPT-4o                     | 78.26       | **85.23** | 77.91       | 78.00    | 74.91      | 73.45      | 77.09      | 78.60     | 80.89         |
+| stablelm-2-12b-chat        | 77.74       | 78.88     | 78.21       | 77.82    | 78.73      | **77.27**  | 74.73      | 77.03     | 79.26         |
+| Mistral-Small-Instruct-2409| 77.29       | 80.56     | 78.81       | 77.82    | 75.82      | 73.27      | 73.45      | 78.25     | 80.36         |
+| Qwen2.5-7B-Instruct        | 77.17       | 80.93     | 77.41       | 77.82    | 75.09      | 75.45      | 72.91      | 78.08     | 79.67         |
+| Meta-Llama-3.1-8B-Instruct | 76.55       | 81.87     | 80.50       | 72.91    | 73.45      | 75.45      | 71.64      | 77.73     | 78.88         |
+| GPT-4o-mini                | 76.48       | 82.80     | 75.82       | 76.36    | 74.36      | 72.36      | 71.82      | 78.25     | 80.08         |
+| Phi-3.5-mini-instruct      | 76.38       | 81.68     | **81.09**   | 75.82    | 74.73      | 71.45      | 70.36      | 77.43     | 78.45         |
+| gemma-2-9b-it              | 75.80       | 82.80     | 78.11       | 72.91    | 73.45      | 71.09      | 71.27      | 77.08     | 79.72         |
+| Ministral-8B-Instruct-2410 | 75.19       | 79.63     | 77.31       | 76.00    | 73.45      | 72.36      | 70.18      | 76.44     | 76.14         |
+| GPT-3.5-turbo-0125         | 74.78       | 80.93     | 73.53       | 76.73    | 72.55      | 72.18      | 69.09      | 75.63     | 77.64         |
+| Llama-2-7b-chat-hf         | 71.18       | 67.10     | 77.31       | 71.45    | 70.36      | 70.73      | 68.55      | 72.07     | 71.90         |
+| granite-3.0-8b-instruct    | 71.08       | 73.08     | 72.44       | 72.36    | 71.82      | 69.09      | 66.18      | 69.97     | 73.73         |
+| RigoChat-7b-v1             | 62.13       | 72.34     | 67.46       | 61.27    | 59.45      | 57.45      | 57.64      | 62.10     | 59.34         |
+| salamandra-7b-instruct     | 61.96       | 63.74     | 60.70       | 64.91    | 63.27      | 62.36      | 60.55      | 59.94     | 60.23         |
+
+
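The Average column appears to be the unweighted mean of the eight task columns. This sketch recomputes it for three rows of the table. The small difference for RigoChat-7b-v2 (about 0.01) is presumably due to rounding of the per-task scores.

```python
# Per-task scores copied from the table above, in column order:
# AQuAS, RagQuAS, CAM, CAM_E1, CAM_E2, CAM_E3, Shops, Insurance.
# The second element of each tuple is the reported Average.
RESULTS = {
    "RigoChat-7b-v2": ([82.52, 79.10, 78.91, 79.17, 76.73, 78.23, 80.79, 81.04], 79.55),
    "GPT-4o": ([85.23, 77.91, 78.00, 74.91, 73.45, 77.09, 78.60, 80.89], 78.26),
    "Qwen2.5-7B-Instruct": ([80.93, 77.41, 77.82, 75.09, 75.45, 72.91, 78.08, 79.67], 77.17),
}

recomputed = {}
for model, (scores, reported) in RESULTS.items():
    recomputed[model] = sum(scores) / len(scores)  # unweighted mean over the 8 tasks
    print(f"{model:<22} recomputed={recomputed[model]:.2f}  reported={reported:.2f}")

gain = RESULTS["RigoChat-7b-v2"][1] - RESULTS["Qwen2.5-7B-Instruct"][1]
print(f"Average gain over Qwen2.5-7B-Instruct: {gain:.2f} points")
```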
+#### Summary
+
+RigoChat-7b-v2 significantly improves performance over Qwen2.5-7B-Instruct (79.55 vs. 77.17 average) in the tasks for which it has been indirectly designed. Moreover, it outperforms most state-of-the-art models in these tasks and obtains the highest average score in the table, ahead of GPT-4o (78.26), demonstrating that LLMs can be aligned for specific use cases with few resources.
+
+## Environmental Impact
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** A100 PCIe 80GB
+- **Hours used:** 8.5
+- **Cloud Provider:** Private Infrastructure
+- **Compute Region:** Spain
+- **Carbon Emitted:** 0.92
+
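The calculator's estimate essentially multiplies the energy consumed by the carbon intensity of the electricity grid. Here is a minimal sketch of that arithmetic. It assumes the GPU draws a constant 300 W for the whole run, the rated maximum board power of the A100 80GB PCIe. The carbon-intensity value is a placeholder, not the figure used for the estimate above, and the calculator itself may apply further factors.

```python
def estimate_kg_co2eq(power_watts: float, hours: float, kg_co2eq_per_kwh: float) -> float:
    """Energy (kWh) multiplied by grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2eq_per_kwh


PLACEHOLDER_INTENSITY = 0.25  # kg CO2eq/kWh, illustrative value only
print(f"{estimate_kg_co2eq(300, 8.5, PLACEHOLDER_INTENSITY):.3f} kg CO2eq")
```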
+#### Hardware
+
+NVIDIA A100 Tensor Core GPU (PCIe) with 80 GB of memory.
+
+#### Software
+
+`Ubuntu 22.04.5 LTS` and the following requirements:
+
+```
+python=3.11
+flash_attn>=2.5.8
+datasets
+numpy
+trl
+peft
+huggingface_hub
+```
+
+## Citation
+
+```
+@misc {instituto_de_ingeniería_del_conocimiento_2025,
+	author       = { {Instituto de Ingeniería del Conocimiento} },
+	title        = { RigoChat-7b-v2 },
+	year         = 2025,
+	url          = { https://huggingface.co/IIC/RigoChat-7b-v2 },
+	doi          = { 10.57967/hf/4158 },
+	publisher    = { Hugging Face }
+}
+```
+
+```
+@misc{gómez2025rigochat2adaptedlanguage,
+      title={RigoChat 2: an adapted language model to Spanish using a bounded dataset and reduced hardware}, 
+      author={Gonzalo Santamaría Gómez and Guillem García Subies and Pablo Gutiérrez Ruiz and Mario González Valero and Natàlia Fuertes and Helena Montoro Zamorano and Carmen Muñoz Sanz and Leire Rosado Plaza and Nuria Aldama García and David Betancur Sánchez and Kateryna Sushkova and Marta Guerrero Nieto and Álvaro Barbero Jiménez},
+      year={2025},
+      eprint={2503.08188},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2503.08188}, 
+}
+```
+
+## Disclaimer
+
+Please note that the model may contain biases or other undesired distortions based on the training data. When third parties implement systems or provide services based on this model, or use the model themselves, they are responsible for mitigating associated risks and ensuring compliance with applicable regulations, including those governing the use of artificial intelligence. The IIC, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
+
+## Model Card Contact
+
+- `contacto.iic@iic.uam.es`
+
+## Authorship Notice
+
+### Additional Information
+
+- **Author**: Instituto de Ingeniería del Conocimiento (IIC).
+- **Contact**: For more information, please email [contacto.iic@iic.uam.es](mailto:contacto.iic@iic.uam.es).
+- **Copyright**: Copyright (c) 2024 ADIC.
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,24 @@
+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,29 @@
+{
+  "_name_or_path": "IIC/RigoChat-7b-v2",
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 3584,
+  "initializer_range": 0.02,
+  "intermediate_size": 18944,
+  "max_position_embeddings": 32768,
+  "max_window_layers": 28,
+  "model_type": "qwen2",
+  "num_attention_heads": 28,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 4,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.45.2",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 152064
+}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,14 @@
+{
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "repetition_penalty": 1.05,
+  "temperature": 0.7,
+  "top_k": 20,
+  "top_p": 0.8,
+  "transformers_version": "4.45.2"
+}
--- a/images/RigoChat.jpg
+++ b/images/RigoChat.jpg
--- a/merges.txt
+++ b/merges.txt
--- a/model-00001-of-00004.safetensors
+++ b/model-00001-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7c2f730cdef23656141e233cd1cb15b68b47bef86cccd81c1e86ca39f52d553d
+size 4877660776
--- a/model-00002-of-00004.safetensors
+++ b/model-00002-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8414b193717b2c89e8bbda2b1e4de60bc2c19d6d752a996cf1f7f8a283e4b838
+size 4932751008
--- a/model-00003-of-00004.safetensors
+++ b/model-00003-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:42a44c88c4393da603d9141e0e8d138a059e9be296339988f80a4a98a5de3786
+size 4330865200
--- a/model-00004-of-00004.safetensors
+++ b/model-00004-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:06006972c3be88e8a44fe21cfe2b0472b130780c781a741f8f90f1fe5ba3aae2
+size 1089994880
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,346 @@
+{
+  "metadata": {
+    "total_size": 15231233024
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00004-of-00004.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.norm.weight": "model-00003-of-00004.safetensors"
+  }
+}
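The `weight_map` above assigns every tensor name to one of four shard files, and a single layer can straddle two files. In this index, for example, layer 8's attention projections are stored in `model-00001-of-00004.safetensors`, while its MLP weights and layer norms are in `model-00002-of-00004.safetensors`. The following is a minimal sketch of lookups against that mapping. It assumes only the top-level `weight_map` key shown here, and the helper names (`load_weight_map`, `shard_for`, `tensors_by_shard`, `shards_for_layer`) are hypothetical, not part of any library.

```python
import json
from collections import defaultdict

# Hypothetical excerpt of the "weight_map" entries above; the real index lists every tensor.
WEIGHT_MAP_EXCERPT = {
    "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.27.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
    "model.norm.weight": "model-00003-of-00004.safetensors",
}


def load_weight_map(index_path):
    """Read the tensor-name -> shard-file mapping from a safetensors index JSON file."""
    with open(index_path, encoding="utf-8") as f:
        return json.load(f)["weight_map"]


def shard_for(weight_map, tensor_name):
    """Return the shard file that stores tensor_name (raises KeyError if absent)."""
    return weight_map[tensor_name]


def tensors_by_shard(weight_map):
    """Group tensor names by shard so each shard file only needs to be opened once."""
    groups = defaultdict(list)
    for name, shard in weight_map.items():
        groups[shard].append(name)
    return {shard: sorted(names) for shard, names in sorted(groups.items())}


def shards_for_layer(weight_map, layer):
    """List every shard holding part of a decoder layer.

    The trailing dot keeps layer 2 from also matching layers 20-27.
    """
    prefix = f"model.layers.{layer}."
    return sorted({shard for name, shard in weight_map.items() if name.startswith(prefix)})


if __name__ == "__main__":
    print(shards_for_layer(WEIGHT_MAP_EXCERPT, 8))
    print(tensors_by_shard(WEIGHT_MAP_EXCERPT))
```

Grouping by shard mirrors what a sharded-checkpoint loader has to do. In practice, `transformers` reads the index itself when loading a sharded checkpoint, so a helper like this is mainly useful for inspection or for partial loading.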
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,31 @@
+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896
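`tokenizer.json` is committed as a Git LFS pointer rather than as the file itself. The pointer has three lines:

- the pointer spec version,
- the SHA-256 `oid` of the real content,
- the content `size` in bytes (11421896).

The following is a minimal, standard-library-only sketch that checks a downloaded file against such a pointer. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the `sha256` oid form seen here is handled.

```python
import hashlib
import os

# The pointer text committed above for tokenizer.json.
TOKENIZER_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at path has the size and SHA-256 the pointer declares."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first; only hash the file if the size already matches.
    if os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large files are not read into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


if __name__ == "__main__":
    print(parse_lfs_pointer(TOKENIZER_POINTER))
```

A mismatch usually means the checkout still contains the pointer text instead of the real file (for example, `git lfs pull` was never run).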
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,207 @@
+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'Te llamas RigoChat, y eres un modelo de lenguaje diseñado por el Instituto de Ingeniería del Conocimiento (IIC). Tu propósito es resolver las consultas del usuario, respondiendo en español, de forma cercana y amable.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nPuede que necesites llamar a una o varias herramientas para resolver la petición del usuario.\\n\\nPuedes consultar las funciones junto con sus argumentos dentro de <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nPara cada llamada que necesites realizar, devuelve un objeto json con el nombre de la función y los argumentos dentro de <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nTe llamas RigoChat, y eres un modelo de lenguaje diseñado por el Instituto de Ingeniería del Conocimiento (IIC). Tu propósito es resolver las consultas del usuario, respondiendo en español, de forma cercana y amable.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
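The `chat_template` above produces ChatML-style prompts, and it works as follows:

- Each turn is written as `<|im_start|>`, the role, a newline, the content, then `<|im_end|>` and a newline.
- If the first message is not a system message, a default Spanish system prompt is inserted. It identifies the model as RigoChat from the IIC.
- `add_generation_prompt` appends an open assistant turn.

The following is a simplified re-implementation of the no-tools path only, for illustration. It is not a substitute for `tokenizer.apply_chat_template` in `transformers`, which renders the actual Jinja template and also handles tool calls and tool responses. The function name `render_chatml` is hypothetical.

```python
# Default system prompt copied from the chat_template above.
DEFAULT_SYSTEM = (
    "Te llamas RigoChat, y eres un modelo de lenguaje diseñado por el Instituto de "
    "Ingeniería del Conocimiento (IIC). Tu propósito es resolver las consultas del "
    "usuario, respondiendo en español, de forma cercana y amable."
)


def render_chatml(messages, add_generation_prompt=True):
    """Render the no-tools branch of the template; tool turns are rejected."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default prompt and is not repeated.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for msg in rest:
        if msg["role"] not in ("user", "assistant", "system") or msg.get("tool_calls"):
            raise ValueError("this sketch does not handle tool calls or tool responses")
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chatml([{"role": "user", "content": "Hola"}]))
```

In this config, `eos_token` is `<|im_end|>` (id 151645) and `pad_token` is `<|endoftext|>` (id 151643). Generation should therefore stop on `<|im_end|>`, which closes the assistant turn opened by the generation prompt.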
--- a/vocab.json
+++ b/vocab.json