Initialize project; model provided by the ModelHub XC community

Model: IIC/RigoChat-7b-v2
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-15 14:21:11 +08:00
commit 5fa8951a2f
18 changed files with 152901 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

199
LICENCIA Normal file

@@ -0,0 +1,199 @@
ACUERDO DE LICENCIA NO COMERCIAL DE RIGOCHAT
Fecha de lanzamiento: 15-01-2025
Al utilizar o distribuir cualquier parte o elemento del Modelo de
lenguaje RigoChat (versión 2), Usted acepta estar sujeto a este Acuerdo,
reconociendo los presentes términos y condiciones.
1\. DEFINICIONES
1.1. Modelo: se refiere al Modelo de lenguaje RigoChat (versión 2),
propiedad de ADIC, puesto a disposición en virtud de este Acuerdo.
1.2. Licenciante: ADIC.
1.3. Acuerdo: significa este Acuerdo de Licencia.
1.4. Política de Uso Aceptable: El uso del Modelo está sujeto a la
Política de Uso Aceptable que se define a continuación. El licenciatario
se compromete a cumplirla y a no permitir que los usuarios y/o terceros
utilicen el Modelo para:
i\) violar o alentar la violación de los derechos fundamentales de
terceros;
ii\) dañar o lesionar a cualquier tipo de personas, y en particular, a
los menores o a grupos de personas consideradas vulnerables, en minoría
o protegidas;
iii\) cualquier propósito ilegal, invasivo, infractor, difamatorio o
fraudulento;
iv\) generar o difundir información falsa o información y/o contenido
con el propósito de dañar a otros;
v\) generar o difundir información de identificación personal que pueda
utilizarse para dañar a un individuo o grupo de individuos;
vi\) llevar a cabo una toma de decisiones totalmente automatizada que
impacte negativamente en los derechos de las personas;
vii\) participar, promover o fomentar actividades ilegales;
viii\) para distribuir intencionadamente virus, gusanos, troyanos,
archivos dañados, bulos u otros elementos de naturaleza destructiva o
engañosa;
ix\) interferir con el uso del Modelo con el fin de provocar un mal
funcionamiento del mismo;
x\) desactivar, interferir o eludir cualquier aspecto del Modelo;
xi\) utilizar el Modelo, para acceder a cualquier otro producto o
servicio de ADIC de una manera que viole los términos de servicio de
dicho otro producto o servicio de ADIC.
1.5. Trabajo(s) derivado(s): significa (a) cualquier trabajo derivado
del Modelo según lo reconocido por las leyes de propiedad intelectual e
industrial españolas y europeas y (b) cualquier modificación del Modelo,
y cualquier otro modelo creado que se base en el Modelo o en el
resultado del Modelo o que derive de él. Para mayor claridad, los
Trabajos Derivados no incluyen el resultado de ningún Modelo.
1.6. Documentación: significa cualquier especificación, manual,
documentación y otra información escrita proporcionada por ADIC
relacionada con el Modelo.
1.7. Licenciatario: significa usted, su empleador o cualquier otra
persona o entidad (si usted está aceptando este Acuerdo en nombre de
dicha persona o entidad), mayor de edad requerida por las leyes, normas
o reglamentaciones aplicables para proporcionar el consentimiento legal
y que tiene autoridad legal para vincular a su empleador o a dicha otra
persona o entidad si usted está aceptando este Acuerdo en su nombre.
1.8. Usos no comerciales: significa ejercer cualquiera de los derechos
otorgados en este Acuerdo con fines de investigación y/o no comerciales.
Los usos no comerciales no incluyen ningún uso de producción y/o
comercial del Modelo ni de ningún trabajo derivado.
2\. CONCESIÓN DE LICENCIA
a\. Sujeto a su cumplimiento de este Acuerdo, la Política de Uso
Aceptable y la Documentación, ADIC le otorga una licencia no exclusiva,
mundial, intransferible, no sublicenciable, revocable, libre de regalías
y limitada bajo la propiedad intelectual de ADIC u otros derechos de
propiedad o controlados por ADIC incorporados en el Modelo para usar,
reproducir, distribuir y crear Trabajos derivados del Modelo, en cada
caso exclusivamente para usos de investigación y/o no comerciales.
b\. No puede utilizar el Modelo o los Trabajos derivados para permitir
que terceros utilicen el Modelo o los Trabajos derivados como parte de
su servicio alojado o a través de sus API, independientemente de si les
está añadiendo o no una funcionalidad adicional sustancial. La mera
distribución del Modelo o los Trabajos derivados para su descarga en
línea sin ofrecer ningún servicio relacionado no constituye una
infracción de este párrafo. Si desea utilizar el Modelo o cualquier
Trabajo derivado para uso comercial y/o de producción o desea poner el
Modelo o cualquier Trabajo derivado a disposición de terceros a través
de su servicio alojado o sus API, póngase en contacto con ADIC. En el
caso de utilizar el Modelo o cualquier Trabajo derivado para uso
comercial y/o de producción, los términos de esta licencia no serán de
aplicación, debiéndose aceptar y reconocer unos términos y condiciones
de licencia diferentes.
c\. Si distribuye o pone a disposición de un tercero el Modelo, o
cualquier Trabajo derivado del mismo, éstos permanecerán sujetos a este
Acuerdo y usted deberá (i) proporcionar una copia de este Acuerdo a
dicho tercero y (ii) conservar el siguiente aviso de atribución dentro
de un archivo de texto de "Aviso" distribuido como parte de dichas
copias: "*RigoChat (v2), modelo lenguaje propiedad de ADIC, se
distribuye bajo una licencia de investigación no comercial otorgada por
ADIC*”. Si usted crea un Trabajo derivado del Modelo, usted debe agregar
sus propios avisos de atribución al archivo de Aviso incluido con el
Modelo, siempre que indique claramente qué atribuciones se aplican al
Modelo y debe indicar en el archivo de Aviso que modificó el Modelo y
cómo lo modificó.
3\. EXCLUSIÓN DE GARANTÍA
A menos que la legislación aplicable lo exija, el Modelo y cualquier
resultado o producto del mismo se proporcionan "*tal cual*", sin
garantías de ningún tipo, ya sean expresas o implícitas, incluyendo, sin
limitación, cualquier garantía de título, no infracción, comerciabilidad
o idoneidad para un propósito particular. Usted es el único responsable
de determinar la idoneidad de usar o redistribuir el Modelo, los
trabajos derivados o cualquier resultado o producto y asume cualquier
riesgo asociado con su uso del Modelo, de los trabajos derivados y
cualquier resultado o producto.
4\. LIMITACIÓN DE RESPONSABILIDAD
En ningún caso ADIC será responsable bajo ninguna teoría de
responsabilidad, ya sea contractual, por dolo, negligencia,
incumplimiento culposo, responsabilidad por uso o funcional o por otros
motivos, que surjan de este Acuerdo, por cualquier pérdida de ganancias
o cualquier daño directo, indirecto, especial, consecuente, incidental o
punitivo, aunque ADIC haya sido advertida de la posibilidad de
cualquiera de los anteriores.
ADIC no responderá frente al Licenciatario por el uso del Modelo en
contravención de las condiciones y términos del presente Acuerdo.
5\. PROPIEDAD INTELECTUAL.
a\. No se otorgan licencias de marcas comerciales bajo este Acuerdo, y
en relación con el Modelo o los Trabajos derivados, ni ADIC ni el
Licenciatario pueden usar ningún nombre o marca de propiedad de o
asociada con el otro, excepto según sea necesario para el uso razonable
y habitual al describir y redistribuir el Modelo o los Trabajos
derivados.
b\. Todos los derechos, títulos e intereses en y para el Modelo,
incluidos todos los derechos de propiedad intelectual, son y seguirán
siendo propiedad exclusiva de ADIC.
c\. Sujeto a la propiedad de ADIC del Modelo y de los Trabajos Derivados
realizados por o para ADIC, con respecto a cualquier Trabajo Derivado
realizado por usted, entre usted y ADIC, usted es y será el propietario
de dichos Trabajos Derivados. Del mismo modo, cualquier Trabajo
Derivado, aunque no tenga el encargo de ADIC, será de su propiedad.
d\. Si inicia un litigio u otro procedimiento contra ADIC (incluida una
reconvención en una demanda) alegando que el Modelo, los Trabajos
derivados o los resultados o productos asociados, o cualquier parte de
los anteriores, constituyen una infracción de la propiedad intelectual u
otros derechos de su propiedad o licenciables por usted, entonces
cualquier licencia otorgada a usted en virtud de este Acuerdo finalizará
a partir de la fecha en que se presente o inicie dicho litigio o
reclamación. Usted indemnizará y eximirá de responsabilidad a ADIC de
cualquier reclamación de terceros que surja de o esté relacionado con su
uso o distribución del Modelo o de los Trabajos derivados en violación
de este Acuerdo.
6\. DURACIÓN Y RESOLUCIÓN
La duración de este Acuerdo comenzará a regir a partir del momento en
que usted acepte este Acuerdo o acceda al Modelo y continuará en plena
vigencia y efecto hasta que se resuelva de conformidad con los términos
y condiciones aquí establecidos.
Este Acuerdo se resolverá automáticamente si el Licenciatario incumple
cualquiera de los términos y condiciones del mismo. En caso de
resolución, el Licenciatario deberá dejar de usar el Modelo y eliminar
todas las copias del mismo en cualquier dispositivo en el que pudiera
estar alojado. Las cláusulas 3 a 5 seguirán vigentes tras la resolución
de este Acuerdo.
7\. DISPOSICIONES GENERALES
7.1. Ley Aplicable: Este Acuerdo se regirá e interpretará de acuerdo con
las leyes de España.
7.2. Acuerdo Completo: Este Acuerdo constituye el acuerdo completo entre
ADIC y el Licenciatario con respecto al uso del Modelo y reemplaza todos
los acuerdos y entendimientos previos, ya sean orales o escritos,
relacionados con el Modelo.
7.3. Modificaciones: Ninguna modificación de este Acuerdo será válida a
menos que sea por escrito y firmada por ADIC y el Licenciatario.

180
LICENSE Normal file

@@ -0,0 +1,180 @@
RIGOCHAT NON-COMMERCIAL LICENSE AGREEMENT
Release Date: 15-01-2025
By using or distributing any portion or element of the RigoChat Language
Model (v2), you agree to be bound by this Agreement, acknowledging the
following terms and conditions.
1\. DEFINITIONS
1.1. Model: Refers to the RigoChat Language Model (v2), owned by ADIC,
made available under this Agreement.
1.2. Licensor: ADIC.
1.3. Agreement: Means this Non-Commercial License Agreement.
1.4. Acceptable Use Policy: Use of the Model is subject to the
Acceptable Use Policy defined below. The Licensee agrees to comply with
it and ensure that users and/or third parties do not use the Model to:
i\) violate or encourage the violation of fundamental rights of third
parties;
ii\) harm or injure any persons, particularly minors or groups of
vulnerable, minority, or protected individuals;
iii\) engage in any illegal, invasive, infringing, defamatory, or
fraudulent activities;
iv\) generate or disseminate false information or content intended to
harm others;
v\) generate or disseminate personally identifiable information that
could be used to harm an individual or group of individuals;
vi\) carry out fully automated decision-making that negatively impacts
individuals' rights;
vii\) engage in, promote, or encourage illegal activities;
viii\) deliberately distribute viruses, worms, trojans, corrupted files,
hoaxes, or other destructive or deceptive elements;
ix\) interfere with the use of the Model to cause malfunction;
x\) disable, interfere with, or circumvent any aspect of the Model;
xi\) use the Model to access any other ADIC product or service in a way
that violates the terms of service of such other ADIC product or
service.
1.5. Derivative Work(s): Means (a) any derivative work from the Model as
recognized by Spanish and European intellectual property laws, and (b)
any modifications of the Model and any other model created based on or
derived from the Model. For clarity, Derivative Works do not include the
output of any Model.
1.6. Documentation: Means any specifications, manuals, documentation, or
other written information provided by ADIC related to the Model.
1.7. Licensee: Means you, your employer, or any other person or entity
(if you are entering into this Agreement on such person or entity's
behalf), of the age required under applicable laws, rules or regulations
to provide legal consent and that has legal authority to bind your
employer or such other person or entity if you are entering in this
Agreement on their behalf.
1.8. Non-commercial Use: Means exercising any of the rights granted
under this Agreement for research and/or non-commercial purposes.
Non-commercial use does not include any production and/or commercial use
of the Model or any Derivative Work.
2\. LICENSE RIGHTS
a\. Subject to your compliance with this Agreement, the Acceptable Use
Policy, and the Documentation, ADIC grants you a non-exclusive,
worldwide, non-transferable, non-sublicensable, revocable, royalty-free,
and limited license under ADIC's intellectual property rights or other
proprietary rights embodied in the Model to use, reproduce, distribute,
and create Derivative Works from the Model, in each case solely for
research and/or non-commercial uses.
b\. You may not use the Model or Derivative Works to enable third
parties to use the Model or Derivative Works as part of your hosted
service or via your APIs, regardless of whether you are adding
substantial additional functionality thereto or not. Merely distributing
the Model or Derivative Works for online download without offering any
related service is not a violation of this section. If you wish to use
the Model or any Derivative Work for commercial and/or production use or
make the Model or any Derivative Work available to third parties via
your hosted service or APIs, please contact ADIC. In case of using the
Model or any Derivative Work for commercial and/or production purposes,
the terms of this license will not apply, and different license terms
and conditions must be accepted and acknowledged.
c\. If you distribute or make the Model or any Derivative Work available
to a third party, such distribution or availability will remain subject
to this Agreement, and you must (i) provide a copy of this Agreement to
such third party and (ii) retain the following attribution notice in a
"Notice" text file distributed as part of such copies: "*RigoChat (v2)
is a language model owned by ADIC and is distributed under a
non-commercial research license granted by ADIC.*” If you create a
Derivative Work from the Model, you must add your own attribution
notices to the Notice file included with the Model, clearly indicating
which attributions apply to the Model, and you must state in the Notice
file that you changed the Model and describe how it was modified.
3\. WARRANTY DISCLAIMER
Unless required by applicable law, the Model and any output or results
therefrom are provided "as is" without warranties of any kind, either
express or implied, including, without limitation, any warranties of
title, non-infringement, merchantability, or fitness for a particular
purpose. You are solely responsible for determining the appropriateness
of using or redistributing the Model, Derivative Works, or any output or
results, and assume any risks associated with your use of the Model,
Derivative Works, and any output or results.
4\. LIMITATION OF LIABILITY
In no event will ADIC be liable under any theory of liability, whether
in contract, tort, negligence, products liability or otherwise, arising
out of this Agreement, for any lost profits or any direct, indirect,
special, consequential, incidental, or punitive damages, even if ADIC
has been advised of the possibility of any of the foregoing. ADIC shall
not be liable to the Licensee for any use of the Model in violation of
the terms and conditions of this Agreement.
5\. INTELLECTUAL PROPERTY
a\. No trademark licenses are granted under this Agreement, and neither
ADIC nor the Licensee may use any name or mark owned by or associated
with the other except as required for reasonable and customary use in
describing and redistributing the Model or Derivative Works.
b\. All rights, title, and interest in and to the Model, including all
intellectual property rights, are and will remain the exclusive property
of ADIC.
c\. Subject to ADIC's ownership of the Model and Derivative Works
created by or for ADIC, regarding any Derivative Work created by you, as
between you and ADIC, you shall own such Derivative Works. Similarly,
any Derivative Work not commissioned by ADIC shall be owned by you.
d\. If you initiate litigation or other proceedings against ADIC
(including a counterclaim in a lawsuit) alleging that the Model,
Derivative Works, outputs, or associated products, or any part thereof,
infringe intellectual property rights or other rights you own or are
licensable by you, any license granted to you under this Agreement will
terminate as of the date such litigation or claim is filed or initiated.
You agree to indemnify and hold ADIC harmless from any third-party
claims arising out of or related to your use or distribution of the
Model or Derivative Works in violation of this Agreement.
6\. TERM AND TERMINATION
The term of this Agreement will commence upon your acceptance of this
Agreement or your access to the Model and will continue in full force
and effect until terminated in accordance with the terms and conditions
herein.
This Agreement will automatically terminate if the Licensee breaches any
of its terms and conditions. Upon termination, the Licensee must cease
using the Model and delete all copies of it from any device on which it
may reside. Clauses 3 to 5 shall survive the termination of this
Agreement.
7\. GENERAL PROVISIONS
7.1. Governing Law: This Agreement shall be governed by and construed in
accordance with the laws of Spain.
7.2. Entire Agreement: This Agreement constitutes the entire agreement
between ADIC and the Licensee regarding the use of the Model and
supersedes all prior agreements and understandings, whether oral or
written, relating to the Model.
7.3. Modifications: No modification to this Agreement shall be valid
unless in writing and signed by both ADIC and the Licensee.

431
README.md Normal file

@@ -0,0 +1,431 @@
---
library_name: transformers
language:
- es
base_model:
- Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
license: other
license_name: rigochat-nc
license_link: https://huggingface.co/IIC/RigoChat-7b-v2/blob/main/LICENSE
tags:
- chat
---
# Model Card for RigoChat-7b-v2
<div style="display: flex; align-items: flex-start;">
<div style="flex: 1;">
RigoChat-7b-v2 is a Qwen-2.5-based model designed to provide accurate responses to queries in Spanish. It is based on the [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model and has been fine-tuned with Direct Preference Optimization ([DPO](https://arxiv.org/pdf/2305.18290)) for improved performance in Spanish.
This model is licensed for **non-commercial use**. If you want to use it commercially, please contact us or use it through the service we offer from the [AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-3j6o63v6c3pq2?sr=0-2&ref_=beagle&applicationId=AWSMPContessa). You can find a tutorial on how to use it from that infrastructure at the [following link](https://github.com/iiconocimiento/iic-aws/blob/main/notebooks/rigochat-7b/rigochat-7b-Model.ipynb).
</div>
<div style="margin-left: 20px;">
<img src="./images/RigoChat.jpg">
</div>
</div>
## Model Details
### Model Description
This model is the second version of RigoChat, a family of Large Language Models (LLMs) designed to solve typical NLP tasks with Spanish instructions, such as Tool Use, Summarization, Math, Code, and Abstractive QA. Like [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), this model has no specific use case and can be applied to a wide range of tasks. It offers a slight improvement on generalist tasks in Spanish, particularly in RAG (Retrieval-Augmented Generation) systems with Spanish databases, as its training focused on answering questions about provided contexts to prevent hallucinations and ensure safe responses.
Key benefits of this model include:
- Improved performance on generalist tasks in Spanish.
- Enhanced safety and reduced hallucinations in RAG systems with Spanish texts.
- Usable on a range of hardware, including machines with reduced computational capacity. For more information on how to use RigoChat-7b-v2 on reduced hardware, see [IIC/RigoChat-7b-v2-GGUF](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF).
Remarkably, this model was trained on a single A100 GPU with limited computational resources, yet achieved its current state in a relatively short time (8.5 hours). This feat was made possible by leveraging a high-quality dataset and employing advanced techniques such as [LoRA](https://arxiv.org/pdf/2106.09685) to optimize memory usage. Further details on the training process can be found below.
- **Developed by:** Instituto de Ingeniería del Conocimiento (IIC).
- **Model type:** Generative Fine-tuned Transformer.
- **Language(s) (NLP):** Spanish (BCP-47 es).
- **License:** RIGOCHAT NON-COMMERCIAL.
- **Architecture:** We use [Qwen's](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) architecture without modifications.
- **Finetuned from model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
### Model Sources
- **Paper:** https://arxiv.org/abs/2503.08188
## Uses
### Direct Use
You can use and deploy RigoChat-v2 for commercial purposes through a model package from AWS Marketplace. You can check the instructions inside the [following notebook](https://github.com/iiconocimiento/iic-aws/tree/main/notebooks/rigochat-v2).
### Out-of-Scope Use
This language model has been adapted for general natural language processing tasks in Spanish and specific use cases such as RAG. However, there are several cases where the model should not be used due to its technical and ethical limitations:
- **Illegal Activities:** The model should not be used to generate content related to illegal activities, such as creating malicious software, fraud, incitement to crime, or any illegal material.
- **Harmful or Dangerous Content:** It should not be used to generate hate speech, violence, harassment, or any content that promotes discrimination, violence, or abuse.
### Bias, Risks, and Limitations
Although this model has been trained to understand and generate text in Spanish, there are several risks, biases, and limitations that users should be aware of:
- **Biases:** The model may reflect biases present in the training data. These biases could be related to gender, race, social class, sexual orientation, among others, and may generate responses that perpetuate stereotypes or discrimination.
- **Accuracy and Reliability:** While the model generates coherent and useful text in many contexts, it may not always be 100% accurate or reliable, especially in technical, scientific, or legal matters where high certainty is required.
- **Limited or Outdated Knowledge:** The model is not trained with information beyond its training cutoff date. Therefore, it may not reflect recent events, research, or advancements.
### Recommendations
We recommend using this model as a general chatbot or within applications designed for specific tasks, such as SQL queries, RAG systems, or as an autonomous agent to facilitate the use of tools.
## How to Get Started with the Model
Use it in the same way as [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) or any other model that uses the `chatml` template.
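To make the `chatml` layout concrete, here is a minimal sketch of how a conversation is laid out as text. The helper `render_chatml` is hypothetical and only illustrates the format; in practice `tokenizer.apply_chat_template` (used in the sections below) does this for you, and the template shipped with the tokenizer may add further elements such as a default system prompt or tool descriptions.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages as plain ChatML text (simplified: no default system prompt, no tools)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from this point.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "Eres un asistente útil."},
    {"role": "user", "content": "¿Cuál es la capital de España?"},
]
prompt = render_chatml(example_messages)
print(prompt)
```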
### To load the model and tokenizer
```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
import torch

model_name = "IIC/RigoChat-7b-v2"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
```
### Sample generation
```python
messages = [
    {"role": "user", "content": "¿Cómo puedo transformar un diccionario de listas en una lista de diccionarios, y viceversa, en Python sin utilizar bucles for?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
# Drop the prompt tokens so only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
For a better experience, we recommend using [the following generation parameters](https://huggingface.co/IIC/RigoChat-7b-v2/blob/main/generation_config.json).
### Tool Use
```python
def obtener_temperatura_actual(location: str) -> float:
    """
    Obtener la temperatura actual de una localización.

    Args:
        location: La localización, con el siguiente formato: "Ciudad, País."
    Returns:
        El tiempo en dicha localización, en grados Celsius.
    """
    # Dummy implementation: always returns the same temperature.
    return 22.

messages = [
    {"role": "user", "content": "¿Cuál es el tiempo en Madrid ahora mismo?"}
]

# Passing the function in `tools` lets the chat template describe it to the model.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    tools=[obtener_temperatura_actual],
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
# Drop the prompt tokens so only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
Check the [tool use documentation](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling) from Hugging Face for more information.
If the model generates a tool call, you should add it to the chat like so:
```python
import re
import json

tools = {
    "obtener_temperatura_actual" : obtener_temperatura_actual,
}

tool_call = re.search(
    r"<tool_call>\s*(\{.*?\})\s*</tool_call>",
    response,
)
tool_call = json.loads(tool_call.group(1))

# Add tool metadata to messages
messages.append(
    {
        "role": "assistant",
        "tool_calls": [{"type": "function", "function": tool_call}],
    },
)

# Add tool result to messages
messages.append(
    {
        "role": "tool",
        "name": tool_call["name"],
        "content": tools[tool_call["name"]](**tool_call["arguments"]),
    },
)
```
The code above assumes that the model generated exactly one tool call, but the same logic applies when several functions are called in the same response. After that, you can continue generating messages as normal:
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    tools=[obtener_temperatura_actual],
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
# Drop the prompt tokens so only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
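When a response may contain several tool calls, or none at all, the parsing step can be generalized. The following is a sketch under stated assumptions rather than part of the original card: `apply_tool_calls`, `temperatura_fija` and the simulated response are hypothetical names and data, and converting the tool result with `str()` is our own choice so that the message content is always text.

```python
import json
import re
from typing import Any, Callable, Dict, List

# re.DOTALL lets the JSON payload span several lines.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def apply_tool_calls(
    response: str,
    messages: List[Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
) -> int:
    """Append every tool call found in `response`, plus its result, to `messages`.

    Returns the number of tool calls handled (0 means a plain-text answer).
    """
    calls = [json.loads(raw) for raw in TOOL_CALL_PATTERN.findall(response)]
    if not calls:
        return 0
    # One assistant message carries all the calls made in this turn.
    messages.append(
        {
            "role": "assistant",
            "tool_calls": [{"type": "function", "function": call} for call in calls],
        }
    )
    # One tool message per call, in the order the calls were made.
    for call in calls:
        result = tools[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    return len(calls)


# Hypothetical stand-ins for a real tool and a real model response.
def temperatura_fija(location: str) -> float:
    return 22.0


respuesta_simulada = (
    '<tool_call>\n{"name": "temperatura_fija", "arguments": {"location": "Madrid, España"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "temperatura_fija", "arguments": {"location": "Sevilla, España"}}\n</tool_call>'
)
historial = [{"role": "user", "content": "¿Qué tiempo hace en Madrid y en Sevilla?"}]
num_llamadas = apply_tool_calls(
    respuesta_simulada, historial, {"temperatura_fija": temperatura_fija}
)
```

A return value of 0 signals that the response was an ordinary answer, so the caller can show it to the user instead of generating again.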
## Training Details
### Training Data
The training data combines public datasets with private datasets designed at the IIC. It consists of 21975 conversations in Spanish in `chatml` format, with the same structure as the [Anthropic/hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf). Each conversation has two variants, `chosen` and `rejected`, which differ only in the assistant's last answer: the one in the `chosen` variant is considered better than the one in the `rejected` variant. Different techniques have been used to generate the dataset, which we explain in depth in the research (**coming soon**).
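As an illustration of that structure, the sketch below builds one preference pair and checks the invariant described above. The record is entirely hypothetical, since the actual rows are not shown in this card, and representing each variant as a list of role/content messages is an assumption.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical record: both variants share the same conversation history.
shared_history: List[Message] = [
    {"role": "user", "content": "Resume en una frase: 'El IIC entrenó RigoChat con DPO.'"},
]
record = {
    "chosen": shared_history + [{"role": "assistant", "content": "El IIC entrenó RigoChat usando DPO."}],
    "rejected": shared_history + [{"role": "assistant", "content": "RigoChat fue entrenado por otra empresa."}],
}


def differs_only_in_last_answer(chosen: List[Message], rejected: List[Message]) -> bool:
    """Return True if both variants share the history and differ in the final assistant turn."""
    return (
        len(chosen) == len(rejected)
        and chosen[:-1] == rejected[:-1]
        and chosen[-1]["role"] == rejected[-1]["role"] == "assistant"
        and chosen[-1]["content"] != rejected[-1]["content"]
    )


is_valid_pair = differs_only_in_last_answer(record["chosen"], record["rejected"])
```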
### Training Procedure
We use the [Transformer Reinforcement Learning](https://huggingface.co/docs/trl/index) (TRL) library. Specifically, we applied the [example DPO script](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) published by its authors to the dataset we generated.
#### Training Hyperparameters
<details><summary>Details</summary>
<p>
```python
LORA_CONFIG = {
    "r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "up_proj",
        "gate_proj",
        "down_proj",
    ],
    "use_rslora": True,
}

DPO_CONFIG = {
    "num_train_epochs": 2,
    "logging_steps": 25,
    "eval_steps": 500,
    "save_steps": 100,
    "save_total_limit": 5,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "learning_rate": 5e-6,
    "max_length": 8192,  # max length in the history chat + latest assistant response.
    "max_prompt_length": 6656,  # max length in the history chat: user-assistant-...-assistant-user.
    "gradient_checkpointing": True,
    "weight_decay": 0.001,
    "optim": "rmsprop",
    "evaluation_strategy": "steps",
    "lr_scheduler_type": "cosine",
    "bf16": True,
}
```
</p>
</details>
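Two quantities implied by these hyperparameters are worth working out: the effective batch size and the LoRA scaling factor. The arithmetic below is a sketch that assumes a single GPU (as stated in the model description) and that all 21975 conversations are used for training, which the card does not confirm. With `use_rslora` enabled, the adapter update is scaled by `lora_alpha / sqrt(r)` instead of the standard `lora_alpha / r`.

```python
import math

# Values copied from LORA_CONFIG and DPO_CONFIG above.
r, lora_alpha = 64, 16
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
num_train_epochs = 2

# Assumptions: one GPU and every conversation used for training.
num_gpus = 1
num_conversations = 21975

# Number of conversations contributing to each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
steps_per_epoch = num_conversations // effective_batch_size
approx_total_steps = steps_per_epoch * num_train_epochs

# Standard LoRA scales by alpha / r; rank-stabilized LoRA scales by alpha / sqrt(r).
standard_lora_scaling = lora_alpha / r
rslora_scaling = lora_alpha / math.sqrt(r)

print(effective_batch_size, approx_total_steps, standard_lora_scaling, rslora_scaling)
```

Under these assumptions each optimizer step sees 16 conversations, and the rank-stabilized factor is eight times larger than the standard one for `r = 64`.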
#### Speeds, Sizes, Times
Below are some useful parameters showing the results of the latest training logs.
```python
latest_logs = {'loss': 0.3716, 'grad_norm': 4.989994049072266, 'learning_rate': 1.0380020311950844e-10, 'rewards/chosen': 0.534086287021637, 'rewards/rejected': -0.6236276030540466, 'rewards/accuracies': 0.8899999856948853, 'rewards/margins': 1.1577140092849731, 'logps/rejected': -218.88198852539062, 'logps/chosen': -250.0700225830078, 'logits/rejected': -1.6214849948883057, 'logits/chosen': -1.9585875272750854, 'epoch': 1.99}
final_training_results = {'train_runtime': 30825.7138, 'train_samples_per_second': 1.432, 'train_steps_per_second': 0.089, 'train_loss': 0.483570138469306, 'epoch': 2.0}
```
As the runtime shows, eight and a half hours on very little hardware were enough to improve a state-of-the-art model on tasks adapted to Spanish. The following sections show this in more detail.
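The figures quoted here can be checked directly against the logged values. This short sketch only redoes the arithmetic: it converts `train_runtime` from seconds to hours and confirms that the logged reward margin is the gap between the chosen and rejected rewards, up to the small rounding expected from averaging over a batch.

```python
# Values copied from the training logs above.
train_runtime_seconds = 30825.7138
reward_chosen = 0.534086287021637
reward_rejected = -0.6236276030540466
logged_margin = 1.1577140092849731

# Convert the total training time to hours.
runtime_hours = train_runtime_seconds / 3600

# The margin is the chosen reward minus the rejected reward.
reward_margin = reward_chosen - reward_rejected

print(f"{runtime_hours:.2f} h, margin {reward_margin:.4f}")
```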
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
To assess the performance of Large Language Models (LLMs), we have developed and utilized several high-quality corpora tailored to specific evaluation needs:
1. [IIC/AQuAS](https://huggingface.co/datasets/IIC/AQuAS): A manually curated corpus created by two computational linguists to evaluate language models in the task of Abstractive Question Answering in Spanish. It includes examples from domains such as finance, insurance, healthcare, law, and music.
2. [IIC/RagQuAS](https://huggingface.co/datasets/IIC/RagQuAS): Another manually curated corpus developed by the same linguists to evaluate full RAG systems and language models in Abstractive Question Answering tasks in Spanish. This corpus spans a wide range of domains, including hobbies, linguistics, pets, health, astronomy, customer service, cars, daily life, documentation, energy, skiing, fraud, gastronomy, languages, games, nail care, music, skating, first aid, recipes, recycling, complaints, insurance, tennis, transportation, tourism, veterinary, travel, and yoga.
3. **CAM:** Designed for all CAM tasks, this corpus consists of frequently asked questions (FAQs) sourced from consumer-related topics on the websites of the Comunidad de Madrid. The questions are categorized into three levels of degradation (E1, E2, and E3) intended to measure the LLM's ability to understand and effectively respond to poorly formulated queries caused by spelling errors, varying levels of colloquialism, and similar issues. This task also falls under the Abstractive Question Answering category.
4. **Shops:** A multi-turn conversational corpus centered on policies from various clothing companies. The task involves Multi-turn Abstractive Question Answering.
5. **Insurance:** Another multi-turn conversational corpus, this one focuses on policies from various insurance companies. It also involves Multi-turn Abstractive Question Answering.
Each corpus includes the following columns: question, answer, and context(s) containing relevant information from which the model can derive the answer. In multi-turn tasks, a chat history is also provided.
The scoring process for LLMs involves measuring the similarity between the original answer and the one generated by the model. All corpora are private except for AQuAS and RagQuAS, which are publicly available and can serve as examples of the structure and content of the others.
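To make the record layout concrete, here is a minimal sketch of one evaluation example and of how its fields could be turned into chat messages for the model under test. The class name, field names, prompt wording, and sample record are all hypothetical; the public AQuAS and RagQuAS datasets remain the reference for the real column names.

```python
from dataclasses import dataclass, field


@dataclass
class EvalExample:
    """One evaluation record; field names are hypothetical."""
    question: str
    answer: str  # reference answer that the generated answer is compared with
    contexts: list[str]  # passages from which the answer can be derived
    chat_history: list[dict] = field(default_factory=list)  # multi-turn tasks only


def build_messages(example: EvalExample) -> list[dict]:
    """Earlier turns first, then a user turn holding the contexts and the question."""
    context_block = "\n\n".join(example.contexts)
    user_turn = f"Context:\n{context_block}\n\nQuestion: {example.question}"
    return [*example.chat_history, {"role": "user", "content": user_turn}]


# Invented record, for illustration only.
example = EvalExample(
    question="¿Cuál es el plazo de devolución?",
    answer="El plazo de devolución es de 30 días.",
    contexts=["Las devoluciones se aceptan durante los 30 días posteriores a la compra."],
    chat_history=[
        {"role": "user", "content": "Hola"},
        {"role": "assistant", "content": "Hola, ¿en qué puedo ayudarte?"},
    ],
)
messages = build_messages(example)
```

For single-turn corpora, `chat_history` simply stays empty and the result is a single user message.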
#### Factors
These evaluations are narrow in scope: they all target RAG tasks in very specific domains, so they do not cover every scenario the model could be exposed to.
#### Metrics
The answers are scored automatically using [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
### Results
| **Model** | **Average** | **AQuAS** | **RagQuAS** | **CAM** | **CAM_E1** | **CAM_E2** | **CAM_E3** | **Shops** | **Insurance** |
|----------------------------|-------------|-----------|-------------|----------|------------|------------|------------|-----------|---------------|
| **RigoChat-7b-v2** | **79.55** | 82.52 | 79.10 | **78.91**| **79.17** | 76.73 | **78.23** | **80.79** | **81.04** |
| GPT-4o | 78.26 | **85.23** | 77.91 | 78.00 | 74.91 | 73.45 | 77.09 | 78.60 | 80.89 |
| stablelm-2-12b-chat | 77.74 | 78.88 | 78.21 | 77.82 | 78.73 | **77.27** | 74.73 | 77.03 | 79.26 |
| Mistral-Small-Instruct-2409| 77.29 | 80.56 | 78.81 | 77.82 | 75.82 | 73.27 | 73.45 | 78.25 | 80.36 |
| Qwen2.5-7B-Instruct | 77.17 | 80.93 | 77.41 | 77.82 | 75.09 | 75.45 | 72.91 | 78.08 | 79.67 |
| Meta-Llama-3.1-8B-Instruct | 76.55 | 81.87 | 80.50 | 72.91 | 73.45 | 75.45 | 71.64 | 77.73 | 78.88 |
| GPT-4o-mini | 76.48 | 82.80 | 75.82 | 76.36 | 74.36 | 72.36 | 71.82 | 78.25 | 80.08 |
| Phi-3.5-mini-instruct | 76.38 | 81.68 | **81.09** | 75.82 | 74.73 | 71.45 | 70.36 | 77.43 | 78.45 |
| gemma-2-9b-it | 75.80 | 82.80 | 78.11 | 72.91 | 73.45 | 71.09 | 71.27 | 77.08 | 79.72 |
| Ministral-8B-Instruct-2410 | 75.19 | 79.63 | 77.31 | 76.00 | 73.45 | 72.36 | 70.18 | 76.44 | 76.14 |
| GPT-3.5-turbo-0125 | 74.78 | 80.93 | 73.53 | 76.73 | 72.55 | 72.18 | 69.09 | 75.63 | 77.64 |
| Llama-2-7b-chat-hf | 71.18 | 67.10 | 77.31 | 71.45 | 70.36 | 70.73 | 68.55 | 72.07 | 71.90 |
| granite-3.0-8b-instruct | 71.08 | 73.08 | 72.44 | 72.36 | 71.82 | 69.09 | 66.18 | 69.97 | 73.73 |
| RigoChat-7b-v1 | 62.13 | 72.34 | 67.46 | 61.27 | 59.45 | 57.45 | 57.64 | 62.10 | 59.34 |
| salamandra-7b-instruct | 61.96 | 63.74 | 60.70 | 64.91 | 63.27 | 62.36 | 60.55 | 59.94 | 60.23 |
#### Summary
RigoChat-7b-v2 significantly improves on Qwen2.5-7B-Instruct (79.55 vs. 77.17 on average) in the tasks for which it has been indirectly designed. It also outperforms most state-of-the-art models on these tasks and obtains the highest average score in the table, demonstrating that LLMs can be aligned for specific use cases with few resources.
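The comparison with Qwen2.5-7B-Instruct can be reproduced from the table. The sketch below assumes that the **Average** column is the unweighted mean of the eight task columns; the table does not say so, but the numbers are consistent with it. Because the per-task scores are already rounded, a recomputed mean can differ from the reported one by about 0.01.

```python
from statistics import mean

TASKS = ["AQuAS", "RagQuAS", "CAM", "CAM_E1", "CAM_E2", "CAM_E3", "Shops", "Insurance"]

# Per-task scores copied from the results table.
scores = {
    "RigoChat-7b-v2": [82.52, 79.10, 78.91, 79.17, 76.73, 78.23, 80.79, 81.04],
    "Qwen2.5-7B-Instruct": [80.93, 77.41, 77.82, 75.09, 75.45, 72.91, 78.08, 79.67],
}
reported_average = {"RigoChat-7b-v2": 79.55, "Qwen2.5-7B-Instruct": 77.17}

# Assumed definition of the Average column: plain mean over the eight tasks.
recomputed = {model: mean(values) for model, values in scores.items()}

# Per-task gain of RigoChat-7b-v2 over Qwen2.5-7B-Instruct.
deltas = {
    task: round(rigo - qwen, 2)
    for task, rigo, qwen in zip(TASKS, scores["RigoChat-7b-v2"], scores["Qwen2.5-7B-Instruct"])
}

for model, value in recomputed.items():
    print(f"{model}: recomputed {value:.2f}, reported {reported_average[model]:.2f}")
for task, delta in deltas.items():
    print(f"{task}: {delta:+.2f}")
```

Every per-task delta is positive, with the largest gains on the degraded CAM_E1 and CAM_E3 queries.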
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** A100 PCIe 80GB
- **Hours used:** 8.5
- **Cloud Provider:** Private Infrastructure
- **Compute Region:** Spain
- **Carbon Emitted:** 0.92
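Estimates of this kind generally multiply power draw, runtime, and the carbon intensity of the electricity used. The sketch below is a rough check under two assumed constants that this document does not state: an average draw of 250 W for the GPU and a carbon intensity of 0.432 kg CO2eq per kWh. Under those assumptions the result is close to the 0.92 listed above, which would be consistent with that figure being expressed in kg CO2eq, although the unit is not given.

```python
# Assumptions (not stated in this document).
ASSUMED_GPU_POWER_KW = 0.250       # assumed average draw of one A100 PCIe, in kW
ASSUMED_KG_CO2EQ_PER_KWH = 0.432   # assumed carbon intensity of the electricity used

hours_used = 8.5                   # from the list above

# Energy consumed by the GPU over the whole run.
energy_kwh = ASSUMED_GPU_POWER_KW * hours_used

# Emissions = energy * carbon intensity.
emissions_kg = energy_kwh * ASSUMED_KG_CO2EQ_PER_KWH

print(f"energy: {energy_kwh:.3f} kWh, emissions: {emissions_kg:.2f} kg CO2eq")
```

Changing either constant scales the estimate linearly, so the result is only as reliable as those two inputs.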
#### Hardware
NVIDIA A100 Tensor Core GPU with 80 GB of memory.
#### Software
`Ubuntu 22.04.5 LTS` and the following requirements:
```
python=3.11
flash_attn>=2.5.8
datasets
numpy
trl
peft
huggingface_hub
```
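A quick way to see whether an environment matches this list is to query the installed package metadata. The following is a minimal sketch using only the standard library; the helper names are illustrative, and the version comparison is deliberately simplified (it looks only at the leading numeric components, so pre-release and local suffixes are ignored).

```python
import re
from importlib import metadata

REQUIRED = ["flash_attn", "datasets", "numpy", "trl", "peft", "huggingface_hub"]
MIN_FLASH_ATTN = (2, 5, 8)


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_tuple(version):
    """Leading numeric components, e.g. '2.5.8.post1' -> (2, 5, 8)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


versions = installed_versions(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")

flash = versions["flash_attn"]
if flash is not None and version_tuple(flash) < MIN_FLASH_ATTN:
    print("flash_attn is older than the required 2.5.8")
```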
## Citation
```
@misc {instituto_de_ingeniería_del_conocimiento_2025,
author = { {Instituto de Ingeniería del Conocimiento} },
title = { RigoChat-7b-v2 },
year = 2025,
url = { https://huggingface.co/IIC/RigoChat-7b-v2 },
doi = { 10.57967/hf/4158 },
publisher = { Hugging Face }
}
```
```
@misc{gómez2025rigochat2adaptedlanguage,
title={RigoChat 2: an adapted language model to Spanish using a bounded dataset and reduced hardware},
author={Gonzalo Santamaría Gómez and Guillem García Subies and Pablo Gutiérrez Ruiz and Mario González Valero and Natàlia Fuertes and Helena Montoro Zamorano and Carmen Muñoz Sanz and Leire Rosado Plaza and Nuria Aldama García and David Betancur Sánchez and Kateryna Sushkova and Marta Guerrero Nieto and Álvaro Barbero Jiménez},
year={2025},
eprint={2503.08188},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.08188},
}
```
## Disclaimer
Please note that the model may contain biases or other undesired distortions based on the training data. When third parties implement systems or provide services based on this model, or use the model themselves, they are responsible for mitigating associated risks and ensuring compliance with applicable regulations, including those governing the use of artificial intelligence. The IIC, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
## Model Card Contact
- `contacto.iic@iic.uam.es`
## Authorship Notice
### Additional Information
- **Author**: Instituto de Ingeniería del Conocimiento (IIC).
- **Contact**: For more information, please email [contacto.iic@iic.uam.es](mailto:contacto.iic@iic.uam.es).
- **Copyright**: Copyright (c) 2024 ADIC.

24
added_tokens.json Normal file

@@ -0,0 +1,24 @@
{
"</tool_call>": 151658,
"<tool_call>": 151657,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}

29
config.json Normal file

@@ -0,0 +1,29 @@
{
"_name_or_path": "IIC/RigoChat-7b-v2",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 3584,
"initializer_range": 0.02,
"intermediate_size": 18944,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 28,
"num_hidden_layers": 28,
"num_key_value_heads": 4,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.45.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 152064
}
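
A few derived quantities make the attention settings in `config.json` easier to read. The sketch below copies values from the file and assumes that the hidden size is split evenly across query heads and that the key/value cache is stored in bfloat16 (matching `torch_dtype`); the variable names are illustrative.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 3584,
    "num_attention_heads": 28,
    "num_key_value_heads": 4,
    "num_hidden_layers": 28,
    "max_position_embeddings": 32768,
}
BYTES_PER_BF16 = 2  # assumed cache dtype: bfloat16, as in torch_dtype

# Per-head dimension, assuming hidden_size is split evenly across query heads.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Grouped-query attention: number of query heads sharing each key/value head.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Key/value cache per token: keys and values for every layer and key/value head.
kv_cache_bytes_per_token = (
    2
    * config["num_hidden_layers"]
    * config["num_key_value_heads"]
    * head_dim
    * BYTES_PER_BF16
)

# Cache size for one sequence at the maximum context length, in GiB.
kv_cache_gib_full_context = (
    kv_cache_bytes_per_token * config["max_position_embeddings"] / 2**30
)

print(head_dim, queries_per_kv_head, kv_cache_bytes_per_token, kv_cache_gib_full_context)
```

With only 4 key/value heads for 28 query heads, the cache is seven times smaller than it would be if every query head had its own keys and values.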

14
generation_config.json Normal file

@@ -0,0 +1,14 @@
{
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"repetition_penalty": 1.05,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8,
"transformers_version": "4.45.2"
}
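
The sampling defaults in `generation_config.json` (`temperature`, `top_k`, `top_p`) each narrow the next-token distribution in a different way. The function below is a simplified, standard-library illustration of that idea on a toy vocabulary; it is not the library's implementation, it omits `repetition_penalty`, and the function name and toy logits are hypothetical.

```python
import math


def filter_distribution(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # Top-k: keep only the k highest-scoring tokens.
    top = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the surviving tokens (shifted by the maximum for stability).
    peak = top[0][1]
    weights = [(tok, math.exp(value - peak)) for tok, value in top]
    total = sum(weight for _, weight in weights)
    probs = [(tok, weight / total) for tok, weight in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in probs:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {0: 2.0, 1: 1.0, 2: 0.0, 3: -1.0}
filtered = filter_distribution(toy_logits)
print(filtered)
```

In this toy case the two least likely tokens are dropped by the top-p step, and sampling would then choose between the remaining two.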

BIN
images/RigoChat.jpg Normal file

Binary file not shown.

Size: 12 KiB

151388
merges.txt Normal file

File diff suppressed because it is too large.


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7c2f730cdef23656141e233cd1cb15b68b47bef86cccd81c1e86ca39f52d553d
size 4877660776


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8414b193717b2c89e8bbda2b1e4de60bc2c19d6d752a996cf1f7f8a283e4b838
size 4932751008


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:42a44c88c4393da603d9141e0e8d138a059e9be296339988f80a4a98a5de3786
size 4330865200


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:06006972c3be88e8a44fe21cfe2b0472b130780c781a741f8f90f1fe5ba3aae2
size 1089994880


@@ -0,0 +1,346 @@
{
"metadata": {
"total_size": 15231233024
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.norm.weight": "model-00003-of-00004.safetensors"
}
}
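
The weight map above assigns each tensor name to one shard file, and a single decoder layer can straddle two shards (layer 8 has its attention projections in shard 1 and its MLP and layer norms in shard 2). A minimal sketch of how such an index could be inspected follows. The helper names `shards_per_layer` and `split_layers` are hypothetical, and the dictionary is only a small excerpt copied from the listing, not the full index.

```python
from collections import defaultdict

# Small excerpt of the "weight_map" shown above; the real index file
# lists every tensor in the model.
weight_map = {
    "model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.norm.weight": "model-00003-of-00004.safetensors",
}


def shards_per_layer(weight_map):
    """Map each decoder layer index to the set of shard files holding its tensors."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        parts = name.split(".")
        # Per-layer tensors are named "model.layers.<idx>.<rest>"; skip the others.
        if len(parts) > 2 and parts[1] == "layers":
            layers[int(parts[2])].add(shard)
    return dict(layers)


def split_layers(weight_map):
    """Return the sorted indices of layers whose tensors span more than one shard."""
    per_layer = shards_per_layer(weight_map)
    return sorted(idx for idx, shards in per_layer.items() if len(shards) > 1)


print(split_layers(weight_map))
```

A loader that resolves each tensor through the index is unaffected by such splits; they only matter when shards are downloaded or inspected individually.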

31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896

207
tokenizer_config.json Normal file

@@ -0,0 +1,207 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'Te llamas RigoChat, y eres un modelo de lenguaje diseñado por el Instituto de Ingeniería del Conocimiento (IIC). Tu propósito es resolver las consultas del usuario, respondiendo en español, de forma cercana y amable.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nPuede que necesites llamar a una o varias herramientas para resolver la petición del usuario.\\n\\nPuedes consultar las funciones junto con sus argumentos dentro de <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nPara cada llamada que necesites realizar, devuelve un objeto json con el nombre de la función y los argumentos dentro de <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nTe llamas RigoChat, y eres un modelo de lenguaje diseñado por el Instituto de Ingeniería del Conocimiento (IIC). 
Tu propósito es resolver las consultas del usuario, respondiendo en español, de forma cercana y amable.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
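
The `chat_template` above wraps each turn in `<|im_start|>` / `<|im_end|>` markers and falls back to a default Spanish system prompt when the first message is not a system message. The sketch below re-implements only the branch without tools in plain Python, to show the resulting prompt layout. The function `render_chat` is a hypothetical helper written for illustration, it ignores tool calls and tool responses entirely, and it is not a substitute for rendering the Jinja template itself.

```python
# Default system prompt, copied from the chat_template above.
DEFAULT_SYSTEM = (
    "Te llamas RigoChat, y eres un modelo de lenguaje diseñado por el "
    "Instituto de Ingeniería del Conocimiento (IIC). Tu propósito es resolver "
    "las consultas del usuario, respondiendo en español, de forma cercana y amable."
)


def render_chat(messages, add_generation_prompt=True):
    """Render plain chat turns the way the no-tools branch of the template does."""
    # A leading system message replaces the default system prompt.
    if messages and messages[0]["role"] == "system":
        system, rest = messages[0]["content"], messages[1:]
    else:
        system, rest = DEFAULT_SYSTEM, messages

    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for message in rest:
        if message["role"] not in ("user", "system", "assistant"):
            # Tool messages need the full template; this sketch does not cover them.
            raise ValueError(f"unsupported role: {message['role']}")
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )

    # Leave an open assistant turn so the model continues from there.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chat([{"role": "user", "content": "Hola"}]))
```

Because `eos_token` is `<|im_end|>`, generation stops at the end of the assistant turn that the final `<|im_start|>assistant` line opens.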

1
vocab.json Normal file

File diff suppressed because one or more lines are too long