Initialize project; model provided by the ModelHub XC community

Model: AgentPublic/guillaumetell-7b
Source: Original Platform
ModelHub XC
2026-05-25 10:31:17 +08:00
commit 89c00e608e
21 changed files with 92430 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
references_mfs_corpus.json filter=lfs diff=lfs merge=lfs -text

312
README.md Normal file

@@ -0,0 +1,312 @@
---
license: apache-2.0
pipeline_tag: text-generation
language:
- fr
---
# Carte du modèle : Guillaume Tell
[Version française](#version-française) / [English version](#english-version)
---
# Version française
---
**Guillaume Tell** est un Large Language Model (LLM) français basé sur Mistral Open-Hermes 2.5 optimisé pour le RAG (Retrieval Augmented Generation) avec traçabilité des sources et explicabilité.
---
## Sommaire
1. [Détails du modèle](#détails-du-modèle)
2. [Utilisation](#utilisation)
- [Contexte de création](#contexte-de-création)
- [Finalités et limites du modèle](#finalités-et-limites-du-modèle)
- [Cas d'usage et utilisateurs](#cas-dusage-et-utilisateurs)
- [Exemple](#exemple)
3. [Prompt](#prompt)
4. [Informations sur le finetuning](#informations-sur-le-finetuning)
5. [Utilisation d'Albert pour des tâches de RAG](#utilisation-dalbert-pour-des-tâches-de-rag)
6. [Glossaire](#glossaire)
---
## Détails du modèle
### Description du modèle
<!-- Provide a longer summary of what this model is. -->
Le modèle "Guillaume Tell" vise à améliorer la vérifiabilité de la génération de textes basés sur des sources administratives françaises. À partir d'une question et d'une sélection de cinq sources, il génère une réponse sourcée, avec des paramètres spéciaux pour les citations.
- **Développé par :** Etalab (Service du Datalab) - Direction Interministérielle du Numérique
- **Version:** Guillaume-Tell-base
- **Type de modèle :** Transformers, Text-Generation
- **Licence :** [Apache-2.0](https://spdx.org/licenses/Apache-2.0.html)
- **Entraîné depuis le modèle :** OpenHermes-2.5-Mistral-7B
---
## Utilisation
### Contexte de création
Guillaume Tell a été développé pour **ALBERT**, l'outil d'IA Générative interministérielle de l'État, et plus particulièrement dans le cadre de [l'expérimentation d'un modèle d'assistance aux conseillers numériques](https://www.france-services.gouv.fr/actualites/experimentation-dun-modele-dassistance-france-services-IA) [France services](#glossaire) basé sur l'intelligence artificielle. Guillaume Tell vise à répondre aux besoins spécifiques des conseillers face à un LLM, en l'occurrence la vérification des réponses générées par Albert pour s'assurer de leur justesse avant de les transmettre à des usagers accueillis en maison France services.
### Finalités et limites du modèle
Guillaume Tell est un modèle de langage, avec des capacités conversationnelles et de recherche d'information sourcée. Il peut être utilisé pour formuler une réponse à des questions relatives à l'administration française (eg. démarches administratives) en allant récupérer des informations pertinentes dans sa base de connaissances (RAG) et en synthétisant une réponse à partir de celles-ci.
Guillaume Tell fournit des réponses de premier niveau et n'est pas en capacité de donner des réponses administratives complexes. Il n'est pas en capacité de répondre à des questions sortant du champ administratif français. Il formule des réponses seulement en français.
### Cas d'usage et utilisateurs
Son usage est prévu par les agents publics des administrations françaises afin de faciliter la recherche d'information administrative. Il est déconseillé de mettre Guillaume Tell directement entre les mains de personnes qui n'ont pas été formées spécifiquement à son usage et qui manquent d'expertise sur les démarches administratives, comme par exemple les usagers des services publics. En effet, comme tout LLM, Guillaume Tell présente des risques d'erreurs ("hallucinations") ou d'oublis qu'une personne non formée ne sera pas en capacité de détecter.
### Exemple
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
> Question: Quelles sont les étapes de la procédure de rétablissement en cas de surendettement ?
>
> Sources:
> * **10965c7bc7a8e185_0** : Surendettement : rétablissement personnel avec liquidation judiciaire. La procédure de rétablissement consiste à effacer les dettes d'une personne surendettée…
> * **4c4511d1c0e6dc4c_0** : Surendettement : rétablissement personnel sans liquidation judiciaire. La procédure de rétablissement personnel consiste à effacer les dettes
> * **b8d3342e4ac89fe9_0** : Peut-on être saisi pendant la procédure de surendettement ? Dès le dépôt du dossier de surendettement. Durée de la suspension…
> * **28e7fcf81deee0ff_0** : Surendettement : rétablissement personnel avec liquidation judiciaire. Conditions. Personnes concernées. Une procédure de rétablissement personnel avec liquidation judiciaire est engagée…
> * **e568bebd8fb112ee_0** : Peut-on être saisi pendant la procédure de surendettement ? Dès que le dossier est recevable…
>
> Résultat :
> La procédure de rétablissement en cas de surendettement comporte plusieurs étapes. Tout d'abord, la personne concernée doit répondre à certaines conditions pour être éligible à cette procédure.
> Ensuite, la commission de surendettement ouvre la procédure avec l'accord du surendetté. Si le surendetté possède un patrimoine pouvant être vendu, une liquidation judiciaire est prononcée&lt;ref text=&quot;Lorsque le surendetté possède un patrimoine pouvant être vendu, la procédure de rétablissement consiste à effacer les dettes et est prononcée avec liquidation judiciaire (vente des biens).&quot;&gt;[28e7fcf81deee0ff_0]&lt;/ref&gt;.
> Dans le cas contraire, une procédure sans liquidation judiciaire est engagée&lt;ref text=&quot;Elle est prononcée sans liquidation judiciaire (c'est-à-dire sans vente des biens) lorsque la personne surendettée ne possède pas de patrimoine.&quot;&gt;[4c4511d1c0e6dc4c_0]&lt;/ref&gt;.
>
> Pendant la procédure, la suspension de toute action juridique contre le surendetté est possible&lt;ref text=&quot;Dès le dépôt du dossier de surendettement. Durée de la suspension. La suspension, si elle est accordée, est acquise selon les cas jusqu'à l'un des événements suivants : Décision d'irrecevabilité du dossier Approbation du plan conventionnel de redressement Décision de mesures imposées Jugement prononçant un rétablissement personnel sans liquidation judiciaire Jugement d'ouverture d'une procédure de rétablissement personnel avec liquidation judiciaire Elle ne peut toutefois pas dépasser 2 ans.&quot;&gt;[b8d3342e4ac89fe9_0]&lt;/ref&gt;.
>
> Enfin, la clôture de la procédure se fait par jugement qui permet l'effacement des dettes&lt;ref text=&quot;Jugement prononçant un rétablissement personnel sans liquidation judiciaire Jugement d'ouverture d'une procédure de rétablissement personnel avec liquidation judiciaire&quot;&gt;[28e7fcf81deee0ff_0]&lt;/ref&gt;.
>
<!-- Provide the basic links for the model.
### Model Sources [optional]
- **Repository:**
- **Paper [optional]:**
- **Demo [optional]:**
-->
---
## Prompt
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Format du prompt
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
Comme pour Mistral Open-Hermes 2.5, la syntaxe de Guillaume Tell est basée sur ChatML. Elle nécessite un prompt spécifique, ainsi qu'une syntaxe prédéfinie pour ajouter les sources à la question.
**Exemple de prompt:**
```
<|im_start|>system
Tu es Albert, le chatbot des Maisons France Service qui donne des réponses sourcées.<|im_end|>
<|im_start|>user
Ecrit un texte référencé en réponse à cette question : Quelles sont les étapes de la procédure de rétablissement en cas de surendettement ?
Les références doivent être citées de cette manière : texte rédigé<ref text=\"[passage pertinent dans la référence]\">[\"identifiant de la référence\"]</ref>Si les références ne permettent pas de répondre, qu'il n'y a pas de réponse.
Les cinq références disponibles :
10965c7bc7a8e185_0 :(…)
4c4511d1c0e6dc4c_0 :(…)
b8d3342e4ac89fe9_0 :(…)
28e7fcf81deee0ff_0 :(…)
e568bebd8fb112ee_0 :(…)
```
Guillaume-Tell est actuellement entraîné et testé sur une sélection fixe de cinq sources. Il devrait fonctionner sur un ensemble plus petit ou plus grand, mais cela n'a pas encore été expérimenté.
---
## Informations sur le finetuning
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
Guillaume Tell a été fine-tuné en utilisant l'approche LoRA et une quantization sur 4 bits sur :
- 3880 instructions RAG synthétiques basées sur les données de service-public.fr ;
- 5000 instructions chatRAG basées sur les données de service-public.fr et d'Open Hermes.
Le code de finetuning [`finetuning.py`](https://huggingface.co/AgentPublic/guillaumetell-7b/blob/main/finetuning.py) est disponible dans la section [`Files and versions`](https://huggingface.co/AgentPublic/guillaumetell-7b/tree/main).
---
## Utilisation d'Albert pour des tâches de [RAG](#glossaire)
Il est possible d'utiliser des techniques de RAG afin d'optimiser la pertinence de la réponse du modèle. Nous pouvons ainsi obtenir des réponses basées sur les bonnes données adaptées à la question.
C'est ce que nous faisons actuellement en production avec ALBERT.
À la date de la sortie du modèle, les données pour effectuer le RAG d'ALBERT sont constituées de:
- Fiches service-public.fr découpées en chunks de 300 mots.
---
## Glossaire
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
- **France services** : réseau de structures territoriales qui combinent accueil physique et accompagnement numérique pour aider les publics reçus dans les démarches administratives de plusieurs services publics.
- **LLM** (Large Language Model): modèle de Deep Learning capable de comprendre et de générer du langage humain en traitant de grandes quantités de données textuelles.
- **RAG** (Retrieval Augmented Generation) : Technique améliorant les performances des IA génératives en permettant aux LLM d'utiliser des ressources de données supplémentaires sans besoin de réentraînement.
---
# English version
---
**Guillaume Tell** is a French LLM based on Mistral Open-Hermes 2.5 and optimized for RAG (Retrieval Augmented Generation) with source traceability and explainability.
---
## Table of contents
1. [Model details](#model-details)
2. [Uses](#uses)
- [Creation context](#creation-context)
- [Purposes and limitations of the model](#purposes-and-limitations-of-the-model)
- [Use cases and users](#use-cases-and-users)
- [Example](#example)
3. [Prompt](#prompt-1)
4. [Finetuning information](#finetuning-information)
5. [Using Albert for RAG tasks](#using-albert-for-rag--tasks)
6. [Glossary](#glossary)
---
## Model details
### Model Description
<!-- Provide a longer summary of what this model is. -->
Guillaume Tell aims to improve the verifiability of the generation of texts based on French administrative sources. From a question and a selection of five sources, it generates a sourced answer, with special parameters for citations.
- **Developed by:** Etalab (Service du Datalab) - Direction Interministérielle du Numérique
- **Version:** Guillaume-Tell-base
- **Model type:** Transformers, Text-Generation
- **License:** [Apache-2.0](https://spdx.org/licenses/Apache-2.0.html)
- **Finetuned from model :** OpenHermes-2.5-Mistral-7B
---
## Uses
### Creation context
Guillaume Tell was developed for **ALBERT**, the French government's interministerial generative AI tool, and more specifically as part of the [experiment with an AI-based assistance model for France services advisors](https://www.france-services.gouv.fr/actualites/experimentation-dun-modele-dassistance-france-services-IA). Guillaume Tell is designed to meet the specific needs of advisors working with an LLM, in this case verifying the answers generated by Albert to ensure their accuracy before passing them on to members of the public received in France services centers.
### Purposes and limitations of the model
Guillaume Tell is a language model with conversational and sourced information retrieval capabilities. It can be used to answer questions about the French administration (e.g. administrative procedures) by retrieving relevant information from its knowledge base (RAG) and synthesizing an answer from it.
Guillaume Tell provides first-level answers and is not able to give complex administrative answers. It is not able to answer questions outside the French administrative field. It provides answers in French only.
### Use cases and users
It is intended for use by public officials of French administrations to facilitate the search for administrative information. It is not recommended to put Guillaume Tell directly into the hands of people who have not been specifically trained in its use and who lack expertise in administrative procedures, such as users of public services. Like any LLM, Guillaume Tell can produce errors ("hallucinations") or omissions that an untrained person will not be able to detect.
### Example
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
> Question: Quelles sont les étapes de la procédure de rétablissement en cas de surendettement ?
>
> Sources:
> * **10965c7bc7a8e185_0** : Surendettement : rétablissement personnel avec liquidation judiciaire. La procédure de rétablissement consiste à effacer les dettes d'une personne surendettée…
> * **4c4511d1c0e6dc4c_0** : Surendettement : rétablissement personnel sans liquidation judiciaire. La procédure de rétablissement personnel consiste à effacer les dettes
> * **b8d3342e4ac89fe9_0** : Peut-on être saisi pendant la procédure de surendettement ? Dès le dépôt du dossier de surendettement. Durée de la suspension…
> * **28e7fcf81deee0ff_0** : Surendettement : rétablissement personnel avec liquidation judiciaire. Conditions. Personnes concernées. Une procédure de rétablissement personnel avec liquidation judiciaire est engagée…
> * **e568bebd8fb112ee_0** : Peut-on être saisi pendant la procédure de surendettement ? Dès que le dossier est recevable…
>
> Result:
> La procédure de rétablissement en cas de surendettement comporte plusieurs étapes. Tout d'abord, la personne concernée doit répondre à certaines conditions pour être éligible à cette procédure.
> Ensuite, la commission de surendettement ouvre la procédure avec l'accord du surendetté. Si le surendetté possède un patrimoine pouvant être vendu, une liquidation judiciaire est prononcée&lt;ref text=&quot;Lorsque le surendetté possède un patrimoine pouvant être vendu, la procédure de rétablissement consiste à effacer les dettes et est prononcée avec liquidation judiciaire (vente des biens).&quot;&gt;[28e7fcf81deee0ff_0]&lt;/ref&gt;.
> Dans le cas contraire, une procédure sans liquidation judiciaire est engagée&lt;ref text=&quot;Elle est prononcée sans liquidation judiciaire (c'est-à-dire sans vente des biens) lorsque la personne surendettée ne possède pas de patrimoine.&quot;&gt;[4c4511d1c0e6dc4c_0]&lt;/ref&gt;.
>
> Pendant la procédure, la suspension de toute action juridique contre le surendetté est possible&lt;ref text=&quot;Dès le dépôt du dossier de surendettement. Durée de la suspension. La suspension, si elle est accordée, est acquise selon les cas jusqu'à l'un des événements suivants : Décision d'irrecevabilité du dossier Approbation du plan conventionnel de redressement Décision de mesures imposées Jugement prononçant un rétablissement personnel sans liquidation judiciaire Jugement d'ouverture d'une procédure de rétablissement personnel avec liquidation judiciaire Elle ne peut toutefois pas dépasser 2 ans.&quot;&gt;[b8d3342e4ac89fe9_0]&lt;/ref&gt;.
>
> Enfin, la clôture de la procédure se fait par jugement qui permet l'effacement des dettes&lt;ref text=&quot;Jugement prononçant un rétablissement personnel sans liquidation judiciaire Jugement d'ouverture d'une procédure de rétablissement personnel avec liquidation judiciaire&quot;&gt;[28e7fcf81deee0ff_0]&lt;/ref&gt;.
>
<!-- Provide the basic links for the model.
### Model Sources [optional]
- **Repository:**
- **Paper [optional]:**
- **Demo [optional]:**
-->
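
The `<ref text="…">[…]</ref>` markers in the example above can be read back programmatically, for instance to check that every cited identifier belongs to the sources that were actually supplied. The following is a minimal sketch, not part of the original repository: the names `Citation`, `extract_citations`, `strip_citations` and `unknown_sources` are hypothetical, and the pattern assumes that the model always emits the exact tag layout shown in the example.

```python
import html
import re
from typing import Iterable, List, NamedTuple


class Citation(NamedTuple):
    source_id: str  # identifier found between the square brackets
    quote: str      # passage copied into the text="..." attribute


# Assumed tag layout, taken from the example: <ref text="quote">[source_id]</ref>
REF_PATTERN = re.compile(
    r'<ref text="(?P<quote>.*?)">\[(?P<source_id>[^\]]+)\]</ref>',
    re.DOTALL,
)


def extract_citations(answer: str) -> List[Citation]:
    """Return the citations of a generated answer, in order of appearance."""
    # html.unescape turns an entity-escaped answer (&lt;ref ...&gt;) back into raw tags.
    decoded = html.unescape(answer)
    return [
        Citation(m.group("source_id"), m.group("quote"))
        for m in REF_PATTERN.finditer(decoded)
    ]


def strip_citations(answer: str) -> str:
    """Return the answer text with all <ref> tags removed."""
    return REF_PATTERN.sub("", html.unescape(answer))


def unknown_sources(citations: Iterable[Citation], provided_ids: Iterable[str]) -> List[str]:
    """Identifiers cited in the answer that were not among the provided sources."""
    return sorted({c.source_id for c in citations} - set(provided_ids))
```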
---
## Prompt
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Prompt format
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
Like Mistral Open-Hermes 2.5, Guillaume Tell uses the ChatML syntax. It requires a specific prompt, as well as a predefined syntax for adding sources to the question.
**Prompt example:**
```
<|im_start|>system
Tu es Albert, le chatbot des Maisons France Service qui donne des réponses sourcées.<|im_end|>
<|im_start|>user
Ecrit un texte référencé en réponse à cette question : Quelles sont les étapes de la procédure de rétablissement en cas de surendettement ?
Les références doivent être citées de cette manière : texte rédigé<ref text=\"[passage pertinent dans la référence]\">[\"identifiant de la référence\"]</ref>Si les références ne permettent pas de répondre, qu'il n'y a pas de réponse.
Les cinq références disponibles :
10965c7bc7a8e185_0 :(…)
4c4511d1c0e6dc4c_0 :(…)
b8d3342e4ac89fe9_0 :(…)
28e7fcf81deee0ff_0 :(…)
e568bebd8fb112ee_0 :(…)
```
Guillaume-Tell is currently trained and tested on a fixed selection of five sources. It should work with a smaller or larger set, but this has not yet been tested.
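
As an illustration of the format above, the sketch below assembles such a prompt from a question and a mapping of source identifiers to source texts. It rests on several assumptions that the model card does not confirm: the `\"` sequences in the example are treated as escaping and rendered as plain quotes, each reference is written as `identifier :text` on its own line, and the prompt is closed with the usual ChatML `<|im_start|>assistant` turn. `build_prompt` is a hypothetical helper, not a function shipped with the model.

```python
from typing import Mapping

SYSTEM_PROMPT = (
    "Tu es Albert, le chatbot des Maisons France Service "
    "qui donne des réponses sourcées."
)

# Citation instruction copied from the prompt example (quotes unescaped: assumption).
CITATION_RULE = (
    'Les références doivent être citées de cette manière : texte rédigé'
    '<ref text="[passage pertinent dans la référence]">["identifiant de la référence"]</ref>'
    "Si les références ne permettent pas de répondre, qu'il n'y a pas de réponse."
)


def build_prompt(question: str, references: Mapping[str, str]) -> str:
    """Build a ChatML prompt from a question and {source_id: source_text}."""
    # Assumed layout of one reference line: "<id> :<text>"
    ref_lines = "\n".join(f"{ref_id} :{text}" for ref_id, text in references.items())
    user_turn = (
        f"Ecrit un texte référencé en réponse à cette question : {question}\n"
        f"{CITATION_RULE}\n"
        "Les cinq références disponibles :\n"
        f"{ref_lines}"
    )
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{user_turn}<|im_end|>\n"
        "<|im_start|>assistant\n"  # assumption: standard ChatML generation prefix
    )
```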
---
## Finetuning information
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
Guillaume Tell was fine-tuned using the LoRA approach and 4-bit quantization on:
- 3880 synthetic RAG instructions based on service-public.fr data;
- 5000 chatRAG instructions based on service-public.fr and Open Hermes data.
The finetuning code [`finetuning.py`](https://huggingface.co/AgentPublic/guillaumetell-7b/blob/main/finetuning.py) is available in the [`Files and versions`](https://huggingface.co/AgentPublic/guillaumetell-7b/tree/main) section.
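
To give a sense of scale, the back-of-the-envelope sketch below combines the hyperparameters set in `finetuning.py` (`max_steps = 2000`, batch size 4, gradient accumulation 4, LoRA rank 64 on `q_proj` and `v_proj`) with the dimensions in `config.json`. Two assumptions are made here and are not stated in the repository: training ran on a single GPU, and the training file contains exactly the 3880 + 5000 instructions listed above. Note that in `TrainingArguments` a positive `max_steps` takes precedence over `num_train_epochs`.

```python
# Values copied from finetuning.py
MAX_STEPS = 2000
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
LORA_R = 64

# Assumptions (not confirmed by the repository)
NUM_GPUS = 1                # device_map = {"": 0} suggests a single GPU
DATASET_SIZE = 3880 + 5000  # instructions listed in the model card

# Values copied from config.json
HIDDEN = 4096
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS


def approx_epochs() -> float:
    """Approximate number of passes over the data implied by max_steps."""
    return MAX_STEPS * effective_batch_size() / DATASET_SIZE


def lora_params_per_layer() -> int:
    """Trainable LoRA weights added to q_proj and v_proj in one decoder layer."""
    head_dim = HIDDEN // NUM_HEADS
    q_out = NUM_HEADS * head_dim     # 4096
    v_out = NUM_KV_HEADS * head_dim  # 1024 (grouped-query attention)
    # Each adapted linear layer (in -> out) gets A (r x in) and B (out x r).
    return LORA_R * (HIDDEN + q_out) + LORA_R * (HIDDEN + v_out)


def lora_params_total() -> int:
    return NUM_LAYERS * lora_params_per_layer()


if __name__ == "__main__":
    print(effective_batch_size())     # 16
    print(round(approx_epochs(), 2))  # 3.6
    print(lora_params_total())        # 27262976
```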
---
## Using Albert for [RAG](#glossary) tasks
RAG techniques can be used to optimize the relevance of the model's response. In this way, we can obtain answers based on the right data for the right question.
This is what we are currently doing in production with ALBERT.
At the time of the model's release, the data for ALBERT's RAG consisted of the following:
- service-public.fr fact sheets split into 300-word chunks
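
The 300-word chunking mentioned above could look like the following sketch. The model card does not say how words are delimited or whether chunks overlap, so this version simply splits on whitespace and uses no overlap; `chunk_words` is a hypothetical helper, not the function used in production.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 300) -> List[str]:
    """Split a text into consecutive chunks of at most `chunk_size` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    words = text.split()  # assumption: a "word" is a whitespace-delimited token
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]
```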
---
## Glossary
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
- **France services** : network of local structures that combine physical reception with digital support to help visitors with administrative procedures for several public services.
- **LLM** (Large Language Model): Deep Learning model capable of understanding and generating human language by processing large amounts of textual data.
- **RAG** (Retrieval Augmented Generation): Technique improving the performance of generative AI by enabling LLMs to use additional data resources without the need for retraining.
---
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
<!--
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->

4
added_tokens.json Normal file

@@ -0,0 +1,4 @@
{
  "<|im_end|>": 32000,
  "<|im_start|>": 32001
}

26
config.json Normal file

@@ -0,0 +1,26 @@
{
  "_name_or_path": "mistral-hermes-2.5",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 32000,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.0",
  "use_cache": false,
  "vocab_size": 32002
}
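
As a sanity check on the values above, the sketch below derives the parameter count implied by this `config.json` and compares the resulting bfloat16 footprint with the `total_size` of 14483496960 bytes recorded in the safetensors index of this commit. It assumes the standard Mistral layout (bias-free linear layers, two RMSNorm weight vectors per decoder layer plus a final norm, untied input and output embeddings); `count_parameters` is a hypothetical helper written for this illustration.

```python
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "tie_word_embeddings": False,
    "vocab_size": 32002,
}


def count_parameters(cfg: dict) -> int:
    """Parameter count implied by a Mistral-style config (no biases assumed)."""
    hidden = cfg["hidden_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v projections
    mlp = 3 * hidden * cfg["intermediate_size"]            # gate, up, down projections
    norms = 2 * hidden                                     # input + post-attention norms
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * hidden
    if not cfg["tie_word_embeddings"]:
        embeddings *= 2                                    # separate lm_head matrix

    return cfg["num_hidden_layers"] * per_layer + hidden + embeddings  # + final norm


if __name__ == "__main__":
    n_params = count_parameters(CONFIG)
    print(n_params)      # 7241748480
    print(n_params * 2)  # 14483496960 bytes at 2 bytes per bfloat16 weight
```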

231
finetuning.py Normal file

@@ -0,0 +1,231 @@
import os
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
    LlamaTokenizerFast
)
from peft import LoraConfig, PeftModel, get_peft_model
from trl import SFTTrainer
# The model we will use from the Hugging Face Hub
model_name = "mistral-hermes-2.5"
torch.cuda.empty_cache()
#project_directory = "~/finetuning/sigmund-spplus"
# Name of the new model
new_model_name = "mistral-mfs-reference-2"
# The output directory where the model predictions and checkpoints will be written
output_dir = "./mistral-mfs-reference-2"
# Tensorboard logs
tb_log_dir = "./mistral-mfs-reference-2/logs"
# Number of steps: adjust to the corpus size and the number of epochs to run.
max_steps = 2000
# The important parameters
per_device_train_batch_size = 4  # Number of examples sent per batch. Increase it to train faster.
learning_rate = 2e-5  # Preferably a low learning rate, since Mistral-Hermes already handles French well
max_seq_length = 4096  # The context window. It can be raised up to 4096 tokens (but watch the available memory!)
save_steps = 1000  # Checkpoint interval (allows training to be restarted if the fine-tuning run fails)
# Learning rate schedule. Note: the value used here is "linear", although the original
# comment described "constant" as a bit better than cosine and easier to analyse.
lr_scheduler_type = "linear"
# Other parameters
local_rank = -1
per_device_eval_batch_size = 1
gradient_accumulation_steps = 4
max_grad_norm = 0.3
weight_decay = 0.001
lora_alpha = 16
lora_dropout = 0.1
lora_r = 64
# Group sequences into batches with same length (saves memory and speeds up training considerably)
group_by_length = True
# Activate 4-bit precision base model loading
use_4bit = True
# Activate nested quantization for 4-bit base models
use_nested_quant = False
# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"
# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"
# Number of training epochs
num_train_epochs = 1
# Enable fp16 training
fp16 = True
# Enable bf16 training
bf16 = False
# Use packing dataset creating
packing = False
# Enable gradient checkpointing
gradient_checkpointing = True
# Optimizer to use, original is paged_adamw_32bit
optim = "paged_adamw_32bit"
# Fraction of steps to do a warmup for
warmup_ratio = 0.03
# Log every X updates steps
logging_steps = 1
# Load the entire model on the GPU 0
device_map = {"": 0}
# Visualize training
report_to = "tensorboard"
# 2. LoRA configuration and tokenizer import
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    inference_mode=False,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"]
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# This is the fix for fp16 training
#tokenizer.padding_side = "right"
tokenizer.pad_token = tokenizer.eos_token
# 3. Dataset preparation
def format_alpaca(sample):
    # The "conversation" field is used verbatim as the training text
    prompt = f"{sample['conversation']}"
    return prompt
# template dataset to add prompt to each sample
def template_dataset(sample):
    sample["text"] = f"{format_alpaca(sample)}{tokenizer.eos_token}"
    return sample
# Load the dataset.
#dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
data_files = {"train": "corpus_guillaume_tell_2.json"}
dataset = load_dataset("json", data_files=data_files, split="train")
# Shuffle the dataset (the shuffled copy is only used by the optional subsampling line below)
dataset_shuffled = dataset.shuffle(seed=42)
# Optionally keep only the first 512 rows of the shuffled dataset (uncomment to subsample)
#dataset = dataset_shuffled.select(range(512))
# Transform the dataset to the guanaco format (a single "text" column)
dataset = dataset.map(template_dataset, remove_columns=list(dataset.features))
print(dataset[40])
# 4. Model import
# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16, you can accelerate training with the argument --bf16")
        print("=" * 80)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device_map,
    quantization_config=bnb_config
)
model.config.use_cache = False
model.config.pretraining_tp = 1
# 5. Fine-tuning
torch.cuda.empty_cache()
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=True,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing
)
trainer.train()
#trainer.train(resume_from_checkpoint=True)
# 6. Saving the adapter, then the merged model
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model # Take care of distributed/parallel training
model_to_save.save_pretrained(new_model_name)
torch.cuda.empty_cache()
from peft import AutoPeftModelForCausalLM
model = AutoPeftModelForCausalLM.from_pretrained(new_model_name, device_map="auto", torch_dtype=torch.bfloat16)
model = model.merge_and_unload()
output_merged_dir = os.path.join(new_model_name, new_model_name)
model.save_pretrained(output_merged_dir, safe_serialization=True)
tokenizer.save_pretrained(output_merged_dir)

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 32000,
  "transformers_version": "4.37.0"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6f183b901f05af17b043657e7f1d4b570b903c9386400e31b53e31320a4dfef4
size 4943178720


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:43d9be88a4817499bb516292d8806461debd9462c70fe3a9e20ca301625190db
size 4999819336


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:198d2f5141fb26479dd63cae58926e12f3576b0876096a3be5be95f8bb5e19ea
size 4540532728


@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 14483496960
},
"weight_map": {
"lm_head.weight": "model-00003-of-00003.safetensors",
"model.embed_tokens.weight": "model-00001-of-00003.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"model.norm.weight": "model-00003-of-00003.safetensors"
}
}
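A quick way to sanity-check a sharded index like the one above is to count tensors per shard and list the layers whose tensors straddle two shard files. The sketch below is illustrative only: it assumes the mapping sits under a `weight_map` key (the usual layout for sharded safetensors indexes, but that key is not visible in this excerpt), and it works on a five-entry excerpt rather than the real file. The helper names are hypothetical.

```python
import json
from collections import Counter

# Five entries copied from the index above. A real run would json.load() the
# full index file; the "weight_map" wrapper key is an assumption.
INDEX_EXCERPT = json.loads("""
{
  "weight_map": {
    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.norm.weight": "model-00003-of-00003.safetensors"
  }
}
""")


def tensors_per_shard(index):
    """Count how many tensors each shard file holds."""
    return Counter(index["weight_map"].values())


def straddling_layers(index):
    """Return the layer numbers whose tensors are spread over more than one shard."""
    shards_by_layer = {}
    for name, shard in index["weight_map"].items():
        parts = name.split(".")
        # Only names of the form model.layers.<n>.<...> belong to a layer.
        if len(parts) > 2 and parts[1] == "layers":
            shards_by_layer.setdefault(int(parts[2]), set()).add(shard)
    return sorted(layer for layer, shards in shards_by_layer.items() if len(shards) > 1)


counts = tensors_per_shard(INDEX_EXCERPT)
straddling = straddling_layers(INDEX_EXCERPT)
print(counts)
print(straddling)
```

In the listing above, layers 10 and 22 are the two places where a layer's tensors cross a shard boundary, which is what the helper reports for the excerpt.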

14
prompt_config.yaml Normal file

@@ -0,0 +1,14 @@
prompt_format: chatml
max_tokens: 4096
prompts:
  - mode: rag
    system_prompt: "Tu es Albert, le chatbot des Maisons France Service qui donne des réponses sourcées."
    template: tell_rag_prompt.jinja
    default:
      limit: 5
  - mode: analysis
    system_prompt: "Tu es Albert, le chatbot des Maisons France Service qui donne des réponses sourcées."
    template: tell_analysis_prompt.jinja
    default:
      limit: 5
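The demo scripts in this commit pick a prompt entry by looping over `prompts` and comparing `mode`. A minimal sketch of that lookup as a reusable helper follows. To stay dependency-free, the configuration is written out as the dict that parsing the YAML would produce, and `select_prompt` is a hypothetical name, not something defined in the repository.

```python
# Parsed form of prompt_config.yaml, written as a plain dict so the sketch
# needs no YAML library (the demo scripts use yaml.safe_load for this step).
SYSTEM_PROMPT = "Tu es Albert, le chatbot des Maisons France Service qui donne des réponses sourcées."
CONFIG = {
    "prompt_format": "chatml",
    "max_tokens": 4096,
    "prompts": [
        {"mode": "rag", "system_prompt": SYSTEM_PROMPT,
         "template": "tell_rag_prompt.jinja", "default": {"limit": 5}},
        {"mode": "analysis", "system_prompt": SYSTEM_PROMPT,
         "template": "tell_analysis_prompt.jinja", "default": {"limit": 5}},
    ],
}


def select_prompt(config, mode):
    """Return the prompt entry for `mode`, or raise if the mode is not configured."""
    for entry in config["prompts"]:
        if entry["mode"] == mode:
            return entry
    raise KeyError(f"no prompt configured for mode {mode!r}")


rag = select_prompt(CONFIG, "rag")
print(rag["template"], rag["default"]["limit"])
```

Raising on an unknown mode, rather than silently skipping it as a bare loop would, makes a typo in the mode name visible immediately.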

79
prompt_demo.py Normal file

@@ -0,0 +1,79 @@
#!/usr/bin/env python
# Renders the "rag" prompt template with sample data and prints the result (no model is loaded).
import os
import sys

import yaml
from jinja2 import Environment, FileSystemLoader, meta

sys.path.append(".")
# Run from the script's directory so the config and the templates resolve.
os.chdir(os.path.dirname(os.path.abspath(__file__)))

if __name__ == "__main__":
    with open('prompt_config.yaml') as f:
        config = yaml.safe_load(f)

    print("prompt format:", config.get("prompt_format"))
    print(config)
    print()
    for prompt in config["prompts"]:
        if prompt["mode"] == "rag":
            print(f'--- prompt mode: {prompt["mode"]} ---')
            env = Environment(loader=FileSystemLoader("."))
            template = env.get_template(prompt["template"])

            # List the variables the template expects.
            source = template.environment.loader.get_source(template.environment, template.name)
            variables = meta.find_undeclared_variables(env.parse(source[0]))
            print("variables:", variables)
            print("---")

            # Keys follow the names used in tell_rag_prompt.jinja:
            # sheet_chunks, chunk.hash, chunk.context, chunk.text.
            data = {
                "query": "Quel est le meilleur moyen de cuire une blanquette?",
                "sheet_chunks": [
                    {
                        "url": "http://data.gouv.fr",
                        "hash": "hash49080805",
                        "title": "A chunk title",
                        "text": "Moi j'aime la blanquette avec du beurre dedans\nEt une sauce bien épaisse.",
                    },
                    {
                        "url": "http://...",
                        "hash": "hash49080806",
                        "title": "A chunk title",
                        "text": "Faites chauffer la blanquette à feu doux pendant 46 heures.",
                        "context": "Recette de blanquette"
                    },
                    {
                        "url": "http://...",
                        "hash": "hash49080806",
                        "title": "A chunk title",
                        "text": "Les meilleures blanquettes se font avec des champignons de Paris",
                        "context": "Avis de grand-mère"
                    },
                    {
                        "url": "http://...",
                        "hash": "hash49080806",
                        "title": "A chunk title",
                        "text": """Étape 1: Faire revenir la viande dans un peu de beurre doux jusqu'à ce que les morceaux soient un peu dorés.
Étape 2: Saupoudrer de 2 cuillères de farine. Bien remuer.
Étape 3: Ajouter 2 ou 3 verres d'eau, les cubes de bouillon, le vin et remuer. Ajouter de l'eau si nécessaire pour couvrir.
Étape 4: Couper les carottes en rondelles et émincer les oignons puis les incorporer à la viande, ainsi que les champignons.
Étape 5: Laisser mijoter à feu très doux environ 1h30 à 2h00 en remuant.
Étape 6: Si nécessaire, ajouter de l'eau de temps en temps.
Étape 7: Dans un bol, bien mélanger la crème fraîche, le jaune d'oeuf et le jus de citron. Ajouter ce mélange au dernier moment, bien remuer et servir tout de suite.
""",
                        "context": "Recette Marmiton"
                    },
                ]
            }
            if "system_prompt" in variables:
                data["system_prompt"] = prompt["system_prompt"]
            if "limit" in variables:
                # Assumption: report the number of references actually passed to the template.
                data["limit"] = len(data["sheet_chunks"])

            rendered_template = template.render(**data)
            print(rendered_template)
            print("---")

63
prompt_demo_analysis.py Normal file

@@ -0,0 +1,63 @@
# Demo of a side feature of Guillaume-Tell: deciding whether a question should trigger the source retrieval pipeline.
# The model should return a structured JSON answer with two components:
#   - a short analysis with reasoning;
#   - a boolean answer in French ("oui" or "non").
import os
import sys

import yaml
from jinja2 import Environment, FileSystemLoader, meta
from vllm import LLM, SamplingParams

sys.path.append(".")
# Run from the script's directory so the config and the templates resolve.
os.chdir(os.path.dirname(os.path.abspath(__file__)))


# Helper for the JSON output format. Generation stops at "}" (see SamplingParams below),
# so the closing brace is added back when it is missing.
# Relies on the module-level `llm` created in the main block.
def get_llm_response(prompt_template, sampling_params):
    prompts = [prompt_template]
    outputs = llm.generate(prompts, sampling_params, use_tqdm=False)
    generated_text = outputs[0].outputs[0].text
    # endswith() also covers an empty generation, where indexing [-1] would raise.
    if not generated_text.endswith("}"):
        generated_text = generated_text + "}"
    prompt = prompt_template + generated_text
    return prompt, generated_text


if __name__ == "__main__":
    with open('prompt_config.yaml') as f:
        config = yaml.safe_load(f)

    print("prompt format:", config.get("prompt_format"))
    print(config)
    print()
    for prompt in config["prompts"]:
        if prompt["mode"] == "analysis":
            print(f'--- prompt mode: {prompt["mode"]} ---')
            env = Environment(loader=FileSystemLoader("."))
            template = env.get_template(prompt["template"])

            # List the variables the template expects.
            source = template.environment.loader.get_source(template.environment, template.name)
            variables = meta.find_undeclared_variables(env.parse(source[0]))
            print("variables:", variables)
            print("---")

            data = {"query": "Comment obtenir le formulaire A36 ?"}
            if "system_prompt" in variables:
                data["system_prompt"] = prompt["system_prompt"]

            rendered_template = template.render(**data)
            print(rendered_template)
            print("---")

            llm = LLM("mistral-mfs-reference-2/mistral-mfs-reference-2")
            sampling_params = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=300, stop="}")
            prompt, generated_text = get_llm_response(rendered_template, sampling_params)
            print("Albert : ", generated_text)

104
prompt_demo_rag.py Normal file

@@ -0,0 +1,104 @@
# Full demo of the Guillaume-Tell reference model on a sample question with four sample references.
# Guillaume-Tell is currently trained on five references by default; future versions should make this more flexible.
# Example of generated text:
# Le meilleur moyen de cuire une blanquette est d'utiliser un mélange de viande et de légumes, tels que des champignons de Paris<ref text="Les meilleures blanquettes se font avec des champignons de Paris">hash49080806</ref>.
# Il est recommandé de faire chauffer la blanquette à feu doux pendant 46 heures<ref text="faîtes chauffer la blanquette à feu doux pendant 46 heures.">hash49080806</ref>.
# Enfin, pour achever la préparation, il faut ajouter une crème fraîche, un jaune d'œuf et du jus de citron juste avant de servir<ref text="Dans un bol, bien mélanger la crème fraîche, le jaune d'oeuf et le jus de citron. Ajouter ce mélange au dernier moment, bien remuer et servir tout de suite.">hash49080806</ref>.
import os
import sys

import yaml
from jinja2 import Environment, FileSystemLoader, meta
from vllm import LLM, SamplingParams

sys.path.append(".")
# Run from the script's directory so the config and the templates resolve.
os.chdir(os.path.dirname(os.path.abspath(__file__)))


# Relies on the module-level `llm` created below.
def get_llm_response(prompt_template):
    sampling_params = SamplingParams(temperature=.7, top_p=.95, max_tokens=2000, presence_penalty=1.5, stop=["``"])  # Officially recommended parameters
    prompts = [prompt_template]
    outputs = llm.generate(prompts, sampling_params, use_tqdm=False)
    generated_text = outputs[0].outputs[0].text
    prompt = prompt_template + generated_text
    return prompt, generated_text


# Typical example:
llm = LLM("AgentPublic/guillaumetell-7b")

if __name__ == "__main__":
    with open('prompt_config.yaml') as f:
        config = yaml.safe_load(f)

    print("prompt format:", config.get("prompt_format"))
    print(config)
    print()
    for prompt in config["prompts"]:
        if prompt["mode"] == "rag":
            print(f'--- prompt mode: {prompt["mode"]} ---')
            env = Environment(loader=FileSystemLoader("."))
            template = env.get_template(prompt["template"])

            # List the variables the template expects.
            source = template.environment.loader.get_source(template.environment, template.name)
            variables = meta.find_undeclared_variables(env.parse(source[0]))
            print("variables:", variables)
            print("---")

            # Keys follow the names used in tell_rag_prompt.jinja:
            # sheet_chunks, chunk.hash, chunk.context, chunk.text.
            data = {
                "query": "Quel est le meilleur moyen de cuire une blanquette?",
                "sheet_chunks": [
                    {
                        "url": "http://data.gouv.fr",
                        "hash": "hash49080805",
                        "title": "A chunk title",
                        "text": "Moi j'aime la blanquette avec du beurre dedans\nEt une sauce bien épaisse.",
                    },
                    {
                        "url": "http://...",
                        "hash": "hash49080806",
                        "title": "A chunk title",
                        "text": "Faites chauffer la blanquette à feu doux pendant 46 heures.",
                        "context": "Recette de blanquette"
                    },
                    {
                        "url": "http://...",
                        "hash": "hash49080806",
                        "title": "A chunk title",
                        "text": "Les meilleures blanquettes se font avec des champignons de Paris",
                        "context": "Avis de grand-mère"
                    },
                    {
                        "url": "http://...",
                        "hash": "hash49080806",
                        "title": "A chunk title",
                        "text": """Étape 1: Faire revenir la viande dans un peu de beurre doux jusqu'à ce que les morceaux soient un peu dorés.
Étape 2: Saupoudrer de 2 cuillères de farine. Bien remuer.
Étape 3: Ajouter 2 ou 3 verres d'eau, les cubes de bouillon, le vin et remuer. Ajouter de l'eau si nécessaire pour couvrir.
Étape 4: Couper les carottes en rondelles et émincer les oignons puis les incorporer à la viande, ainsi que les champignons.
Étape 5: Laisser mijoter à feu très doux environ 1h30 à 2h00 en remuant.
Étape 6: Si nécessaire, ajouter de l'eau de temps en temps.
Étape 7: Dans un bol, bien mélanger la crème fraîche, le jaune d'oeuf et le jus de citron. Ajouter ce mélange au dernier moment, bien remuer et servir tout de suite.
""",
                        "context": "Recette Marmiton"
                    },
                ]
            }
            if "system_prompt" in variables:
                data["system_prompt"] = prompt["system_prompt"]
            if "limit" in variables:
                # Assumption: report the number of references actually passed to the template.
                data["limit"] = len(data["sheet_chunks"])

            rendered_template = template.render(**data)
            print(rendered_template)
            print("---")

            prompt, generated_text = get_llm_response(rendered_template)
            print("Albert : ", generated_text)


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c672582b6ddb27de5e82f3fec2cd8a8a343e02289597a3daebb46b949d76d0d7
size 13124572

24
special_tokens_map.json Normal file

@@ -0,0 +1,24 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": "<|im_end|>",
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}


@@ -0,0 +1,3 @@
Evalue si cette question nécessite des références à d'autres sources, ou s'il est possible d'y répondre directement : {{query}}
Réponds sous la forme de données json structurées comme suit : {"analysis": "…", "result": "…"}
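The analysis prompt above asks for a JSON object with `analysis` and `result` fields. A minimal sketch of how a caller might turn the raw generation into a decision is shown below. It assumes, as in the analysis demo script, that generation stops at `}` so the closing brace may be missing, and that `"oui"` means retrieval is needed. `needs_retrieval` is a hypothetical helper name and the sample reply is made up.

```python
import json


def needs_retrieval(generated_text):
    """Parse the model's JSON reply and return (retrieval_needed, analysis).

    Assumes "oui" means the question needs external references.
    """
    text = generated_text.strip()
    # The stop string "}" may be cut from the output, so restore it if absent.
    if not text.endswith("}"):
        text += "}"
    payload = json.loads(text)
    result = payload["result"].strip().lower()
    if result not in ("oui", "non"):
        raise ValueError(f"unexpected result value: {result!r}")
    return result == "oui", payload["analysis"]


# Made-up reply with the closing brace missing, as it would be after stop="}".
sample = '{"analysis": "La question porte sur un formulaire administratif.", "result": "oui"'
decision, analysis = needs_retrieval(sample)
print(decision, analysis)
```

Rejecting anything other than "oui" or "non" keeps a malformed generation from being silently treated as a negative answer.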

9
tell_rag_prompt.jinja Normal file

@@ -0,0 +1,9 @@
Ecris un texte référencé en réponse à cette question : {{query}}
Les références doivent être citées de cette manière : texte rédigé<ref text=\"[passage pertinent dans la référence]\">[\"identifiant de la référence\"]</ref>Si les références ne permettent pas de répondre, spécifie juste qu'il n'y a pas de réponse.
Les {{limit}} références disponibles :
{% for chunk in sheet_chunks %}
{{chunk.hash}}: {% if chunk.context %}({{chunk.context}}){% endif %} {{chunk.text}} {% if not loop.last %}{{"\n"}}{% endif %}
{% endfor %}
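The RAG prompt above asks the model to cite its sources inline as `<ref text="…">…</ref>`. A minimal sketch of how those citations could be pulled out of a generated answer follows. It assumes the identifier appears bare between the tags, as in the example output quoted in the RAG demo script, and that quoted passages contain no double quotes. The helper names are hypothetical.

```python
import re

# Matches <ref text="quoted passage">identifier</ref>.
# Assumption: the quoted passage itself contains no double quotes.
REF_PATTERN = re.compile(r'<ref text="(?P<quote>[^"]*)">(?P<ref_id>[^<]*)</ref>')


def extract_citations(answer):
    """Return (reference id, quoted passage) pairs in order of appearance."""
    return [(m.group("ref_id").strip(), m.group("quote")) for m in REF_PATTERN.finditer(answer)]


def strip_citations(answer):
    """Return the answer text with every <ref> element removed."""
    return REF_PATTERN.sub("", answer)


sample_answer = (
    'Il faut des champignons de Paris'
    '<ref text="Les meilleures blanquettes se font avec des champignons de Paris">hash49080806</ref>.'
)
citations = extract_citations(sample_answer)
plain_text = strip_citations(sample_answer)
print(citations)
print(plain_text)
```

The extracted identifiers can then be matched against the hashes of the chunks that were passed to the template, for instance to check that each quoted passage really occurs in the cited chunk.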

91145
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

BIN
tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.

61
tokenizer_config.json Normal file

@@ -0,0 +1,61 @@
{
"add_bos_token": true,
"add_eos_token": false,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"32000": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"32001": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [],
"bos_token": "<s>",
"chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"legacy": true,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<|im_end|>",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"trust_remote_code": false,
"unk_token": "<unk>",
"use_default_system_prompt": true,
"use_fast": true
}
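The `chat_template` in this tokenizer configuration produces ChatML: each message is wrapped in `<|im_start|>` and `<|im_end|>`, and an open assistant turn is appended when a generation prompt is requested. The sketch below is a hand-written plain-Python equivalent of that template for illustration, not the tokenizer's own implementation. `to_chatml` is a hypothetical name.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Format messages the way the chat_template above does (hand-written equivalent)."""
    text = "".join(
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "system", "content": "Tu es Albert, le chatbot des Maisons France Service qui donne des réponses sourcées."},
    {"role": "user", "content": "Comment obtenir le formulaire A36 ?"},
]
chatml_prompt = to_chatml(messages)
print(chatml_prompt)
```

Because `eos_token` is `<|im_end|>` in this configuration, generation ends naturally when the model closes its assistant turn.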