---
library_name: transformers
tags:
- chocolatine
- phi4
license: mit
datasets:
- jpacifico/french-orca-dpo-pairs-revised
language:
- fr
- en
base_model:
- microsoft/phi-4
---
### Chocolatine-14B-Instruct-DPO-v1.3

DPO fine-tuning of [microsoft/Phi-4](https://huggingface.co/microsoft/Phi-4) (14B params)
using the [jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised) RLHF dataset.
Training in French also improves the model's overall capabilities, surpassing the performance of its base model.

Context window: up to 16k tokens
### OpenLLM Leaderboard

Could this be the biggest performance boost ever seen from LLM fine-tuning? 🤔

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6297f0e30bd2f58c647abb1d/-a2YcV4huHuL3Ryelvkd5.jpeg)

Chocolatine-14B-Instruct-DPO-v1.3 is the best-performing Phi-4 based model on the [OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard),
for only 1.70 kg CO₂ (versus > 3 kg for other models of the same category and performance).
[Updated 2025-02-17]

| Metric     |    Value|
|------------|--------:|
|**Avg.**    |**42.42**|
|IFEval      |    70.40|
|BBH         |    54.85|
|MATH Lvl 5  |    56.19|
|GPQA        |    12.19|
|MuSR        |    12.29|
|MMLU-PRO    |    48.60|

### MT-Bench-French

Chocolatine-14B-Instruct-DPO-v1.3 outperforms its previous Chocolatine versions and its base model Phi-4 on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), used with [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench) and GPT-4-Turbo as the LLM judge.

```
########## First turn ##########
                                          score
model                                turn
gpt-4o-mini                          1    9.2875
Chocolatine-14B-Instruct-DPO-v1.3    1    9.0125
Chocolatine-14B-Instruct-DPO-v1.2    1    8.6125
Phi-3.5-mini-instruct                1    8.5250
Chocolatine-3B-Instruct-DPO-v1.2     1    8.3750
phi-4                                1    8.3000
Phi-3-medium-4k-instruct             1    8.2250
gpt-3.5-turbo                        1    8.1375
Chocolatine-3B-Instruct-DPO-Revised  1    7.9875
Daredevil-8B                         1    7.8875
Meta-Llama-3.1-8B-Instruct           1    7.0500
vigostral-7b-chat                    1    6.7875
Mistral-7B-Instruct-v0.3             1    6.7500
gemma-2-2b-it                        1    6.4500
French-Alpaca-7B-Instruct_beta       1    5.6875
vigogne-2-7b-chat                    1    5.6625

########## Second turn ##########
                                          score
model                                turn
gpt-4o-mini                          2    8.912500
Chocolatine-14B-Instruct-DPO-v1.3    2    8.762500
Chocolatine-14B-Instruct-DPO-v1.2    2    8.337500
phi-4                                2    8.131250
Chocolatine-3B-Instruct-DPO-Revised  2    7.937500
Chocolatine-3B-Instruct-DPO-v1.2     2    7.862500
Phi-3-medium-4k-instruct             2    7.750000
gpt-3.5-turbo                        2    7.679167
Phi-3.5-mini-instruct                2    7.575000
Daredevil-8B                         2    7.087500
Meta-Llama-3.1-8B-Instruct           2    6.787500
Mistral-7B-Instruct-v0.3             2    6.500000
vigostral-7b-chat                    2    6.162500
gemma-2-2b-it                        2    6.100000
French-Alpaca-7B-Instruct_beta       2    5.487395
vigogne-2-7b-chat                    2    2.775000

########## Average ##########
                                          score
model
gpt-4o-mini                               9.100000
Chocolatine-14B-Instruct-DPO-v1.3         8.825000
Chocolatine-14B-Instruct-DPO-v1.2         8.475000
phi-4                                     8.215625
Chocolatine-3B-Instruct-DPO-v1.2          8.118750
Phi-3.5-mini-instruct                     8.050000
Phi-3-medium-4k-instruct                  7.987500
Chocolatine-3B-Instruct-DPO-Revised       7.962500
gpt-3.5-turbo                             7.908333
Daredevil-8B                              7.487500
Meta-Llama-3.1-8B-Instruct                6.918750
Mistral-7B-Instruct-v0.3                  6.625000
vigostral-7b-chat                         6.475000
gemma-2-2b-it                             6.275000
French-Alpaca-7B-Instruct_beta            5.587866
vigogne-2-7b-chat                         4.218750
```

### Usage

You can run this model using my [Colab notebook](https://github.com/jpacifico/Chocolatine-LLM/blob/main/Chocolatine_14B_inference_test_colab.ipynb).

You can also run Chocolatine using the following code:

```python
import transformers
from transformers import AutoTokenizer

model_name = "jpacifico/Chocolatine-14B-Instruct-DPO-v1.3"

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```
|
|
|
|
### Limitations
|
|
|
|
The Chocolatine-2 model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
|
|
It does not have any moderation mechanism.
|
|
|
|
- **Developed by:** Jonathan Pacifico, 2025
|
|
- **Model type:** LLM
|
|
- **Language(s) (NLP):** French, English
|
|
- **License:** MIT
|
|
|
|
Made with ❤️ in France |